IT Department Agility

Our development team have adopted the AGILE project management methodology.  Along with new continuous testing tools, they’re able to implement changes much more quickly now.  They’re now finding the next barrier to agility is us, the IT operations department (the ops in DevOps).  We can’t roll out software as fast as they release it. Ideally IT Operations would be able to deploy their changes within a single AGILE sprint cycle (the dev cycle).

Time to extend Agile working from Dev team to Ops team and get things joined up…

Below is my first take on Agile and the Agile manifesto. This is going to be really interesting. Operations teams are typically focused on stability and they achieve this by preventing changes. Agile is gonna shake things up! The manifesto values:

  • Individuals and Interactions over Processes and Tools
  • Working Software over Comprehensive Documentation
  • Customer Collaboration over Contract Negotiation, and
  • Responding to Change over Following a Plan

This post describes the IT Operations position and explores how IT Operations could deploy changes more quickly.

Why Is IT less agile than development?

Project managers/Scrum masters were surprised when IT changed back to 4 week sprints. Here’s why:

IT changes have greater impact potential

Consider the following stack.  Changes higher up the stack are easier as they have less impact.  Changes lower down are harder as they can impact everything above.  Development changes might be like small boats that can change direction quickly and be micro-managed.  In comparison, IT Infrastructure is an oil tanker…  Development and IT Ops mainly work together in the middle (the bun fight zone).

[Figure: The Stack]

IT Priority is Support, not changes

Development and IT Operations have different priorities.  IT priorities are:

  1. Support
  2. Stabilise
  3. Enhance

With Support as the number 1 priority, adding new features is done in spare time (if any).  If there is a lot of support work there will be less project work. Some devs have worked out that emailing the helpdesk (creating support tickets) is the best way of getting their work to the top of the IT work queue, compared with taking the IT projects route. Half-arsed, rushed implementations may result in more support overhead for IT, so IT are wary of deploying things quickly just to suit dev sprint cycles. As the Agile principles put it, “Continuous attention to technical excellence and good design enhances agility”.  Fewer half-arsed implementations means less support, which means more time for projects. IT Operations are wary of AGILE. Their view of AGILE might be something like this: http://www.halfarsedagilemanifesto.org/ Let’s look at the real Agile manifesto in the following sections.

Individuals and Interactions over Processes and Tools

IT and Development need to interact well.  The handover between teams needs to be slick.  IT need visibility of upcoming development work that will require IT resources or hardware. Development and IT should sit together if possible. Big changes can be achieved through individuals (driving through a vision) and interactions (e.g. cross-team working groups).

Working Software over Comprehensive Documentation

IT would much prefer working software, since they have to troubleshoot it.  But a sensible level of docs is required.  A Wiki is great for collaborative, searchable documentation.  IT and Dev definitions of working may differ.

Customer Collaboration over Contract Negotiation

IT are typically the customer for 3rd party services, and require contracts and SLAs to get providers to provide support.

Responding to Change over Following a Plan

With longer lead times, IT need prior notice of requirements from dev (this demands a plan longer than a couple of sprints).  IT work spanning multiple sprints can become lost in the backlog (Epics help here).  Use simple project plans to provide the bigger picture and track progress. A story map is an attractive visualisation of key stages in a project and a useful attachment to a project plan. IT can invest in more automation, orchestration and self-service portals, and maintain instantly deployable templates for every platform, in order to appear more agile.  Or development can simply plan ahead and give IT prior warning before something becomes urgent and they’re blocked.  Sitting the teams together could give IT warning of future hardware requirements, and IT could show dev how to deploy their own machines (devops).

Making IT Operations More Agile

The following is an attempt to identify barriers to agility and offer recommendations.

Barriers to Agility - Context switching

Interruptions are expensive.  Don’t try and do project work and support at the same time. Allocate separate staff to projects and support (and rotate).

Barriers to Agility - Resources

The amount of project work done is proportional to the number of staff available to do project work.

Barriers to Agility - Specialisations

Unlike a pool of general Java developers, IT Operations have subject matter experts around technologies such as Windows, Linux, networking, storage, virtualisation, databases, deployment and infrastructure applications.  While the ideal might be for everyone to have a cross-functional skillset (everyone has knowledge in everything), this is not always possible. Also, a technology enthusiast may have no interest in stepping out of their comfort zone and becoming a generalist (and they can get a specialist job elsewhere to carry on working with their preferred technology). This leads to delays waiting for the appropriate Subject Matter Expert’s (SME) time; a bottleneck. Recommendations:

  • Good docs give someone else a chance to pick it up
  • Good handover presentation helps others understand how to support it
  • Don’t have single SMEs. Buddy up
  • Don’t have no SME (unless you can get consultants at short notice)
  • Maintain a skills matrix, tracking what skills you have in the IT team and highlighting areas where you have few resources.
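A skills matrix does not need special tooling. As a rough sketch of the idea (the names, skills and the "at least two people per skill" threshold below are all made up for illustration), a few lines of Python can invert a person-to-skills list and flag areas with a single SME or none:

```python
from collections import defaultdict

# Hypothetical team data: person -> set of skills they can support.
TEAM_SKILLS = {
    "alice": {"windows", "virtualisation", "storage"},
    "bob": {"linux", "networking"},
    "carol": {"windows", "linux", "databases"},
}

# Skills the team is expected to cover (assumed list for illustration).
REQUIRED_SKILLS = ["windows", "linux", "networking", "storage",
                   "virtualisation", "databases", "deployment"]


def build_matrix(team_skills):
    """Invert person -> skills into skill -> sorted list of people."""
    matrix = defaultdict(list)
    for person, skills in team_skills.items():
        for skill in skills:
            matrix[skill].append(person)
    return {skill: sorted(people) for skill, people in matrix.items()}


def find_gaps(team_skills, required, minimum=2):
    """Return required skills covered by fewer than `minimum` people."""
    matrix = build_matrix(team_skills)
    return {skill: matrix.get(skill, [])
            for skill in required
            if len(matrix.get(skill, [])) < minimum}


if __name__ == "__main__":
    for skill, people in find_gaps(TEAM_SKILLS, REQUIRED_SKILLS).items():
        label = "NO SME" if not people else "single SME: " + people[0]
        print(f"{skill}: {label}")
```

Anything the report flags is a candidate for buddying up, a handover session or a consultancy arrangement.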

Barriers to Agility - High availability

The requirement to be able to quickly recover a service from an outage can slow down deployment into the live environment if the new system is not designed for high availability.  Some applications’ designs support failover between sites/datacentres, some don’t.  Where they don’t, IT must crowbar them into some HA configuration and work out how to keep the new service running if there is an outage. Recommendations:

  • Select “enterprise ready” applications that support high availability. Request this from vendors
  • Use infrastructure to abstract hardware/datacentre from the application. For example application health aware load balancers and stretched clusters so the server doesn’t know or care which datacentre it’s in and neither does the desktop client.
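To illustrate what "application health aware" means, here is a minimal sketch of the selection logic only. The class and backend names are hypothetical, and how the health check itself is performed (HTTP probe, TCP connect, etc.) is deliberately left out. A real load balancer does far more, but the core idea is that clients are only ever sent to backends that passed their last check, whichever datacentre they live in:

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Backend:
    name: str
    datacentre: str
    healthy: bool = True


class HealthAwareBalancer:
    """Round-robin over backends that passed their last health check."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._counter = count()

    def mark(self, name, healthy):
        """Record a health check result (how the check runs is out of scope)."""
        for backend in self.backends:
            if backend.name == name:
                backend.healthy = healthy

    def pick(self):
        """Return the next healthy backend, regardless of datacentre."""
        healthy = [b for b in self.backends if b.healthy]
        if not healthy:
            raise RuntimeError("no healthy backends in any datacentre")
        return healthy[next(self._counter) % len(healthy)]
```

If the backend in one datacentre is marked unhealthy, `pick()` simply stops returning it, so the client never needs to know a failover happened.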

Barriers to Agility - Software licencing controls

Software licence systems (and anti-piracy controls) sometimes hinder rapid deployment or failover.  For example, activation of licence keys is sometimes a manual step and may be tied to hardware serial number or network adapter MAC address. Recommendations:

  • Select applications that support automated deployment, including the licence keys
  • Negotiate an additional ‘Disaster Recovery’ licence key with vendor

Barriers to Agility - Hardware changes

Hardware changes typically involve long lead times:

  1. Research hardware options and make a selection
  2. Budget sign off for hardware
  3. Wait for hardware to be delivered to site
  4. Change control process: schedule hardware upgrade
  5. Schedule datacentre staff to deploy hardware
  6. Test after hardware upgrade

Recommendations:

  • Stick to a standard hardware purchasing list. Keep this up to date with part numbers.
  • Provide lots of notice if development releases will require hardware upgrades (e.g. version x requires more RAM in the servers)
  • Track spare hardware well (left over hardware from Project A can quickly be reused by Project B rather than ordering new hardware)
  • Do capacity planning and monitoring. Don’t exceed capacity.

Barriers to Agility - Server Deployments

Windows and Linux server installs are quick, but:

  1. Servers may need hardware changes (above) before OS install
  2. SAN storage may need configuring (dependency on SAN specialist and change control for fibre switches)
  3. Dev may have to wait for IT to get time to kick off OS installs around support and other project work
  4. Devs can become blocked waiting for IT.
  5. If servers need to be rebuilt by IT several times (test, revert, test again, revert) this can add time to a project

Recommendations:

  • Give development access to a server virtualisation platform and guidance on deploying lab servers themselves
  • Give development access to virtual server snapshotting for rapid rollback of server changes (Look at Microsoft App Controller)
  • Development give IT plenty of warning for server deployments

 

Barriers to Agility - Software Deployments

Software deployments may involve long lead times:

  1. Budget sign off for software
  2. Legal dept to approve end user licence agreement
  3. Wait for software and licence keys from vendor
  4. Identify software prerequisites and package these for deployment
  5. Review how the software works and develop workarounds for bad software
    1. Does it expect full admin rights?
    2. Where does it store user preferences/data?
  6. Package the new software for deployment (alternative is manual software installs)
  7. Deploy to early adopters, follow up to see it works ok for them
  8. Change control process for deployment to live environment
  9. Deploy software
    1. Scheduled deployment via deployment system (eg SCCM)
    2. Install during reboot cycle via Group Policy

Recommendations:

  • Pick software that supports unattended installation out of the box (does not need repackaging)
  • Pick software that meets “designed for Windows” logo requirements
  • Consider allowing users to select when software is installed rather than wait for scheduled deployments / weekends
  • Consider presenting software to users via server based computing (Citrix/Remote App) rather than local installs. Install once on server and then present app to users.
  • Consider using application virtualisation (thinapp/App-V) rather than traditional local installs. Software deployed to central network location rather than local installs.
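As an example of what "unattended installation out of the box" looks like in practice, an MSI package can normally be installed silently with the standard Windows Installer switches, with no repackaging. The sketch below only builds the command line; the function name and file paths are hypothetical, and whether a given vendor's MSI behaves well in silent mode still has to be tested:

```python
def msi_silent_install_command(msi_path, log_path=None):
    """Build an unattended Windows Installer command line for an MSI package."""
    # /i = install, /qn = no user interface, /norestart = suppress reboot
    command = ["msiexec", "/i", msi_path, "/qn", "/norestart"]
    if log_path:
        # /l*v = verbose log, useful when a silent install fails quietly
        command += ["/l*v", log_path]
    return command


if __name__ == "__main__":
    # Hypothetical package and log locations, printed rather than executed.
    print(" ".join(msi_silent_install_command(r"C:\Packages\app.msi",
                                              r"C:\Logs\app-install.log")))
```

On a Windows machine the returned list could be passed to `subprocess.run(command, check=True)` by a deployment script.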

Barriers to Agility - Software Upgrades

Software upgrades may involve long lead times:

  1. Customisations may inhibit the upgrade process
    1. Branding, DR failover workarounds
  2. Vendor’s upgrade process may be unreliable
  3. User data / preferences may need migrating from old version to new version

Recommendations:

  • Pick software where the vendor has planned for future upgrades (eg a sensible patching mechanism)
  • Customisation must be documented and justified

Barriers to Agility - Capacity Limits

Exceeding capacity limits can lead to long lead times.  Adding a single additional system suddenly becomes a big expensive deal:

  1. Exceeding max RAM or CPU in a machine needs hardware upgrades
  2. Exceeding max RAM in an OS edition needs OS upgrade (and licence)
  3. Exceeding max vRAM or vCPUs in a virtual host/cluster needs new host/cluster
  4. Exceeding max IPs in a subnet needs a new subnet
  5. Exceeding max subnets in a site range needs a new addressing scheme
  6. Exceeding max blades in a chassis requires a new chassis install
  7. Exceeding max chassis in a cabinet requires a new cabinet install
  8. Exceeding max cabinets in a datacentre requires a new datacentre
  9. Exceeding max GB in a SAN requires a SAN upgrade

Recommendations:

  • Consider hardware requirements early on in a project and give IT lots of notice
  • Monitor capacity and plan for more before you run out
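A rough sketch of the kind of check that capacity monitoring boils down to is shown below. The function names and the sample numbers are assumptions for illustration, and the forecast is a naive straight-line projection, not a proper trend model. The subnet helper assumes an ordinary IPv4 subnet (/30 or larger) where the network and broadcast addresses are not usable:

```python
import ipaddress


def subnet_utilisation(cidr, addresses_in_use):
    """Fraction of usable host addresses already allocated in a subnet."""
    network = ipaddress.ip_network(cidr)
    usable = network.num_addresses - 2  # exclude network and broadcast addresses
    return addresses_in_use / usable


def periods_until_full(used, capacity, growth_per_period):
    """Periods left before `used` reaches `capacity` at a constant growth rate.

    Returns None when usage is flat or shrinking.
    """
    if growth_per_period <= 0:
        return None
    return max(0.0, (capacity - used) / growth_per_period)


if __name__ == "__main__":
    # Hypothetical figures: a /24 with 200 addresses allocated, and a SAN
    # with 800 of 1000 GB used, growing by 50 GB a month.
    print(f"subnet utilisation: {subnet_utilisation('10.0.0.0/24', 200):.0%}")
    print(f"months until SAN is full: {periods_until_full(800, 1000, 50)}")
```

The useful comparison is between the forecast and the hardware lead time: if the SAN fills in four months and a SAN upgrade takes six to buy and install, the project is already late.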

Barriers to Agility - Application local data

Apps that store data locally on drive C incur a support overhead:

  1. IT cannot quickly move the user to another machine
  2. IT cannot failover user to alternate site
  3. IT cannot use desktop pools/clustering
  4. User loses work if disk fails
  5. Roaming profiles / user environment virtualisation require developers to follow MS rules and store user preferences in %appdata%

Recommendations:

  • Users store data in My Documents and don’t have permission to write files to C:
  • Developers advised of MS rules (eg designed for Windows guidelines) and configure apps to store data in %appdata% instead of C:\myApp\MySettings.ini
  • Evaluate user data management solutions to sync local data to network (eg Windows Work Folders)
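For developers, following the %appdata% rule is usually a one-line change in how the settings path is built. A minimal sketch, assuming a hypothetical application called myApp (the function name and the non-Windows fallback location are also assumptions for illustration):

```python
import os
from pathlib import Path


def settings_path(app_name, filename="settings.ini", env=None):
    """Return the per-user settings file path under %APPDATA%.

    Falls back to ~/.config when APPDATA is not set (for example on
    non-Windows machines); the fallback location is an assumption.
    """
    env = os.environ if env is None else env
    base = env.get("APPDATA")
    root = Path(base) if base else Path.home() / ".config"
    return root / app_name / filename


if __name__ == "__main__":
    path = settings_path("myApp", "MySettings.ini")
    # Create the per-user folder on first run instead of writing under C:\myApp.
    path.parent.mkdir(parents=True, exist_ok=True)
    print(path)
```

Because the path is resolved per user at runtime, the same build works with roaming profiles, desktop pools and user environment virtualisation without IT having to special-case the app.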
