Running Your Change Management Process

“When implementing Changes it’s not just a case of hitting a big red button and shouting “fly my pretties, fly” to an imaginary army of flying monkeys”

Following on from my previous post about surviving implementation, here’s some advice on running your Change Management process, aka The Day Job.

In Change Management your key areas to focus on are:

  • Recording and processing the Change
  • Change assessment
  • Change Advisory Board (CAB)
  • Build and Test
  • Implement
  • Review

Raising the Change

So first up: recording the Change. Ensure your Change Management policy covers who can request a Change, i.e. can anyone raise a Change? Just IT? What about the business? Each organisation will be different, but one thing to keep in mind is making sure your policy gives clear guidelines on the difference between a Change and a Service Request, i.e. where Request Fulfilment ends and where Change Management begins.

Create a Change form so that Changes can be raised in a standard way. It’s really important to have consistent information in your Change requests so when reviewing and approving Changes, you have all the facts needed to make the right decision for your customers. If you have a sparkly, all powerful toolset to do it for you then happy days. If not, and there are lots of organisations just getting started with Change Management, then it’s time to get creative. In the past, I’ve set up Change request forms using Word or Excel (tweet me if you would like to see some examples).

Things to consider for your form:

  • Title
  • Description
  • Reason
  • Service affected
  • Impacted CIs
  • Will the CMDB (if you have one) need to be updated afterwards?
  • Risk
  • Implementation windows
  • Implementation teams
  • Pre implementation testing
  • Implementation plan
  • Post implementation verification
  • Back out plan
  • Impact to other environments
  • Will the change be replicated to your DR environment?
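If you are building the form yourself rather than relying on a toolset, a completeness check is one of the first things worth automating. The sketch below is only an illustration in Python: the `ChangeRequest` class, its field names (a subset of the list above) and the `missing_fields` helper are all hypothetical, not taken from any particular tool.

```python
from dataclasses import dataclass, fields


@dataclass
class ChangeRequest:
    """Hypothetical Change form; field names mirror part of the list above."""
    title: str = ""
    description: str = ""
    reason: str = ""
    service_affected: str = ""
    impacted_cis: str = ""
    risk: str = ""
    implementation_window: str = ""
    implementation_plan: str = ""
    back_out_plan: str = ""


def missing_fields(change: ChangeRequest) -> list[str]:
    """Return the names of any fields left blank on the form."""
    return [f.name for f in fields(change) if not getattr(change, f.name).strip()]


# A half-finished draft: useful fields are filled in, but the plans are missing
draft = ChangeRequest(
    title="Patch web servers",
    description="Apply monthly security patches",
    reason="Security",
    service_affected="Company website",
    risk="Low",
)
print(missing_fields(draft))
```

A check like this could run before a Change is accepted for assessment, so that reviewers never see a request without a back out plan.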

You need to have a lifecycle approach for raising Changes. For example, the process for a standard Change will be very different to an emergency, sorting-something-out-in-the-middle-of-the-night type of Change. Ensure this ties back in to your policy with criteria and examples so there’s no confusion.

Look at how Changes are classified and prioritised. If every Change is urgent and high priority, which one do you implement first? Classification is also really important – make your Change owners accountable for risk and impact assessments. If your company or Change Management tool has a Risk calculator, use it, as it enables Change requestors to assign a tangible risk to the Change (it removes the “if in doubt click medium” type of behaviour). If not, I have some templates that I can share.
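For anyone without a Risk calculator in their tool, the idea can be sketched in a few lines of Python. The scoring model here (impact multiplied by likelihood, each scored 1 to 3) and the thresholds are assumptions made purely for illustration; your own policy should define the real criteria.

```python
def risk_rating(impact: int, likelihood: int) -> str:
    """Combine two 1-3 scores into a rating. Thresholds are illustrative only."""
    if not (1 <= impact <= 3 and 1 <= likelihood <= 3):
        raise ValueError("impact and likelihood must be between 1 and 3")
    score = impact * likelihood
    if score >= 6:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"


# A high impact Change that is fairly likely to go wrong
print(risk_rating(impact=3, likelihood=2))
```

Because the requestor has to score impact and likelihood separately, there is no single "medium" button to click by default.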

Change Assessment

Next up is our old friend, the Change Assessment stage. This is the initial check that the Change is reasonable and makes sense, a sanity check if you will. Sounds obvious but unfortunately you can’t teach common sense. On one memorable occasion, I was reviewing a list of Changes on my pre-CAB FSC and one jumped out at me. The Change was to move a business critical server from a secure Data Centre environment to the Server technician’s desk. The Change justification? “My little legs are getting tired from walking to the server room all the time”. Needless to say words were had and the Change was removed from the schedule (I just wish I’d taken a screen shot).

Things to bear in mind when assessing a Change are benefits, both technical and to the business. There’s no point in having the newest, most gadgetastic server in the world if your end users don’t see any benefit from it. Really think about the risk involved with the proposed activity, for example:

  • Number of people affected
  • Financial
  • Regulatory
  • Reputational
  • Loss of productivity
  • Downtime
  • Seasonal considerations

Make sure you have clear assessment criteria for managing Changes. Some examples could include:

  • Pre implementation testing – how do we know this Change will go as planned?
  • Deployment plan – does it make sense, if other teams are involved are they aware and do we have contact details for them? Are there any dodgy areas where we might need check point calls or additional support to mitigate risk?
  • Post implementation verification – ok we’ve done the Change, how do we make sure everything is as it should be?
  • Back out plan – hope for the best but prepare for the worst. What happens if something goes wrong on the night? Do we fix on fail or roll back? Are the Change implementers empowered to make a decision or is escalation needed?
  • Impact to other environments – “who cares about other environments?” I hear you ask. Let Aunty Vawns tell you why it matters from personal experience. Once upon a time in a galaxy far, far away I worked for a large investment bank in the city. A code Change to one of the most business critical systems (the market data feed to our trade floors) took longer than expected so instead of updating both the production and DR environments, only the production environment was updated. The implementation team planned on updating the DR environment but got distracted with other operational priorities. Fast forward to 6 weeks later, a crisis hits the trading floor, the call is made to invoke DR but we couldn’t because our market data services were out of sync. Cue a hugely stressful 2 hours where the whole IT organisation and its mum desperately scrambled to find a fix and an estimated cost to the business of over $8 million. Lesson most definitely learned that day.

The CAB

So we’ve raised our Change and sanity checked it to make sure what we’re planning on implementing is sensible and won’t, you know, set anything on fire. The next step is the Change Advisory Board or CAB. When setting up your CAB, make sure you have a clear Terms of Reference statement which will give attendees a steer on how to prepare, good meeting behaviours and how to represent Changes effectively.

Not every Change has to go to CAB. In fact, I’d say you should use CAB for your big, complicated Changes that would have a major impact on the business. Monthly, BAU server patching? That’s a candidate for a standard Change. Keep CAB simple and uncluttered with an agenda that deals with Changes that are high risk, major impacting or have lots of complicated detail. Make sure the right people turn up and that all areas are represented (don’t forget Security if you have any ISO 27001 or NGN regulatory requirements).
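The routing described above can be written down as a simple rule, which also makes it easier to agree with your stakeholders. This is a minimal Python sketch under stated assumptions: the `approval_route` function, the change type labels and the fallback route of “Change Manager approval” are hypothetical and should be replaced with whatever your own policy says.

```python
def approval_route(change_type: str, risk: str, major_impact: bool) -> str:
    """Decide where a Change goes for approval. Rules are illustrative only."""
    if change_type == "emergency":
        return "emergency process"
    if change_type == "standard":
        return "pre-approved"
    # Normal Changes: only the big, risky ones need the full CAB
    if risk == "High" or major_impact:
        return "CAB"
    return "Change Manager approval"  # assumed fallback, not from the article


print(approval_route("standard", "Low", False))   # e.g. monthly BAU patching
print(approval_route("normal", "High", True))     # e.g. a major release
```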

An effective CAB agenda could look something like this:

  • Review of implemented Changes
  • Incidents (or if time is short Major Incidents) caused by Change
  • Lessons Learned
  • Forward schedule of Change
  • Candidates for templates / standard Changes
  • Improvements / CSI
  • Good news stories

Remember, CABs don’t have to be a 2 hour meeting where everyone is locked in a conference room. Look at Agile or Lean for ways to make efficiency savings, consider virtual CABs where you can approve most things via your toolset and make full use of technology such as conference calls, WebEx or Skype.

Build and Test

The next stage of the process is Build & Test. The first thing to look at is developing standard build methods. Use automation where appropriate: it saves duplicated work, reduces the probability of human error and the economies of scale can be huge. If automation is too expensive, ensure build methods are well documented and templated where possible.

Look at your testing environments and ensure they are fit for purpose. Every organisation will have different needs, but when looking at this, ask whether you have enough to cover your operational work, e.g. a live environment for production work, a DR environment in case the worst happens and an environment dedicated to testing and training. If you are limited, do you have a booking system in place so that testing / training / dev time can be allocated fairly?

What about Changes to non production environments? Are they covered by your Change process? Is there a monthly environment refresh just in case?

How do you test Changes before go live? Is it Bob from the Server team saying “this’ll do”, or do you have something more formal in place? Is the business willing to help with Change testing? Getting the business to support testing can have huge benefits. One client I worked with had huge challenges keeping the Marketing department happy with Changes that they had requested to the company website. The Marketing team didn’t provide any post Change validation, so backed out Changes to the website and emergency Changes to fix errors such as typos were commonplace. By engaging with the Marketing team and asking for someone to be available for a set 20 minute period of the Change window (we called it the smoke testing phase), our effectiveness increased and so did our customer engagement.

Implementation

When implementing Changes it’s not just a case of hitting a big red button and shouting “fly my pretties, fly” to an imaginary army of flying monkeys (although that would be so cool). A professional approach and great communication are key to making this stage a success.

First up, your Change Schedule (aka your Forward Schedule of Change or FSC). Make sure it’s easily accessible to your stakeholders, so store it somewhere central that’s easy to navigate. Make sure it’s dynamic and relevant – otherwise you risk your data being obsolete within minutes of hitting the print button. Make sure it’s fit for purpose and states what Changes are being carried out, when, and on which Service. You don’t need expensive toolsets for this; you can do it in Excel.

Something that goes hand in hand with your FSC is your Projected Service Availability or PSA. Yes, I know ITIL v3 insists on calling it a PSO (Projected Service Outage) but let’s not frighten the horses. Projected Service Availability implies we’ve got this: everything is planned, tested and safe, and you have nothing to worry about. In contrast, PSO just says downtime. What do our customers not like? Downtime, no matter how well planned out. Again, Excel can be your friend here. Keep it simple: a list of your Services, with green for those that are up and running and amber for those due some maintenance time.

When looking at Change windows, ensure they have been agreed in advance with the business and that they are codified as maintenance windows on any SLAs. Try to negotiate pre-approved Change slots where possible, e.g. the third Thursday of every month between 22:00 and 01:00 we will be doing security patching. If you know in advance that a Change needs to take place outside the usual Change window – ask nicely! If you’re really good you might be able to get some SLAs relaxed so that, as an organisation, you’re not penalised for carrying out break / fix work.
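A recurring slot like “the third Thursday of every month between 22:00 and 01:00” is easy to get wrong by hand, so it can help to generate the dates for your schedule. The Python sketch below uses only the standard library; the function names `third_thursday` and `patching_window` are made up for this example, and time zones are deliberately ignored.

```python
import calendar
from datetime import date, datetime, time, timedelta


def third_thursday(year: int, month: int) -> date:
    """Return the date of the third Thursday in the given month."""
    weeks = calendar.monthcalendar(year, month)  # days outside the month are 0
    thursdays = [w[calendar.THURSDAY] for w in weeks if w[calendar.THURSDAY] != 0]
    return date(year, month, thursdays[2])


def patching_window(year: int, month: int) -> tuple[datetime, datetime]:
    """Window from 22:00 on the third Thursday to 01:00 the next morning."""
    start = datetime.combine(third_thursday(year, month), time(22, 0))
    return start, start + timedelta(hours=3)


start, end = patching_window(2024, 1)
print(start, "to", end)
```

Note that the window crosses midnight, so the end time falls on the Friday; that is worth stating explicitly in any SLA wording.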

Change Reviews

Finally, we have the Review stage. We’ve done our Change and everything has (hopefully) gone to plan. So let’s carry out a review of our implemented Changes. When things go well, brilliant! You have new candidates for standard Changes, work instructions or templates. If things go badly or you really do end up setting something on fire then let’s look at how we can do better next time. Carry out a Change review to look at what happened, what went wrong, what was the root cause, how did we fix it and how do we stop it from happening again? Involve Incident & Problem Management here as they have super powers in these areas. Above all you want your lessons learned to be captured, discussed and acted on. This could be a regular agenda item at CAB meetings or form part of a Service Improvement Plan or SIP.

When reviewing Changes, look at the benefits. In terms of business benefits, did we achieve the expected results? Have we got any customer feedback? We also need to look at the technical benefits, i.e. have we increased stability, added resilience or fixed any Known Errors? If we have – brilliant – tell the Service Desk so they can let our customers know.

Final Points

I’d like to finish by saying that Change Management really is a critical process for managing, controlling and protecting your live environment. As Change Managers we get to ask all the hard or awkward questions but on the bright side, we’re usually the first to know about cool new projects or sparkly new gadgets. Keep smiling, it will be fine. And if it isn’t, tweet me, I’m always happy to help!


Change Management – Surviving Implementation

The super power of a change manager is an “invisible shield”, just like Violet from The Incredibles

One of the things I’m getting asked about most this year is getting the basics right – how to actually do change management in the real world. We all know that having good processes in place protects us all, ensures we meet regulatory guidelines and is generally just common sense, but what about using them so that we can build a better, stronger IT organisation? In this article, I’m going to talk about getting started and surviving the implementation phase. I’ll then follow it up with another article on how to actually run your change management process.

Let’s start from the beginning. Change management sits in the transition stage of the service lifecycle. ITIL states that the objective of change management is “to ensure that changes are recorded, evaluated, authorised, prioritised, planned, tested, implemented, documented and reviewed in a controlled manner”. In a nutshell, change management is about putting things in, moving things round or taking them out, and doing it safely and without setting anything on fire.

When describing the change process, I call change managers the guardians or protectors of our network. They ensure all changes are sanity checked, tested, reviewed, approved and scheduled at a sensible time. Their super power is an invisible shield (like Violet in “The Incredibles”) that protects the rest of the organisation from the adverse impact of change.

Getting started: Common Excuses and Ways Around Them

Change management is an incredibly important process because it enables you to manage, control and protect your live environment. Since the credit crunch, I’ve had more and more people coming to me saying that their change departments would either have to endure massive cut backs or stop improvement works. Here are some of the most common excuses I’ve come across for this along with some possible ways around them.

Excuse number 1: “We don’t have the time”. Ok, what about all the time wasted dealing with the impact of failed or unmanaged changes, firefighting incidents and dealing with the big angry mob camped outside the IT department waiting to lynch us for yet another mistake? Let’s be sensible, having a strong change process in place will lead to massive efficiency savings and the use of standard changes, models and templates will make the work involved repeatable.

Excuse number 2: “We don’t have the resources”. What about all the time spent going cap in hand to the rest of the business explaining why a key service was unceremoniously taken out by a badly executed change? Spin doctoring a major incident report that has to go out to external customers? I’d argue that you’re wasting resources constantly firefighting and if you’re not careful it will lead to stressed out departments and key individuals burning out from the stress of trying to keep it all together. Instead of wasting resources and talent – why not put it to good use and start getting proactive?

Excuse number 3: “We don’t have the money”. What about all the money spent on service credits or fines to disgruntled customers? Then there’s the less tangible side of cost. Reputational damage, being front-page news, and being universally slated across social media – not nice and definitely not nice having to deal with the fall out. Finally, what about compliance and regulatory concerns? Failing an audit could be the difference between staying profitable or losing a key customer.

Excuse number 4: “We can’t afford expensive consultants”. Ok, hands up. I used to be a consultant. I used to work for Pink Elephant UK and for anyone out there looking for an amazing consulting / training company then go with Pink – they rock. That aside, if you can’t afford outside help in the form of consultancy, you still have lots of options. Firstly, you have the itSMF. Again, I’m biased here because I’ve been a member, as well as a speaker for, and chair of, various sub groups and committees, all in an attempt to champion the needs of the IT service management community. Here’s the thing though, it’s useful war stories, articles, white papers and templates written by the members for the members. There’s also ISACA, which focuses more on the governance and COBIT side of things. There’s the Back2ITSM movement – lots of fantastic help, support and information here. There’s the ITSM Review and blog sites from the likes of The IT Skeptic – lots of free resources to help you sort out your change management process.

Excuse number 5: “I’m probably going to be made redundant anyway so what’s the point?” Yes, I am serious, this is an excuse I’ve come across. There’s no way to sugar coat it, being made redundant or even being put at risk is (to put it mildly) a rubbish experience. In that situation (and believe me, I’ve been there) all you can do is keep doing your best until you are told to do otherwise. Having a strong change management process can be a differentiator on responses to bids and tenders, as SOX compliance or ISO 20000 accreditation can set you apart from competitors. Bottom line, we have to at least try.

Planning for Change Management

So how do you get started? First things first: you need to get buy in. Most management guides will tell you to focus on the top layer of management as they hold the purse strings, and that’s very true, but you also need buy in from your guys on the front line – the guys who will actually be using your process. Get their buy in and you’re sorted, because without it you’re stuffed.

So, starting with the guys at the top, you need to speak to them in their language and that means one thing – a business case! This doesn’t have to take forever and there are lots of templates out there you can use. The key thing is to explain clearly, in their language, why change management is so important. Things to cover in your business case are introduction, scope, options, deliverables and benefits. Now get your techies on board. There’s no “right” way of doing this. As someone with a few war stories to tell, things that have worked in the past include:

  • sitting down with your techies
  • templating everything
  • using the umbrella argument (more on that later)
Krispy Kremes can help

I’ve also found that bribing support teams with doughnuts can be very effective. As a former techie, I can confirm that Krispy Kreme ones work particularly well.

Once you’ve got your buy in, gather and confirm your requirements. At the risk of playing management bingo here, a good approach is to set up workshops. Engage with both IT and the rest of the business so that there are no surprises. If you have an internal risk or audit department now is the time to befriend them! Using the aforementioned doughnuts as bribery if necessary, get their input, as they will have the most up to date regulatory requirements you need to adhere to, such as SOX or Basel III.

Define the scope otherwise it will creep! Plan what you want to cover carefully. Do you want to cover all production equipment? What about test and DR environments? Whatever scope you agree, make sure it is included in any SLAs, OLAs or underpinning contracts so that you have documented what you are working to.

Keep your end users in mind

When writing your policy, process and procedures, keep your end users in mind. Don’t try to cover everything in red tape or people will find ways to circumvent your process. Let’s start with your policy. This is your statement of intent, your list of “thou shalts” and “thou shalt nots”. Make sure it’s clear, concise and in alignment with existing company standards. I know this might sound counterintuitive, but also prepare for it to be broken. There will be times when something needs to be fixed in the middle of the night or there needs to be an urgent update to your website. It’s important that changes are raised in enough time for them to be reviewed and authorised, but exceptions will pop up, so plan for them now when you’re not under pressure. Examples of when an emergency process could be used are:

  • Something’s broken or on fire (fixing a major incident)
  • Something’s about to be broken (preventing a major incident)
  • Major commercial reasons (in response to a move by a competitor)
  • A major risk to compliance has been identified (e.g. base rate changes, virus patches)

When looking at your process, make sure you have all the bases covered. This will include:

  • Recording and processing the change
  • Change assessment
  • Change Advisory Board (CAB)
  • Build and test
  • Implement
  • Review and close

I’ll talk about these in lots of detail in part two of this article.

Training & Communications

You’re about to go live with your sparkly new change management process and you want it to be a success so tell people about it! First, attend every team meeting, management huddle and town hall that you can get away with! Get people onside so that they know how much help change management can be and to reassure them they won’t have to go through lots of red tape just for the sake of it. Another way of getting your message out is to use posters. They’re bright, cheerful and cheap – here is one that I’ve used often.

Pelt front line teams with coloured balls if necessary! Not too hard though!

In terms of training you need to think about your change management team and your stakeholders, the people that will be raising changes using your process. For your change management team there are lots of practical courses out there that can help – a few examples could include:

  • ITIL Foundation
  • ITIL – Service Transition
  • ITIL – Release Control and Validation (RCV)
  • COBIT
  • SDI Managers Certificate
  • ISO 20000

Other important considerations include:

  • On the job training
  • Shadowing

But what about your front line teams who will be raising the changes and carrying out the work? Again, put some training together and make it interactive so that it will be memorable. In the past I have been pelted with brightly coloured balls by a colleague in the name of explaining change management, so there really is no excuse for death by PowerPoint!

Things to cover are:

  • The process, its scope and the definition of a change
  • Raising a change record to include things like implementation plans, back out plans, testing, risk categorisation (“no it is not ok to just put medium”) and DR considerations
  • Templates & models
  • Benefits

I’ve done a fair few of these in my time so if you would like some help or examples just ping me on my contact details below.

Go Live

So you’re good to go. You’ve gathered your requirements, confirmed your scope, got buy in and have written up your policy, process & procedures. You’ve socialised it with support teams, ensured everyone has been trained up and have communicated the go live date. So deep breath time, go for it! Trust yourself, this is a starting point, your process will improve over time.

Metrics

I’ve written lots about metrics recently and have spoken about the basics in a previous article on availability, incident and problem management but in short:

You need to have a mission statement. It doesn’t have to be fancy but it does need to be a statement of intent for your team and your process. An example of a change management statement could be “to deliver changes effectively, efficiently and safely so that we put the customer at the heart of everything we do”.

Next come the CSFs or critical success factors. CSFs look at how you can achieve your mission and some examples for change management could include:

  • To ensure all changes are carried out effectively and safely.
  • To ensure all changes are carried out efficiently, on time and with no out of scope emergency work.
  • To work closely with our customers & stakeholders to ensure we keep improving while continuing to meet their needs

Finally, we have Key Performance Indicators or KPIs. These give you the detail on how you are performing at the day to day level and act as an early warning system so that if things are going wrong, you can act on them quickly. Some example KPIs for change could include:

  • More than 98% changes are implemented successfully
  • Less than 5% of changes are emergency changes
  • Less than 10% of changes are rescheduled more than once
  • Less than 1% of changes are out of process
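To show how KPIs like these might be calculated from a month of Change records, here is a rough Python sketch. Everything in it is an assumption for illustration: the `ChangeRecord` fields, the made-up sample month and the helper names. The targets are simply the example figures from the list above.

```python
from dataclasses import dataclass


@dataclass
class ChangeRecord:
    """Hypothetical export row from a Change log."""
    successful: bool
    emergency: bool
    reschedule_count: int
    out_of_process: bool


def pct(part: int, total: int) -> float:
    """Percentage of total, or 0.0 when there are no changes."""
    return 100.0 * part / total if total else 0.0


def change_kpis(changes: list[ChangeRecord]) -> dict[str, float]:
    """Calculate the four example KPIs as percentages."""
    total = len(changes)
    return {
        "successful": pct(sum(c.successful for c in changes), total),
        "emergency": pct(sum(c.emergency for c in changes), total),
        "rescheduled_twice_plus": pct(sum(c.reschedule_count > 1 for c in changes), total),
        "out_of_process": pct(sum(c.out_of_process for c in changes), total),
    }


def kpis_met(kpis: dict[str, float]) -> dict[str, bool]:
    """Compare each KPI with the example targets listed above."""
    return {
        "successful": kpis["successful"] > 98,
        "emergency": kpis["emergency"] < 5,
        "rescheduled_twice_plus": kpis["rescheduled_twice_plus"] < 10,
        "out_of_process": kpis["out_of_process"] < 1,
    }


# Made-up month: 200 changes, 3 failures, 12 emergencies, 12 rescheduled twice
changes = [ChangeRecord(True, False, 0, False) for _ in range(200)]
for c in changes[:3]:
    c.successful = False
for c in changes[3:15]:
    c.emergency = True
for c in changes[15:27]:
    c.reschedule_count = 2

kpis = change_kpis(changes)
print(kpis)
print(kpis_met(kpis))
```

In this invented month the emergency Change KPI is missed, which is exactly the kind of early warning the KPIs are there to give.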

So you’ve survived your change process implementation – smile, relax and take a deep breath because now the real work starts! Come back soon for part two of this article which will give you some practical advice on running your new change management process.


Availability, Incident and Problem Management – The New Holy Trinity? (part 2)


Following on from part one, here are my next seven tips on how to use availability, incident and problem management to maximise service effectiveness.

Tip 4: If you can’t measure it, you can’t manage it

Ensure that your metrics map all the way back to your process goals via KPIs and CSFs so that when you measure service performance you get clear tangible results rather than a confused set of metrics that no one ever reads let alone takes into account when reviewing operational performance. In simple terms, your service measurements should have a defined flow like the following:

Mission statement → CSFs → KPIs → Metrics

Start with a mission statement so that you have a very clearly defined goal. An example could be something like “to monitor, manage and restore our production environment effectively, efficiently & safely”.

Next come your critical success factors or CSFs. CSFs are the next level down in your reporting hierarchy. They take the information held in the goal statement and break them down into manageable chunks. Example CSFs could be:

  • “To monitor our production environment effectively, efficiently & safely”
  • “To manage our production environment effectively, efficiently & safely”
  • “To restore our production environment effectively, efficiently & safely”

KPIs or key performance indicators are the next step. KPIs provide the level of granularity needed so that you know you are hitting your CSFs. Some example KPIs could be:

  • Over 97% of our production environment is monitored
  • 98% of all alerts are responded to within 5 minutes
  • Over 95% of Calls to the Service Desk are answered within 10 seconds
  • Service A achieves an availability of 99.5% during 9 – 5, Monday – Friday
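An availability KPI such as the last one only makes sense when it is measured against the agreed service hours rather than the full 24 hours. As a minimal Python sketch, assuming downtime is recorded in minutes and that only downtime inside the agreed hours is counted (the function name and the four-week period are illustrative):

```python
def availability_pct(agreed_minutes: float, downtime_minutes: float) -> float:
    """Availability = (agreed service time - downtime) / agreed service time."""
    if agreed_minutes <= 0:
        raise ValueError("agreed_minutes must be positive")
    return 100.0 * (agreed_minutes - downtime_minutes) / agreed_minutes


# 9 - 5, Monday - Friday over four weeks: 8 hours x 5 days x 4 weeks
agreed = 8 * 60 * 5 * 4  # 9600 minutes of agreed service time
print(availability_pct(agreed, downtime_minutes=48))
```

With 9600 agreed minutes, 48 minutes of downtime is the most Service A could lose in the period and still hit a 99.5% target.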

Ensure that your metrics, KPIs & CSFs map all the way back to your mission statement & process goals so that when you measure service performance you get clear tangible results. If your metrics are linked in a logical fashion and your performance goes to amber during the month (e.g. threat of service level breach), you can look at your KPIs and come up with an improvement plan. This will also help you move towards a balanced scorecard model as your process matures.

Tip 5: Attend CAB!

Availability, incident and problem managers should be key and vocal members of the CAB. It is often quoted that 70%-80% of incidents can be traced to poorly implemented changes.

Problem management should have a regular agenda item to report on problems encountered, especially where these are caused by changes. Incident management should also attend so that if a planned change does go wrong, they are aware and can respond quickly & effectively. In a very real sense forewarned is forearmed, so if a high risk change has been authorised, having that information can help the service desk manager to plan ahead, for example by having extra analysts on shift the morning of a major release.

Start to show the effects of poorly planned and designed change, using downtime information, to alter the mind-sets of implementation teams. If people see the consequences of poor planning or of not following the agreed plan, there is a greater incentive to learn from them. By prompting teams to think about quality, change execution will improve, related incidents and problems will reduce, and availability will improve.

Tip 6: Link your information

You must be able to link your information. Working in your own little bubble no longer works; you need to engage with other teams to add value. The best example of this is linking incidents to problem records to identify trends, but it doesn’t stop there. The next step is to look at the trends and at how they can be fixed. This could be reactive, e.g. raising a change record to replace a piece of server hardware which has resulted in downtime. It could also be proactive, for example “we launched service A and experienced X, Y and Z faults which caused a hit to our availability, we’re now launching service B, what can we do to make sure we don’t make the same mistakes? Different hardware? More resilience? Using the cloud?”

You need to have control over the quality of the information that can be entered. Out of date information is harmful so make sure that validation checks are built in to your process. One way to do this is to do a “deep dive” into your Incident information. Look at the details to ensure a common theme exists and that it is linked to the correct Problem record.
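Linking incidents to problem records and checking the quality of those links can start very simply. The Python sketch below counts incidents per problem record to surface trends and lists incidents that have no link yet; the record layout, the IDs and the function names are all made up for illustration rather than taken from any real toolset.

```python
from collections import Counter

# Made-up incident export; "problem_id" is None where no link exists yet
incidents = [
    {"id": "INC001", "problem_id": "PRB01"},
    {"id": "INC002", "problem_id": "PRB01"},
    {"id": "INC003", "problem_id": None},
    {"id": "INC004", "problem_id": "PRB02"},
    {"id": "INC005", "problem_id": "PRB01"},
]


def incidents_per_problem(records: list[dict]) -> Counter:
    """Count linked incidents for each problem record."""
    return Counter(r["problem_id"] for r in records if r["problem_id"])


def unlinked_incidents(records: list[dict]) -> list[str]:
    """List incident IDs that still need linking to a problem record."""
    return [r["id"] for r in records if not r["problem_id"]]


print(incidents_per_problem(incidents).most_common())
print(unlinked_incidents(incidents))
```

The problem record with the most linked incidents is a natural candidate for a deep dive, and the unlinked list is a ready-made validation queue.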

Your information needs to be accessible and easy to read. Your audience sees Google and their expectation is that all search engines work in the same way.

Talk to people! Ask relationship and service delivery managers what keeps them awake at night, and if there is no problem record or SIP then raise one. Ask technical teams what their top ten tech concerns are. I’ve said it before and I’ll say it again: forewarned is forearmed. If you know there’s an issue or potential for risk you can do something about it, or escalate to the manager or team that can. Ask the customer if there is anything they are worried about. Is there a critical product launch due? Are the auditors coming? This is where you can be proactive and limit risk, for example by working with change management to implement a change freeze.

Tip 7: Getting the right balance of proactive and reactive activities

It’s important to look at both the proactive and reactive sides of the coin and get a balance between the two. If you focus on reactive activities only, you never fix the root cause or make it better; you’ll just keep putting out the same fires. If you focus on proactive activities only, you will lose focus on the BAU and your service quality could spiral out of control.

Proactive actions could include building new services with availability in mind, working with problem management to identify trends and ensuring that high availability systems have the appropriate maintenance (e.g. regular patches, reboots, agreed release schedules). Other activities could include identifying VBFs (more on that later) and SPOFs (single points of failure).

Reactive activities could include working with incident management to analyse service uptime / downtime in more granularity with the expanded incident cycle and acting on lessons learned from previous failures.

Tip 8: Know your VBFs

No, not your very best friends, your vital business functions! Talk to your customers and ask them what they consider to be critical. Don’t assume. That sparkling new CRM system may be sat in the corner gathering dust. That spreadsheet on the other hand, built on an ancient version of Excel with tens of nested tables and lots of macros, could be a critical business tool for capturing customer information. Go out and talk to people. Use your service catalogue. Once you have a list of things you must protect at all costs you can work through the list and mitigate risk.

Tip 9: Know how to handle downtime

No more hiding under your desk or running screaming from the building! With the best will in the world, things will go wrong so plan accordingly. The ITIL service design book states that “recognising that when services fail, it is still possible to achieve business, customer & user satisfaction and recognition: the way a service provider acts in failure situation has a major influence on customer & user perception & expectation.”

Have a plan for when downtime strikes. Page 1 should have “Don’t Panic” written in bright, bold text – sounds obvious but it’s amazing how many people panic and freeze in the event of a crisis. Work with incident and problem management to come up with the criteria for a major incident that works for your organisation. Build the process and document everything, even the blindingly obvious (because you can’t teach common sense). Agree in advance who will coordinate the fix effort (probably incident management) and who will investigate the root cause (problem management). Link in to your IT service continuity management process. When does an incident become so bad that we need to invoke DR? Have we got the criteria documented? Who makes the call? Who is their backup in case they’re on holiday or off sick? Speak to capacity management – they look at performance – and ask at what point a performance issue could become so bad that the system becomes unusable. Does that count as downtime? Who investigates further?
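One way to test whether your criteria are really documented is to see if they can be written down as data and applied without an argument. The Python sketch below is only an illustration of that idea: every threshold, role name and field in it is a hypothetical placeholder that you would replace with whatever your organisation agrees.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EscalationCriteria:
    """Hypothetical thresholds and roles; agree the real ones with your business."""
    major_users_affected: int = 100   # this many users or more counts as major
    dr_outage_minutes: int = 240      # a major outage this long triggers the DR decision
    dr_decision_maker: str = "IT service continuity manager"
    dr_deputy: str = "Head of service operations"


@dataclass(frozen=True)
class Assessment:
    major_incident: bool
    consider_dr: bool
    dr_call_owner: Optional[str]      # who makes the DR call, if one is needed


def assess(users_affected: int, vbf_down: bool, outage_minutes: int,
           decision_maker_available: bool = True,
           criteria: EscalationCriteria = EscalationCriteria()) -> Assessment:
    """Apply the documented criteria to a live incident."""
    # Assumption: any vital business function outage is major, whatever the head count.
    major = vbf_down or users_affected >= criteria.major_users_affected
    consider_dr = major and outage_minutes >= criteria.dr_outage_minutes
    owner = None
    if consider_dr:
        owner = (criteria.dr_decision_maker if decision_maker_available
                 else criteria.dr_deputy)
    return Assessment(major, consider_dr, owner)


# Illustrative incident: a VBF has been down for five hours and the usual
# decision maker is on holiday, so the deputy makes the call.
result = assess(users_affected=20, vbf_down=True, outage_minutes=300,
                decision_maker_available=False)
print(result)
```

The point is less the code than the exercise: if you can’t fill in the numbers and names, the criteria aren’t documented yet.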

Tip 10: Keep calm and carry on

Your availability, incident and problem management processes will improve and mature over time. Use any initial “quick wins” to demonstrate the value add and get more buy in. As service levels improve, your processes will gather momentum, as it’s human nature to want to jump on the bandwagon if something is a storming success.

As your process matures, you can look to other standards and frameworks. Agile and lean can be used to make efficiency savings. COBIT can help you gauge process maturity and offers practical guidance on getting to the next level. PRINCE2 can help with project planning and timescales. You can also review your metrics to reflect greater process maturity, for example by adding critical to quality (CTQ) measures and operational performance indicators (OPIs) to your existing deck of goals, CSFs and KPIs.

Keep talking to others in the service management industry. The itSMF, ISACA and Back2ITSM groups all have some fantastic ideas for implementing and improving ITIL processes so have a look!

Final thoughts

I’d like to conclude by saying that availability, incident and problem management processes are critical to service quality. They add value on their own, but aligning them and running them together will not only drive improvement but will also reduce repeat (boring) incidents, move knowledge closer to the front line and increase service uptime.

Having availability, incident and problem management working together as a trio is one of the most important steps in moving an IT department from system management to service management, as mind-sets start to change, quality improves and customer satisfaction increases.


Availability, Incident and Problem Management – The New Holy Trinity? (part 1)

So here’s the thing. We all know that incident and problem management, if working well, can reduce interruptions to the end user and improve service quality for the business. From an end user’s perspective though, availability is the name of the game. While most organisations have the basics covered with incident management, how many use problem and availability management to look at the underlying cause of incidents at a service as well as a component level?

Working together effectively, availability, incident and problem management can improve both quality of service and the business perception of IT. Getting back to basics, incident management is a purely reactive process. We sort things out so that the business can carry on as usual. Problem management is both reactive and proactive. We look at what went wrong but also how to stop it from happening again. Availability management looks at all availability issues at both a component and service level, and ensures that we consider availability at the point of service design as well as monitoring uptime during normal operations.

When describing the three processes, I call incident management the superheroes of ITIL. They save the world several times a day, fighting fires and making people smile. Problem management are detectives. They get to the root cause and sort it out to stop the same issues from recurring. Availability management are the scientists of the ITIL world. Like the guys from The Big Bang Theory, they design the service to keep it up & running as much as possible based on user requirements.

Today, IT service issues are constantly in the news. With the advent of social media, news of service downtime can be spread globally in minutes – kind of embarrassing, especially if you are a highly visible entity such as a bank or government department. Putting aside the embarrassment factor for a minute, what about financial implications such as fines and service credits? Or regulatory impact such as failing to comply with any standards mandated by your management? Let’s not forget the angry mob waiting outside to make their dissatisfaction known if downtime is an own goal such as a poorly managed change. With this in mind, I’ve put together some tips on how to use availability, incident and problem management to maximise service effectiveness, with this article covering the first three of ten.


Tip 1: Getting your facts straight

Have separate records for availability, incident and problem management. Incident management records (“fix it quick”) should focus on getting the user details and a full description of the issue. Some of the information captured by incident records could include:

[Image: information captured by incident records]

When managing an incident, different support teams may need different views, e.g.

  • Networks team – by location
  • Service desk – by customer satisfaction
  • Desktop support – by hardware
  • Development – by software application
  • Capacity management – by resource usage
  • Service delivery managers – by business impact
  • Change management – by date / time to compare with the change schedule

Problem management records focus on establishing the root cause and actions to prevent recurrence. Problem records can contain the following information:

[Image: information contained in problem records]

Availability records should look at planning for the appropriate level of availability and ensuring that availability & recovery criteria are considered when designing new services. Your availability plan should contain the following information:

[Image: information contained in the availability plan]

Tip 2: Identify roles & responsibilities

Be organised so there’s no duplication or wasted effort. In short, the incident manager is concerned with speed, the problem manager is concerned with investigation and diagnosis, and the availability manager is concerned with the end to end service.

Key priorities for the incident manager will include co-ordinating the incident, managing communications with both technical support teams and business customers, and ensuring that the issue is fixed ASAP.

The problem manager will focus on root cause investigation, trending (has this issue popped up before?), finding a fix (interim workarounds and permanent resolution) and ensuring that any lessons learned are documented & acted on.

The availability manager will look at ensuring the service is designed with the appropriate levels of availability, working with service operations to tackle issues at both a service and component level and using the expanded incident lifecycle to look at trends and how the service can be improved.

Tip 3: Keeping up to date

It’s really important to keep an eye on the BAU, as seemingly small incidents can spiral out of control and have a negative effect on availability levels and customer satisfaction. Simple things can make a big difference. For example, place a white board near the service desk with a list of the top ten problems, so that it’s easy for service desk analysts to link incidents to problems and trends can be identified later on. If the service desk have a team meeting, ask to attend and update them on any new problems as well as updates and workarounds on existing problems. Don’t forget to close the loop and let the service desk know when a problem record has been fixed and closed off. There’s nothing worse for a service desk than having to call a list of customers about an issue that was sorted out months ago!

Get proactive! Work as a team to view service availability throughout the month. Have a process to automatically raise a new proactive problem record if availability targets are threatened so that things can be done to prevent further issues. Don’t just sit there waiting to fail the SLA!
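As a rough illustration of what “threatened” could mean in practice, the Python sketch below turns an availability target into a downtime budget for the month and flags a proactive problem record once most of that budget has been used. The function names, the 75% warning level, the service name and the shape of the returned record are all assumptions of mine, and a real version would call whatever your service desk tool provides rather than return a dictionary.

```python
def downtime_budget_minutes(agreed_service_minutes, target_pct):
    """Minutes of downtime the service can have and still meet its target."""
    if not 0 < target_pct < 100:
        raise ValueError("target_pct must be between 0 and 100 (exclusive)")
    return agreed_service_minutes * (100 - target_pct) / 100


def check_availability(service, agreed_service_minutes, downtime_minutes,
                       target_pct, warn_fraction=0.75):
    """Return a proactive problem record (as a dict) if the target is threatened.

    warn_fraction is a hypothetical warning level: 0.75 means "raise a
    problem once 75% of the month's downtime budget has been used".
    """
    budget = downtime_budget_minutes(agreed_service_minutes, target_pct)
    used = downtime_minutes / budget
    availability = 100 * (agreed_service_minutes - downtime_minutes) / agreed_service_minutes
    if used < warn_fraction:
        return None  # comfortably inside the budget, nothing to raise
    return {
        "type": "proactive problem",
        "service": service,
        "summary": (f"{service}: {used:.0%} of downtime budget used, "
                    f"availability {availability:.2f}% against a {target_pct}% target"),
    }


# Illustrative figures only: a 24x7 service in a 30-day month with a 99.5% target.
minutes_in_month = 30 * 24 * 60   # 43,200 agreed service minutes
record = check_availability("Online banking", minutes_in_month,
                            downtime_minutes=180, target_pct=99.5)
print(record)
```

With these numbers the budget is 216 minutes, so 180 minutes of downtime is past the warning level even though the SLA has not yet been failed, which is exactly the window where a proactive problem record is useful.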

 

In part two, I will continue with a further seven tips on how to use availability, incident and problem management to maximise service effectiveness.