Rob England: Incident Management at Cherry Valley, Illinois

It had been raining for days in and around Rockford, Illinois that Friday afternoon in 2009, some of the heaviest rain locals had ever seen. Around 7:30 that night, people in Cherry Valley – a nearby dormitory suburb – began calling various emergency services: the water that had been flooding the road and tracks had broken through the Canadian National railroad’s line, washing away the trackbed.

An hour later, in driving rain, freight train U70691-18 came through the level crossing in Cherry Valley at 36 m.p.h., pulling 114 cars (wagons) mostly full of fuel ethanol – 8 million litres of it – bound for Chicago. Although ten cross-ties (sleepers) dangled in mid-air above running water just beyond the crossing, somehow two locomotives and about half the train bounced across the breach before a rail weld fractured and cars began derailing. As the train tore in half, the brakes went into emergency stop. Nineteen ethanol tank-cars derailed, 13 of them breaching and catching fire.

In a future article we will look at the story behind why one person waiting in a car at the Cherry Valley crossing died in the resulting conflagration, 600 homes were evacuated and $7.9M in damages were caused.

Today we will focus on the rail traffic controller (RTC) who was the on-duty train dispatcher at CN’s Southern Operations Control Center in Homewood, Illinois. We won’t be concerned for now with the RTC’s role in the accident: we will talk about that next time. For now, we are interested in what he and his colleagues had to do after the accident.

While firemen battled to prevent the other cars going up in what could have been the mother of all ethanol fires, and paramedics dealt with the dead and injured, and police struggled to evacuate houses and deal with the road traffic chaos – all in torrential rain and widespread surface flooding – the RTC sat in a silent heated office 100 miles away watching computer monitors. All hell was breaking loose there too. Some of the heaviest rail traffic in the world – most of it freight – flows through and around Chicago; and one of the major arteries had just closed.

Back in an earlier article we talked about the services of a railroad. One of the major services is delivering goods, on time. Nobody likes to store materials if they can help it, so railroads deliver “just in time”: think of the giant ethanol trains, or the “hotshot” trans-continental double-stack container trains with nine locomotives that get rail-fans like me all excited. Some of the goods carried are perishables: fruit and vegetables from California, stock and meat from the Midwest, all flowing east to the population centres of the USA.

The railroad had made commitments regarding the delivery of those goods: what we would call Service Level Targets. Those SLTs were enshrined in contractual arrangements – Service Level Agreements – with penalty clauses. And now trains were late: SLTs were being breached.

A number of RTCs and other staff in Homewood switched into familiar routines:

  • The US rail network is complex – a true network. Trains were rescheduled onto alternate routes, and traffic on those routes was bunched together as tightly as the rules allowed to create extra capacity.
  • Partner managers got on the phone to the Union Pacific and BNSF railroads to negotiate capacity on their lines under reciprocal agreements already in place for situations just such as this one.
  • Customer relations staff called clients to negotiate new delivery times.
  • Traffic managers searched rail yard inventories for alternate stock of ethanol that could be delivered early.
  • Crew managers told crews to pick up their trains in new locations and organised transport to get them there.

Fairly quickly, service was restored: oranges got squeezed in Manhattan, pigs and cows went to their deaths, and corn hootch got burnt in cars instead of all over the road in Cherry Valley.

This is Incident Management.

None of it had anything to do with what was happening in the little piece of hell that Cherry Valley had become. The people in heavy waterproofs, hi-viz and helmets, splashing around in the dark and rain, saving lives and property and trying to restore some semblance of local order – that’s not Incident Management.

At least I don’t think it is. I think they had a problem.

An incident is an interruption to service and a problem is an underlying cause of incidents. Incident Management is concerned with the restoration of expected levels of service to the users. Problem Management is concerned with removing the underlying causes.

To me that is a simple definition that works well. If you read the books and listen to the pundits you will get more complex models that seem to imply that everything done until trains once more rolled smoothly through Cherry Valley is Incident Management. I beg to differ. If the customer gets steak and orange juice, then Cherry Valley could be still burning for all they care: Incident Management has met its goals.

How to Provide Support for VIPs

One of the outcomes of IT Service Management is the regulation, consistency and predictability in the delivery of services.

I remember working in IT before Service Management was adopted by our organisation and realising that we would over-service some customers and under-service others. Not intentionally, but we didn’t have a way of regulating our work and making our output predictable.

Our method of work delivery seemed to be somewhere between “First come first served” and “She who shouts loudest shall get the best service”. Not the best way to manage service delivery.

Chris York tweeted an interesting message recently;

It’s a great topic to talk about and one that I remember having to deal with personally in previous jobs.

I have two different views on VIP treatment – I think it’s a complex subject and I’d love to know your thoughts in the comments below.

If your name’s not down, you’re not getting support

The Purist

Firstly, IT Service Management is supposed to define exactly how services will be delivered to an organisation. The service definition includes the cost, warranty and utility that is to be provided.

Secondly, there is a difference between the Customer of the service and the User of the service. The Customer is characterised as the people that pay for the service. They also define and agree the service levels.

Users are characterised as individuals that use the service.

There are loads of great analogies to reinforce this point – from local government services that are outsourced (the local government is the customer, the local resident is the user), to restaurants and airports. The IT Skeptic has a good discussion on the subject.

It’s also true to say that the Customer might not also be a user of the service, although in organisations I’ve worked in it is usually so.

This presents an interesting dilemma for both the Provider and the Customer. Should the Customer expect more from the service than they originally negotiated with the Provider? I think the most common place this dilemma occurs is end-user services – desktop support.

The people that would “sign on the dotted line” for the IT Services we used to provide would be Finance Directors, IT Directors, CFOs or CIOs. Very senior people with responsibility for the cost of their services and making sure the company gets a good deal.

Should we be surprised when senior people that ultimately pay for the service expect preferential treatment? No – but we should remind them of the service warranty that they agreed would be supplied.

Over-servicing VIPs has to be at the cost of someone else – and by artificially raising the quality of service for a few people we risk degrading the service for everyone.

The Pragmatist

The reality is that IT Service Management is a people business and a perception business, especially end-user services.

People call the Service desk when they want something (a Request) or they need help (an Incident). Both of these are quite emotional human states.

The performance and usability of someone’s IT equipment is fundamental to their own productivity and their own success. It feels very personal when the equipment that you rely on stops functioning.

Although we can gather SLA and performance statistics for our stakeholder meetings, we have the problem that we are often judged only on our last interaction with that individual person. It shouldn’t be this way – but it is.

I’ve been to meetings full of good news about the previous month’s service only to be ripped to pieces for a request submitted by the CEO that wasn’t actioned. I’ve been to meetings after a period of general poor service and had good reviews because the Customer had a (luckily) excellent experience with the Service desk.

Much as we don’t like it, prioritising VIP support has an overall positive effect when we do it.

The middle ground (or “How I’ve seen it done before”)

If you don’t like the Pragmatist view above there are ways to come to a compromise. Stephen Mann touched on an idea I have seen before:

Deciding business criticality is obviously a challenge.

In my previous role, in the advertising world, the most important people in an agency are the Creatives.

These guys churn out graphical and video content and work on billable hours. When their equipment fails the clock is ticking to get them back up and running again.

So calculating the financial cost of an individual’s downtime and assigning a role is a method of designating those that can expect prioritised support.

As a Service Provider in that last role our customer base grew and our list of VIPs got longer. We eventually allocated 5% of each company’s headcount to have “VIP” status in our ITSM tool.
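To make the idea concrete, here is a minimal sketch of how a capped VIP list might be derived. It is only an illustration: the `Employee` record, the `hourly_cost` field and the choice to round the 5% quota down are all assumptions of mine, not how any particular ITSM tool works.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    role: str
    hourly_cost: float  # assumed: what an hour of this person's downtime costs


def vip_quota(headcount: int, percent: int = 5) -> int:
    """VIP slots for a company: a fixed percentage of headcount, rounded down."""
    return headcount * percent // 100


def pick_vips(staff: list[Employee], percent: int = 5) -> list[Employee]:
    """Rank staff by cost of downtime and fill the quota from the top."""
    ranked = sorted(staff, key=lambda e: e.hourly_cost, reverse=True)
    return ranked[:vip_quota(len(staff), percent)]
```

With 40 staff this gives two VIP slots, filled by the two people whose downtime is most expensive. In practice the Customer would probably want to override the ranking by hand, but the cap stays fixed.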

I think there are ways to write VIP support into an IT Services contract that allows the provider to plan and scale their support to cater for it.

Lastly, we should talk about escalated Incidents. This is a more “formal” approach to Service Management (the Purist would be happy) where a higher level of service is allocated to resolving an Incident if it meets the criteria for being escalated.

When dealing with Users it is worth having a view of that person’s overall experience with the Service Provider. If a user already has one escalated Incident, should she expect a better service when she calls with another? Perhaps so – the Pragmatist would see that although we file each Incident separately, her perception of the service is based on the overall experience. With our ITSM suite we use informational messages to guide engineers as to the overall status of a User.
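As a rough sketch of that kind of informational message, the snippet below looks across all of a user’s Incidents rather than just the one being logged. The `Incident` fields and the `engineer_note` function are hypothetical names for illustration; a real ITSM suite will have its own data model and its own way of surfacing this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    user: str
    summary: str
    escalated: bool = False
    is_open: bool = True


def engineer_note(user: str, incidents: list[Incident]) -> Optional[str]:
    """Return a message for the engineer if the user has open escalated Incidents."""
    count = sum(1 for i in incidents if i.user == user and i.escalated and i.is_open)
    if count == 0:
        return None  # nothing unusual about this user's current experience
    return f"{user} has {count} open escalated incident(s); treat new calls with care."
```

The point of the design is that each Incident is still filed separately, but the engineer sees the user’s overall status at the moment a new call comes in.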

Simon Morris

In summary…

I think everyone would agree that VIP support is a pain.

The Purist will have to deal with the fact that although he kept his service consistent regardless of the seniority of the caller, he might have to do some unnecessary justification at the next review meeting.

The Pragmatist will have to suffer an unexpected drain on her resources when the CEO’s laptop breaks and everything must be focussed on restoring that one user’s service.

Those occupying the middle ground will be controlling the number of VIPs by defining a percentage of headcount for the Customer to allocate. Hopefully the Customer will understand the business well enough to allocate them to the correct roles (and probably herself).

The Middle Ground will also be looking at a user’s overall experience and adjusting service to make sure that escalated issues are dealt with quickly.

No-one said IT Service Management was going to be easy!