CHAPTER 4: NORTHGATE INFORMATION SOLUTIONS, A VICTIM OF THE BUNCEFIELD OIL DEPOT DISASTER – ROBERT CLARK

‘No one lost their life because of the Buncefield explosion. But many people lost the lives they had before it happened.’ – Jacqui Campbell.

In December 2005, the Maylands Industrial Estate at Buncefield near Hemel Hempstead in the UK housed some 630 businesses employing over 16,000 people. Located on the estate and positioned next to the Hertfordshire Oil Storage Limited (HOSL) depot was Northgate Information Solutions’ head office. It processed the payroll systems that paid approximately one in every three UK employees. It also provided other IT services to clients including local authorities, the British Labour Party and commercial organisations such as Tesco and Manchester United Football Club. Any failure of service delivery to its clients was considered unacceptable.

The fifth largest oil storage depot in the UK, HOSL, or Buncefield as it was often referred to, had a capacity of 60 million gallons (273 million litres). Operated by Total, it was linked by pipeline to other oil depots and refineries. It stored oil based products including petrol and kerosene, the latter being used to supply regional airports. Heathrow Airport, in fact, depended upon Buncefield for 50% of its daily intake of aviation fuel.

During the evening of Saturday 10 December 2005, an operation started to transfer unleaded fuel to Buncefield Tank 912 via the Thames/Kingsbury pipeline at a rate of around 550 m³/h. From approximately 3:00 am onward on 11 December, the gauge recording the level in the tank did not change. It is estimated that around 5:20 am the tank had reached its capacity and began to overflow. The safety system in place to shut off supply and avoid overfilling failed and the pumping operation continued. Moreover, between 5:50 am and 6:00 am, the pumping rate actually increased to 890 m³/h. The inevitable combination of the fuel and air resulted in a combustible mixture being created. This finally ignited at 6:01 am causing a series of explosions and fires which subsequently destroyed the oil depot while also resulting in serious damage to neighbouring Maylands Industrial Estate.

The resulting disaster is estimated to have cost in excess of £1 billion. Among the first firemen to arrive was Sub Officer Jon Batchelor who quickly declared the scene a major incident. It was probably the biggest fire in Europe since World War II and the largest of the explosions registered at 2.4 on the Richter scale. Amazingly, no one was killed but 43 people were injured and widespread damage was caused to nearby homes and businesses. Several large businesses were affected by the incident including Fujifilm, ASOS and DSG. Others that were not physically damaged went bankrupt because they were located in the exclusion zone imposed around Buncefield by the emergency services and were unable to operate. Twelve months later, the redundancy count at the Maylands Industrial Estate directly attributed to the disaster had reached 900.

In the middle of this destruction was the now decimated Northgate Information Solutions head office. It was extremely fortuitous that only four members of staff had been in the building instead of the 500 plus employees who were based there during the working week.

More than 2,000 local residents were evacuated and local roads and motorways were closed along with many local schools, as fears about the potentially harmful effects of the smoke increased. With 50% of its daily fuel supply cut off, Heathrow Airport had to ration supplies to aircraft. Some aircraft were diverted to Stansted or other European destinations while short haul flights were asked to refuel before arriving at Heathrow.

Despite an al-Qaeda threat against oil refineries and storage depots issued only four days prior to the Buncefield disaster, a 9/11 style plane crash or a conventional terrorist attack was ruled out very early in the investigation. The official inquiry’s final report endeavours to explain why the petrol transfer operation was not halted when the receiving tank had reached capacity. Volume two of the final report describes how the Supervisory Control and Data Acquisition (SCADA) system should have turned off the petrol flow, although it acknowledged that local controls can override this. Nowhere in the report can any reference be found to the possibility of a cyber attack being considered as the primary cause of the overflow. Prior to Buncefield, several incidents had occurred which have been referred to as cyber-physical events. Some resulted in the actual destruction of oil and gas pipelines. Reports of compromised SCADA systems now abound, including the very high profile Stuxnet attack on the Iranian nuclear programme. Furthermore, some six years after Buncefield, Shell IT Manager Ludolf Luemann warned of the danger of cyber attacks on the oil industry, remarking that:

‘If anybody gets into the area where you can control opening and closing of valves, or release valves, you can imagine what happens. It will cost lives and it will cost production, it will cost money, cause fires and cause loss of containment, environmental damage – huge, huge damage.’ – (BBC News, 2011).

Before the first explosion occurred, a vapour cloud was seen on CCTV drifting across the adjacent industrial estate. There were in fact several explosions but the main one was colossal and appears to have been centred on the Northgate car park.

The Fire Brigade treated Northgate as a separate incident and had the explosion occurred during office hours, extensive fatalities would have been almost inevitable. The force of the blast was such that a fireproof safe located at the back of the building on the second floor was recovered from the front of the building on the ground floor. Fire and Rescue Service crews from St Albans and Redbourn conducted an initial search of the premises. One crewman described the interior of the Northgate building:

‘It looked as if you had picked it up, shaken it on its side and then put it back down very carefully.’

Disasters do not always happen between the hours of nine to five, from Monday to Friday, and Buncefield was no exception. Quickly invoking its well-rehearsed recovery plans, the Northgate crisis management team met within three hours of the disaster and immediately activated their data centre fall-back contract with SunGard, who offer managed recovery services. Client data was retrieved from the secure storage facility and restored to the SunGard servers.

Northgate’s reaction

With many of Northgate’s client systems being payroll, recovery priorities needed to be set keeping in mind that, by its very nature, payroll is cyclic and recovery time objectives will vary during that cycle. Within two days the first payroll run was successfully completed. Over 200 of Northgate’s clients were directly affected and around 100 of their consultants worked around the clock in twelve-hour split shifts over a ten day period to resolve the crisis. What was also extraordinary was that during the recovery, Northgate’s staff still found time to counsel concerned clients about their own contingency plans. While acknowledging the enormous staff commitment, Business Recovery Director Mark Farrington also stated:
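The point that payroll recovery time objectives move with the pay cycle can be illustrated with a short sketch. It simply orders clients by the time remaining until their next pay run, so that the most urgent recovery comes first. The client names, dates and the function name `recovery_order` are hypothetical and are not taken from Northgate’s plans.

```python
from datetime import datetime


def recovery_order(next_pay_runs, now):
    """Return client names, most urgent first.

    next_pay_runs maps a client name to the datetime of its next pay run.
    Urgency is the time left between `now` and that pay run.
    """
    return sorted(next_pay_runs, key=lambda client: next_pay_runs[client] - now)


# Hypothetical figures for illustration only.
now = datetime(2005, 12, 11, 9, 0)
next_pay_runs = {
    "Client A": datetime(2005, 12, 13, 9, 0),   # pay run in two days
    "Client B": datetime(2005, 12, 30, 9, 0),   # month-end pay run
    "Client C": datetime(2005, 12, 16, 17, 0),  # weekly pay run
}

order = recovery_order(next_pay_runs, now)
print(order)
```

The same client can therefore sit at the top of the list one week and at the bottom the next, which is why a single fixed recovery time objective per client may not reflect reality.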

‘Had we lost any of the thirty core support staff that knew the systems best, we would have been stuck.’ – (Information Age, 2006).

It is not uncommon to see statements in business continuity plans which make assumptions that all of an organisation’s staff will be available to support a recovery. This even fails to acknowledge legitimate absenteeism for staff holidays, sickness or for any other acceptable reason. While Northgate’s reinstatement of its clients’ services seemed to proceed comparatively smoothly, had the explosion occurred during office hours it is highly likely that key staff would have been injured or killed. Although comprehensive documentation was available, which is of paramount importance in a recovery, of equal import is relevant knowledge and know-how held by staff. Moreover, it is quite likely that some employees who were perhaps not physically injured would have been traumatised. In the case of the 1996 Manchester IRA bombing, some employees whose company premises were caught in the blast were reported to be still receiving trauma counselling two years later even though the entire area around the vehicle borne improvised explosive device had been evacuated before the blast. It is sensible that every business continuity plan should make provision for trauma management and counselling plus informing next of kin in the event of injuries or fatalities. Farrington also flagged this point and stressed that staff welfare and counselling services should be considered as an option.

In fact, Northgate had planned and rehearsed for what they considered to be a worst case scenario – ‘apocalyptic’ seems an appropriate word to describe the disaster. But while their recovery might be considered textbook, in his appraisal of the recovery Farrington cited several ‘lessons learned’. Of particular note are his recommendations pertaining to staff and testing for multiple scenarios that should include loss of personnel. Testing strategies should always allow for situations that include the loss of employees, which could be for any one of a number of reasons such as a fire, explosion or pandemic. Some organisations factor in succession planning when developing their testing scenarios. But to be comprehensive, a succession plan needs to look at all levels of an organisation and not just the top echelon. Somewhere in their lower ranks may exist individuals who are of key importance to that organisation. Consequently, Farrington’s point about the potential impact of losing any of the core staff is well made.

Planning where to locate its staff both during and post recovery seems not to have received quite the same level of attention as the disaster recovery itself, another point not lost on Farrington. His post-disaster recommendation that adequate provisions are included in the plan to provide key staff with an alternative working location is sound. This was particularly emphasised by the fact that, seven weeks after the incident, around 10% of Northgate’s staff still had not been relocated. Some employees were accommodated in hotels while others worked from home. When offices were finally located for the finance and administration staff, however, suppliers were very supportive and reacted quickly to help fit out the new premises. The goodwill built up over many years certainly paid off.

As part of any business continuity plan, it is advisable not only to identify your key members of staff, but also to understand how quickly you need to get them back to work and how you intend to achieve that. The urgency that is applied to this activity needs to be driven by the pre-determined recovery time objectives that an organisation is working towards.

The majority of Northgate’s clients’ services were restored in December and 180 client payroll runs successfully transferred £1.4 billion via BACS within eight days of the disaster. The incident also disrupted Addenbrooke’s Hospital in Cambridge as Northgate supported the system used for patient admissions and discharges. Fortunately, the hospital had contingency plans in place and switched over to a manual system. That said, it should be appreciated that as a first responder the hospital is regulated by the UK Civil Contingencies Act 2004 and must have a valid business continuity plan in place.

As part of building a resilient IT capability, Northgate had invested heavily in redundancy including an in-house high availability solution. This was supported by dual backup generators plus an uninterruptible power supply and three internet service providers. Northgate’s data centre plus its in-house redundancy was destroyed in the carnage along with its entire desktop environment. Also destroyed were voice and data communications, including the call centre for which no redundancy arrangements had been made.

Having a dependable fall-back partner already in place was a crucial aspect of the business continuity strategy. In this respect Farrington recognises SunGard’s significant contribution, referring to them as a ‘reliable and capable recovery partner.’ CEO Chris Stone echoed these sentiments and said that SunGard had ‘acted in an exemplary fashion.’ What is not clear is how efficiently and effectively the desktop environment was recovered, particularly as the workforce was initially rehoused using multiple sites, including some staff being based at home.

Farrington points out that recovery plans may just assume that only one client or one system has been affected by an incident. But in Northgate’s experience, everything needed to be recovered. While recovery time objectives for restoring an individual client or system may be achievable, meeting a combination of similar or even identical objectives may prove impossible if there are insufficient numbers of skilled employees available to accomplish this. It is conceivable that more staff may be required than for normal business operations if recovery time objectives are to be met.
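A rough calculation shows why objectives that are each achievable alone may not be achievable together. The sketch below is an idealised model, assuming that recovery work can be shared freely among staff and that everyone works continuously from the moment the plan is invoked. The effort and deadline figures, and the names `Recovery` and `staff_needed`, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Recovery:
    name: str
    effort_hours: float  # person-hours needed to restore the service
    rto_hours: float     # recovery time objective, in hours after invocation


def staff_needed(recoveries):
    """Smallest headcount that lets all work due by each deadline be finished.

    Recoveries are taken in deadline order. At each deadline, the total
    effort due so far must fit within headcount * elapsed hours.
    """
    needed = 0
    cumulative_effort = 0.0
    for r in sorted(recoveries, key=lambda r: r.rto_hours):
        cumulative_effort += r.effort_hours
        needed = max(needed, math.ceil(cumulative_effort / r.rto_hours))
    return needed


# Hypothetical figures for illustration only.
systems = [
    Recovery("Payroll client A", effort_hours=40, rto_hours=24),
    Recovery("Payroll client B", effort_hours=40, rto_hours=24),
    Recovery("Hospital admissions", effort_hours=100, rto_hours=48),
]

alone = [staff_needed([s]) for s in systems]
together = staff_needed(systems)
print(alone, together)
```

With these illustrative numbers no single recovery needs more than three people, yet recovering everything at once needs four. A plan tested one system at a time would not reveal that shortfall.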

The recovery of data by Northgate was hampered as the overnight backups created immediately before the blast were lost. They were not due to be removed to an off-site storage facility until one hour after the explosion occurred. No explanation could be found in terms of how this data was subsequently recovered – one can only assume that it was. Nevertheless, given the circumstances, it appears that there was really very little Northgate could have done that would have mitigated this situation short of having previously adopted a potentially expensive mirroring solution as part of a data management strategy.
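The exposure created by the gap between taking a backup and moving it off site can be expressed as a simple calculation. The sketch below assumes a nightly backup taken at 2:00 am, which is a hypothetical time. Only the 6:01 am ignition and the collection due one hour later come from the account above. The names `Backup`, `latest_safe_backup` and `data_loss_window` are illustrative.

```python
from datetime import datetime
from typing import NamedTuple


class Backup(NamedTuple):
    taken_at: datetime    # when the backup was created
    offsite_at: datetime  # when it reached (or was due to reach) off-site storage


def latest_safe_backup(backups, incident):
    """Most recent backup that was already off site when the incident struck."""
    safe = [b for b in backups if b.offsite_at <= incident]
    return max(safe, key=lambda b: b.taken_at, default=None)


def data_loss_window(backups, incident):
    """Time between the last safe backup and the incident, or None if no backup is safe."""
    safe = latest_safe_backup(backups, incident)
    return None if safe is None else incident - safe.taken_at


incident = datetime(2005, 12, 11, 6, 1)
backups = [
    # Previous night's backup: hypothetical 2:00 am run, already off site.
    Backup(datetime(2005, 12, 10, 2, 0), datetime(2005, 12, 10, 7, 1)),
    # Latest backup: still on site, collection due one hour after the blast.
    Backup(datetime(2005, 12, 11, 2, 0), datetime(2005, 12, 11, 7, 1)),
]

window = data_loss_window(backups, incident)
print(window)
```

Under these assumptions the most recent usable backup is more than a day old at the moment of the explosion, even though a newer one existed on site. Shortening the gap between backup and removal, or mirroring data to a second site, narrows this window at a cost.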

The only serious technical issue identified was the time needed to restore full connectivity. Northgate’s three suppliers, BT, NEOS Networks and Cable & Wireless, struggled to quickly restore a comparable level of service to the SunGard site. Although most clients’ systems were live again within a few days of the explosion, one month on only 50% of clients had had their telecoms restored to the pre-disaster level. Even so, it is unclear how long it took to restore the status quo, but we do know it took more than a month. Had a testing exercise with suppliers included a scenario for redirecting connectivity, it may have provided Northgate with early warning of the probable elapsed time issues that they ultimately experienced. Nevertheless, the cost of such a test may well have been considered prohibitive. The only potential solution to this is to ‘pre-wire’ a warm or hot standby site. This is something that would need to be considered as part of the business continuity budget at Northgate’s disposal. Farrington’s telecoms concern was shared by CEO Chris Stone.

When reviewing the recovery progress made by 2007, Northgate CIO John Lockett summed it up by saying:

‘The situation Northgate is in now is one where the company has come out of a spell in intensive care, is out of hospital and is recuperating.’ – (Lockett, 2007)

Farrington also recorded eleven other aspects of the overall activity that worked extremely well during the recovery. He specifically felt that the ‘Business Continuity, Disaster Recovery and Crisis Management plans and their rehearsal were invaluable.’ After normal resumption of operations, they took the lessons learned from Buncefield and revisited the business continuity plans for their other sites. Moreover, Northgate remained vigilant throughout the recovery period. Penetration testing of the restored systems continued on a ‘business as usual’ basis to guard against any opportunist cyber attacks. What remained of their Buncefield premises was protected 24/7 to facilitate salvage of all surviving key assets while discouraging any would-be ‘disaster tourists’.

Communications

During the recovery, communications were quite correctly seen to be of vital importance. This was a multi-faceted exercise. Northgate quickly issued a statement to the City aimed at reassuring stakeholders and the financial markets. Moreover, an 0800 number phone line was set up to communicate the message of the day. Northgate’s CEO Chris Stone made videos to communicate with the staff although it is not clear how these were distributed and viewed. Account managers were responsible for apprising clients. With all their careful business continuity preparation it is perhaps astonishing that Northgate had not created a communications plan in advance of any possible disaster.

‘Four hostile newspapers are more to be feared than a thousand bayonets.’ – Napoleon Bonaparte.

One of the most critical aspects of communication during a disaster is addressing the media. A wrong word here or there can make a substantial difference as to how the story is reported or what messages are sent out. In the case of the 2010 Deepwater Horizon oil spill in the Gulf of Mexico, BP’s CEO Tony Hayward said on camera that ‘I’d like my life back’. The incident had claimed eleven lives and Hayward’s remarks were nothing short of a public relations disaster. Little evidence can be found of how well Northgate performed in front of the media. It must be remembered, though, that the plight of this one company was undoubtedly very much overshadowed by the scale of the overall disaster. Northgate to all intents and purposes was a comparatively small disaster within a far bigger disaster. Conversely, HOSL received praise for responding quickly to support those affected by the incident as a goodwill gesture. They provided a round-the-clock counselling service, funding for voluntary organisations that were helping the local Buncefield residents, and organised emergency repair work for properties damaged by the blast.

Recovery of the Year

As for Northgate’s clients, it was reassuring to learn that at least Addenbrooke’s Hospital did not rely solely on Northgate for its patient system and had taken ownership of appropriate manual contingency measures. Since other clients were seeking contingency related advice from Northgate, however, it can only be assumed that they were not so well prepared in this respect.

In all, more than half the 630 businesses on the Maylands Industrial Estate were disrupted by finding themselves inside the initial exclusion zone. A total of six buildings on the estate, including Northgate’s, were condemned and subsequently demolished. A further 30 needed major repairs to be undertaken before they could be reoccupied. It was not possible to quantify the knock-on effect that the disaster would have on the supply chain although it was estimated the estate’s contribution towards the regional GDP was around 2%. Five weeks on, 88 businesses had still not been able to return to their premises.

The disaster resulted in thousands of insurance claims of which 749 were business related and over 3,000 were raised by individuals. Northgate’s own financial liability was limited by its insurance cover which included business interruption insurance, building contents and cover for the head office building itself. The company set up a team to handle the claim for losing its head office building and, along with the emergency services and local authorities, Farrington stated that Northgate enjoyed a ‘terrific response’ from the insurers. By May 2006 the insurers had agreed to write a cheque to the value of £30 million to cover the cost of constructing a new head office.

It would be difficult to argue that the oil industry’s safety track record is an impressive one. Far from it! Being located close to an oil depot or an oil refinery is not good news as it only serves to increase the threat profile that an organisation is facing. Despite the claims of Chris Hunt, Director General of the UK Petroleum Industry Association, the Buncefield disaster was not unprecedented. There had been similar incidents at Newark, New Jersey in 1983, Naples, Italy in 1985 and St Herblain, France in 1991. Moreover, in another case of overfilling by BP in Texas in 2005, 15 fatalities resulted. What was unique about Buncefield was the scale of disaster.

What Northgate was confronted with on that December morning was almost as close to a worst case scenario as one could get. Without question, its staff magnificently rose to the challenge they were presented with. So, when the 2006 Business Continuity Awards were announced, Northgate’s achievement was appropriately recognised when it received the accolade of ‘Most Effective Recovery of the Year’. This was well deserved and the recognition received should act as a lesson and inspiration to other, less prepared organisations.

There is no question that Northgate and its staff should be immensely proud of what they achieved. A job very well done. But while not wishing to take anything away from what was accomplished, is this really a good example of a textbook business continuity case study? When you consider the acknowledged shortcomings in Northgate’s plans, the whole exercise appears to have been an IT-driven rather than a business-driven recovery. In other words, was it the tail that was wagging the dog? Without question, the IT disaster recovery element of the activity went remarkably well and the commitment to testing and exercising clearly paid off. But the recovery of Northgate’s actual business including the relocation of the staff seemed to be made up as they went along.

What is particularly obvious was that Northgate took its responsibilities as an IS supplier very seriously along with its obligation to its clients. This commitment was reflected by client confidence in Northgate. While it lost two small accounts immediately after Buncefield, by way of compensation it won around £100 million worth of new contracts.

Lessons learned

Buncefield was registered as a hazardous site and was subject to the Health and Safety Executive’s Control of Major Accident Hazards (COMAH) regulations. One must question the wisdom of siting either commercial or domestic premises near to COMAH sites because of the greater risk this presents to the properties and their inhabitants. Moreover, the increased risk will invariably not go unnoticed by insurers and will usually be reflected by increased premiums. Planning authorities would be wise to take account of this and resist the development of land in close proximity to such hazardous sites. Northgate quickly took note following the disaster and, while still located in Hemel Hempstead, the new head office is now almost one mile (1.6 km) away from Buncefield.

What went well

  • The immediate reaction of the Northgate crisis management team, initiating the recovery process within three hours on a non-working day, was impressive. Even CIO John Lockett, who was holidaying in Brazil, had joined the CMT within 24 h.
  • The gargantuan commitment from staff.
  • The IT disaster recovery aspect of the operation went exceptionally well, with the first client payroll run completed within 48 h. The company acknowledged that regular testing of the plan had certainly helped raise the level of awareness and competence amongst employees.
  • Northgate had clearly recognised that it was necessary to complement their on-site ICT capability, including the in-house redundancy, with the extra layer of resilience that SunGard offered. The company found SunGard to be ‘a reliable recovery partner’.
  • Data backups were kept in a secure, off-site location thereby facilitating a fast recovery.
  • The ‘terrific response’ from the emergency services, local authorities and Northgate’s insurers. It is said that you find out how good your insurer is when you need to make a claim. Clearly, Northgate had chosen well.
  • Northgate’s reputation as an IS supplier remained intact. Not only did all but two (small) clients demonstrate their confidence by remaining with the company, but an extra £100 million worth of additional contracts was won shortly afterwards.
  • The continued level of vigilance for both information security and physical security to deter hackers and would-be disaster tourists.
  • At least one of Northgate’s clients, Addenbrooke’s Hospital, recognised its obligation to maintain accountability for its business continuity arrangements and successfully activated its contingency plans.

What could have been done better

  • The communications plan was a case of ‘it will be alright on the night’ as no preparation had been done prior to the disaster. Even so, the information available pertaining to Northgate’s performance in this respect would suggest that they did a good job.
  • No provision had been included in the plan to address any staff losses. Only the timing of the event saved Northgate from a potentially catastrophic loss of life which may have robbed it of the key employees needed for the recovery.
  • Clearly defined roles and responsibilities need to be in place before a disaster with appropriate levels of governance.
  • Methods to address employee welfare issues and trauma counselling should be an integral part of the plan.
  • Testing should accommodate multiple-scenario situations rather than single events. In the case of Northgate, everything needed to be recovered.

What did not go well

  • The re-establishment of the pre-disaster level of online service to clients took well over a month despite the involvement of three ISPs.
  • The relocation of Northgate’s employees seemed very ad hoc with the plan being made up after the catastrophe occurred. This resulted in some employees having to wait weeks to be relocated. While the company strove to meet the recovery time objectives for ICT, the recovery of the non-ICT aspects of the business was just not in sync.
  • There was no backup in place for voice and data communications.

Other observations

  • The creation of an exclusion zone is done at the discretion of the emergency services. They decide how big the zone should be and when the restrictions can be lifted. Businesses untouched by the explosion went bankrupt as they found themselves within the exclusion zone and had no plans for dealing with such a scenario. Exclusion zones may last for just a few hours or for a few days. In the case of the 1996 Manchester bombing, an exclusion zone was in place for several weeks.
  • Northgate acted upon lessons learned and ensured they were reflected in the plans for all of its premises.
  • Locating a business close to a COMAH site brings with it additional risks to both business and employees and may attract higher insurance premiums.

Conclusion

Perhaps the summary of this major incident is best left to the Business Continuity Institute’s Technical Director, Lyndon Bird. He observed that the incident happened at the most inopportune time when demand for oil-based products traditionally peaks. It is perhaps disappointing that he makes no mention of the potential human tragedy that could have unfolded had the timing of the event been different. Had the explosion occurred during the working week, the fatality count could have been appalling in Northgate alone. One can only speculate how much more difficult their recovery would have been had key staff been lost. Bird adds that:

‘It [Buncefield] provides more messages and lessons than all the other annual incidents put together. This is what Business Continuity is all about.’

Northgate’s recovery was solid, and allowed the company to survive the disaster. In one sense they were lucky, however, because their recovery plans did not fully take into account areas other than IT. The risk to life posed by the massive oil facility in the area was not factored into the plans, nor was the need to communicate effectively with the media. Fortunately, the company has survived and used the Buncefield incident to hone their response while taking positive steps to reduce the potential for disaster, for example by moving their headquarters. They have rebalanced and improved their recovery approach by giving greater consideration to the areas where they were weakest.
