Chapter 9

Is Resilience Really Necessary? The Case of Railways

Andrew Hale

Tom Heijer

Introduction

In other parts of this book resilience is discussed as a desirable attribute of organisations in managing safety in complex, high-risk environments. It seems to us necessary to explore also the boundaries of resilience as a strategy or state of organisations. Is it the only successful strategy in town? If not, is it always the most appropriate strategy? If not, when is it the preferred one? This chapter tries to address some of these questions, to give some thoughts and ways forward.

Railways as a Running Example

We take as our example the railways. We have been involved intensively in the last few years in a number of projects concerning safety management in the Dutch and European railways, partly sponsored by the European Union under the SAMRAIL and SAMNET projects (www.samnet.inrets.fr for official documents and reports of the project), and partly by the Dutch railway regulator (Hale et al., 2002). These have addressed the issues of the design of safety management systems for the increasingly privatised and decentralised railway systems in the member countries. In particular, our work has been concerned with the role of safety rules in risk control, and the methods of managing rules in the separate organisations and in the whole interacting railway system (Larsen et al., 2004; Hale et al., 2003). The information that we gathered from interviews, workshops and the study of documentation was designed to produce a best practice guideline for safety rule management. However, safety rules and how an organisation deals with them are such central issues for an organisation that we believe that the data can throw useful light on the question of organisational resilience. So we have recast the information we have and used it to reflect on the questions in this book. Inevitably there are gaps in what we can say, because the data were not originally collected to answer the questions posed here. However, we hope they prompt some thought and discussion and may even point out ways forward in both theory and practice.

The railway industry is particularly interesting in this context, since it prides itself, based on European Union figures (see Table 9.1), on being the safest form of transport in terms of passenger deaths per person kilometre (equal with aviation), or in deaths per person travel hour (equal with buses and coaches). In this sense it would like to see itself as an ultra-safe activity in Amalberti’s terms (Amalberti, 2001). However, if we look at its record on track maintenance worker safety, its performance in the Netherlands is at or below the average of the construction industry in terms of deaths per employed person per year. The passenger train companies also have a record of problems with staff safety, particularly for conductors and ticket controllers: violent assaults by passengers resisting paying their fares have been increasing in recent years. The railways are also one of the means most often chosen by people in the Netherlands wishing to kill themselves (over 200 deaths per year), and they kill some 30 other transport participants per year at level crossings, as well as a number of trespassers.

These different safety performance indicators look at very different aspects of the railways and their management, some of which are only partly within the control of the participating companies, at least in the short term. For example, the risk to road traffic crossing railways can be greatly reduced in the long term by eliminating level crossings in favour of bridges or underpasses. However, this would require, for such flat countries as the Netherlands, a very large investment over a long time. Although this is being actively worked on, it has to be coordinated with, and some of the money has to come from, the road managers. We will therefore concentrate here on the two aspects that are most within the control of the railway companies and their contractors, namely passenger safety and the safety of track maintenance workers. By looking at these we want to pose the question of whether or not railways are an example of a resilient system. Do they achieve their high level of passenger safety by using the characteristics that are typical of resilient organisations and systems, or is their success due to another strategy? Does the poor performance on track worker safety arise from a lack of resilience? The majority of the information comes from our Dutch study, but some is also drawn from the European cases.

Table 9.1: Comparative transport safety statistics 2001/2002 (European Transport Safety Council, 2003)

Mode                      Deaths/100 million person kilometres    Deaths/100 million travel hours
Road: total, of which:    0.95                                    28
  Motorcycle/moped        13.8                                    440
  Foot                    6.4                                     25
  Cycle                   5.4                                     75
  Car                     0.7                                     25
  Bus/coach               0.07                                    2
Ferry                     0.25                                    8
Air (civil aviation)      0.035                                   16
Rail                      0.035                                   2

Origins of the Data Used. The safety rule study was conducted using a protocol of questions based on the following framework of steps (Figure 9.1), which we expected to find carried out in some shape or form in the railway system, both within the different companies, and across the interfaces where they have to cooperate. The case studies concentrated on the maintenance of infrastructure, as it is one of the most complex and dangerous areas where safety rules play a big role.

The interviews were based on this framework and were conducted with a range of informants, who worked for the infrastructure company, a train operating company, several track maintenance companies, the traffic control and the regulatory bodies. The topics covered by the framework range over the link to the fundamental risk assessment process and the choice of preventive strategies and measures, before diving more deeply into the way in which rules and procedures are developed, approved, implemented and, above all, monitored, enforced or reviewed. The interviews therefore gave a broad coverage of many aspects of the safety management system and culture of the organisations concerned, as well as the communication and coordination across the organisational boundaries that make up the decentralised railway system.

Figure 9.1: Framework for assessing the adequacy of management of safety procedures and rules (Larsen et al., 2004)

‘Rules and procedures’ were also interpreted broadly, to include everything from laws and regulations, through management procedures down to work instructions. We therefore ranged broadly in our interviews from policy makers in Ministries down to track maintenance supervisors. The observations that come out of the study can therefore be expected to give a good view of the resilience of the railway system.

A number of the researchers working on the Dutch and European projects were familiar with other high hazard industries, notably the chemical process, steel, nuclear and oil and gas industries. For a number of these researchers the SAMRAIL project was a first attempt to come to grips with the railway industry and its safety approaches and problems. The frame of reference they had, and which they expected to find in the railways, was conditioned by the very explicit requirements of the process industries. These included:

•  a documented safety management system, subject to regular audit against extensive normative descriptions of what should be found in a well-managed system, and

•  safety cases making explicit the major hazard scenarios and arguing that the levels of safety achieved by specified preventive measures, backed up by the safety management systems, are acceptable, often in comparison to quantitative risk criteria.

The study of the safety rules management system in railways produced a number of surprises when compared with this background expectation. They are presented in the following section, largely in the form in which they manifested themselves in the SAMRAIL study, notably in response to the questioning under boxes 1 and 2 and, to a lesser extent, boxes 6-8 in Figure 9.1. In the section on ‘Assessing Resilience’ we return to the insights presented, but now through the spectacles of ‘resilience’ as it was defined at the workshop.

Observations on Safety Management in Railway Track Maintenance

Explicit Models of Accident Scenarios and Safety Measures

We were surprised to find that none of the railway organisations involved (neither the infrastructure manager, nor the train companies, nor even the regulator) could provide us with a clear overall picture of the accident scenarios they were trying to manage. Safety seemed to be embedded in the way in which the system ran and the personnel thought, in a way which appeared at first to resemble descriptions of generative organisations (Westrum, 1991; Hudson, 1999). The informal culture was imbued with safety and much informal communication took place about it. Yet there was no overall explicit model of how risk control was achieved. As part of our project, we had to develop a first set of generic scenarios, and barriers/measures used to control these scenarios, in order to analyse the part played by safety rules and procedures in controlling them. This situation is in marked contrast to what is found in the nuclear, oil and gas and chemical process industries (or in the US defence industry), where companies and regulators have worked for many years with explicit scenarios. These concentrated in the early years (1970s to 1990s) more, it is true, on the stage of the scenario after the loss of control or containment, rather than on the preventive stages before this occurred. However, prevention scenarios have been a standard part of the safety cases for these industries for many years.

A similar surprise had occurred when the Safety Advisory Committee at Schiphol airport was set up in 1997 in the wake of the Amsterdam Bijlmer El-Al Boeing crash (Hale, 2001). Requests to the different players at the airport (airlines, air traffic control, airport, customs, fuel and baggage handling, etc.) to supply an explicit description of the risks they had to control and how their safety management systems did this, notably how they interacted with each other in this control activity, led to long silences and blank looks, or statements that ‘they were working on them’. Only some 8 years later is there significant progress in scenario and risk modelling at Schiphol, and this is still far short of what is desirable (VACS, 2003; Ale et al., 2004). A similar lack of an explicit model for air traffic control safety can be inferred from studies of that activity (Pilon et al., 2001; Drogoul et al., 2003). The Dutch railways would appear to be lagging even further behind in this respect.

Communication Channels

We were struck by the extreme centralisation of the system and the way it has consistently discouraged local communication, e.g., between train drivers and maintenance gangs on the tracks they are running on, and even direct (e.g., mobile telephone) communication between train drivers and traffic controllers. Communication is still seen as something that should occur primarily via the track signals set by the controllers. Figure 9.2 is a diagram of the communication channels in the railway system at the time of our study. This shows the lack of fast and direct channels between different parts of the system. This paucity of communication channels, and the limited information available to the controllers (e.g., only which block a train is in and not where in that block and what it is doing – moving or stationary with a problem) means that the system operates to a considerable extent in an open-loop mode.

Figure 9.2: The railway communication structure (Hale et al., 2002)

The safety and other rules then take on a particular role as feedforward controls. They provide a degree of certainty about who will be doing what, where and when. Of course this certainty is only there if the rules and plans are strictly adhered to; hence the great emphasis on rule knowledge and obedience, which is characteristic of the railway system. We interpreted the discouragement of communication as the desire to stop rules being altered on-line outside the knowledge and approval of the controllers. Yet our interviews show widespread need for local adaptation, since many rules are locally impossible to apply as written. This is confirmed by many other studies of railway rules, e.g., Elling (1991), Free (1994), and Reason et al. (1998). The lack of communication channels to achieve this on-line adaptation is one factor leading to widespread routine violations.

It is perhaps understandable from a historic context why central control and communication through the track signals grew up. There were, before the advent of cheap and effective mobile telephony, few possibilities for distributed communication. The only option was fixed signals, with local telephone lines between distributed signal boxes. However, technology has now far more to offer than the railway system has been willing or able to accept.

We can contrast this lack of on-line communication about rule or schedule modification with the situation described by, for example, Bourrier (1998) in her study of nuclear power plants. She described the presence of close and effective communication channels between maintenance workers and rule-makers to allow for rapid rule modification. Her work was inspired by the High Reliability Organisation school at Berkeley (Weick & Roberts, 1993; Weick, 1995) which has, as one of its central findings, the importance of many overlapping communication channels and checking procedures as a guarantee of flexible and effective operation in high-hazard environments, in other words of resilience.

Operating Philosophy

When we consider operating philosophy, one of the most important issues is whether control is centralised or not.

Centralised Control. We concluded that this highly centralised method of control means that the system copes with threats to its process safety by operating an ‘exclusive’ regime. No train or work is officially allowed to proceed unless the requirements for safety are guaranteed to be present according to the rules. This is the essence of the block-working system common to all railways and developed after many years of trial and error involving train collisions (e.g., Holt, 1955). This assigns a defined section of the rail track to a single train and does not allow a following train (or opposing train on single line working) to enter it until the preceding train has left the block. This ensures, if all the rules are obeyed and nothing in the hardware fails to danger, that there is always a defined minimum distance between moving trains. Stop signs (signals) protect the block, and these must stay red until it is clear. The following train must wait at the red signal, even if it strongly suspects that the sign is defective. (This is in stark contrast to the behaviour of road traffic at traffic lights, where red lights, at least in the Netherlands, are treated by cyclists and pedestrians, and increasingly by drivers, as pretty coloured indicators to proceed with caution. The high level of road accidents may seem to show that this ‘laxity’ in road operations leads to low safety. However, the much higher traffic densities on the road should make us ask whether the relative safety is in fact lower – see Chapter 3 on definitions of resilience.) Only after communication with the controller, through the often-slow communication channels mentioned above, is permission given to proceed through a red signal. In other words, safety of the train in respect of avoidance of collisions is currently bought at the cost of a ‘stop and restart’ philosophy. 
If the system moves outside a certain expected and defined safe envelope, i.e., deviates from planned operation, the system has to be brought to a halt in parts at least, so that action can be taken to remove the threat, reposition the operating elements and then restart the system. No plans are made on a routine basis to cope with operation outside this envelope. The controllers therefore have to work out on the spot, or with a time horizon of only a few minutes, what to do in such cases. This often means that they are not able to do it fast enough to ensure smooth continuation of operations without introducing delays. This improvisation is in marked contrast to the planning of the normal timetable, which starts around a year before the operations take place. This approach can cost a great deal in punctuality of running, if things begin to slip in the timetable, due to unplanned delays, and trains have to wait at unprogrammed points before they can occupy critical parts of the network.
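
The ‘exclusive’ regime described above can be made concrete with a minimal sketch. The Python below is a toy under stated assumptions, not a description of any real interlocking: the class names `Block` and `Line` and the method `request_advance` are hypothetical, and the model ignores hardware failure, opposing traffic and controller overrides. It shows only the core rule that a train may enter the next block solely when that block is clear, and otherwise waits at the red signal.

```python
class Block:
    """One fixed section of track; at most one train may occupy it."""

    def __init__(self, name):
        self.name = name
        self.occupant = None  # id of the train in the block, or None

    def signal(self):
        # The signal protecting the block stays red while it is occupied.
        return "red" if self.occupant is not None else "green"


class Line:
    """A one-directional line made of consecutive fixed blocks (hypothetical)."""

    def __init__(self, block_names):
        self.blocks = [Block(name) for name in block_names]
        self.position = {}  # train id -> index of the block it occupies

    def place(self, train, index):
        block = self.blocks[index]
        if block.occupant is not None:
            raise ValueError(f"block {block.name} is already occupied")
        block.occupant = train
        self.position[train] = index

    def request_advance(self, train):
        """Try to move a train one block forward.

        Returns True if the train moved, False if it must wait
        (red signal ahead, or no further block in this toy model).
        """
        here = self.position[train]
        ahead = here + 1
        if ahead >= len(self.blocks):
            return False
        if self.blocks[ahead].signal() == "red":
            return False  # 'stop and restart': wait until the block clears
        self.blocks[here].occupant = None
        self.blocks[ahead].occupant = train
        self.position[train] = ahead
        return True


line = Line(["A", "B", "C"])
line.place("following", 0)
line.place("preceding", 1)
blocked = line.request_advance("following")    # block B occupied: must wait
line.request_advance("preceding")              # preceding train moves into C
released = line.request_advance("following")   # block B is now clear
```

The sketch has no provision for operation outside the planned envelope: the only response to an occupied block is to wait, which is the sense in which safety is bought at the cost of delay.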

A future development, on which the railway industry has been working for many years, is to define the protected blocks not in terms of the fixed infrastructure, but in terms of a fixed envelope around a moving train. This could make the blocks smaller, but does not inherently increase the flexibility. The driving force to develop and use this technology is, therefore, to increase the capacity of the network, by allowing trains to run more closely one after the other.

Towards Decentralisation? Aviation has similar rules allocating slots and route segments to aircraft, particularly at airports and at busy parts of the network, but these are already related to the moving aircraft itself and not to fixed segments of the infrastructure. In this system there are also new developments, but these would represent a much greater revolution in operating philosophy, which might move that system much more towards resilience and flexibility. The ‘free flight’ concept (Hoekstra, 2001) would leave the resolution of conflicts between aircraft not to a central controller, as now, but to apparatus on board the individual aircraft. These would detect other aircraft with potentially conflicting trajectories and issue advisory flight path changes to the pilot. Hoekstra proposed a simple algorithm, which would ensure that the changes made by the two conflicting aircraft would always lessen the conflict and not cancel each other out. His simulation studies showed that this simple distributed system could cope effectively with traffic densities and conflict situations many times denser and more threatening than currently found in even the busiest airspace. Yet air traffic controllers, with their current central control philosophy, are already worried about their ability to cope with existing densities. Whilst there is still strong opposition to the free flight concept, such simulation studies do call into question the centralised control that both railways and aviation appear to be wedded to. See also Pariès’ discussion (Chapter 4) of the possibly inherently safe characteristics of distributed systems.
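
To illustrate how two independently computed advisories can reinforce rather than cancel each other, the following Python sketch may help. It is not Hoekstra's algorithm, whose details are not given here; it is an assumed, simplified geometric rule in two dimensions. Each aircraft predicts the closest point of approach and is advised to move away from the other's predicted position. The names `closest_approach` and `advisory`, the constant `PROTECTED_RADIUS` and all numbers are hypothetical.

```python
import math

PROTECTED_RADIUS = 5.0  # hypothetical separation minimum, arbitrary units


def closest_approach(p_own, v_own, p_other, v_other):
    """Return (time, miss_vector) at the closest point of approach.

    The miss vector points from the other aircraft to our own aircraft,
    assuming both keep their present velocity.
    """
    rx, ry = p_own[0] - p_other[0], p_own[1] - p_other[1]
    vx, vy = v_own[0] - v_other[0], v_own[1] - v_other[1]
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        return 0.0, (rx, ry)  # no relative motion: separation is constant
    t = max(0.0, -(rx * vx + ry * vy) / speed_sq)
    return t, (rx + vx * t, ry + vy * t)


def advisory(p_own, v_own, p_other, v_other):
    """Velocity change advised to our own aircraft, computed on board."""
    t, (mx, my) = closest_approach(p_own, v_own, p_other, v_other)
    miss = math.hypot(mx, my)
    if t == 0.0 or miss >= PROTECTED_RADIUS:
        # Closest approach is now (or past), or the predicted miss
        # distance is already sufficient: no advisory in this toy model.
        return (0.0, 0.0)
    if miss == 0.0:
        # Exact collision course: both sides turn right of their own
        # relative velocity, which again yields opposite manoeuvres.
        vx, vy = v_own[0] - v_other[0], v_own[1] - v_other[1]
        norm = math.hypot(vx, vy)
        mx, my, miss = vy / norm, -vx / norm, 1.0
        shortfall = PROTECTED_RADIUS
    else:
        shortfall = PROTECTED_RADIUS - miss
    # Each aircraft takes half of the shortfall, pushing away from the other.
    scale = shortfall / (2.0 * t * miss)
    return (mx * scale, my * scale)


# Two aircraft on nearly head-on tracks (hypothetical positions and speeds).
own = ((0.0, 0.0), (1.0, 0.0))
other = ((20.0, 1.0), (-1.0, 0.0))
dv_own = advisory(*own, *other)
dv_other = advisory(*other, *own)
_, before = closest_approach(*own, *other)
new_own = (own[0], (own[1][0] + dv_own[0], own[1][1] + dv_own[1]))
new_other = (other[0], (other[1][0] + dv_other[0], other[1][1] + dv_other[1]))
_, after = closest_approach(*new_own, *new_other)
```

Because each aircraft sees the mirror image of the other's geometry, the two advisories point in opposite directions, and in this example the predicted miss distance grows from `math.hypot(*before)` to `math.hypot(*after)` without any central controller being involved.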

Track Maintenance Safety

In contrast to the philosophy of passenger (= process) safety, the trade-offs for track maintenance seem to occur very differently. Here the maintenance teams have in the past been expected to ‘dodge between the trains’ to do their work. Only a minority of inspection and maintenance work is done with complete closure of the track to be maintained, let alone of the track running parallel to it a few metres away. The tracks are laid close together, particularly in the crowded Netherlands, in order to use up as little land as possible. A project manager for some new track being laid told us that he had requested permission to lay the parallel tracks further apart to allow physical barriers to be inserted between them when maintenance was being conducted on one track. This would avoid the need for evacuation of the maintenance area when trains passed on the adjacent line, because it would prevent the maintenance workers inadvertently straying onto, or being sucked onto, the adjacent tracks and would reduce the risk of equipment in use on or around the one track breaching the safe envelope of the adjacent track. It would also reduce to some extent the need for very slow running of trains on the adjacent track. He was told that this proposal would only cost more money and that there was no decision-making or accounting mechanism in place to discount that extra cost against the savings from faster, safer and more effective maintenance in the later stage of the life cycle.

Controlled Access. There is, according to the Dutch railway rulebook, a regime of ‘controlled access’ to track under maintenance. This involves handing control of a section of track over to the maintenance team, who actively block the signal giving access to it and can make trains wait until they have cleared the track before admitting them to it. This can be done by various means, but the commonest is to place a specially designed ‘lance’ between the two rails to short circuit them and activate the upstream signal, which thinks there is a train in that block of track, until the lance is physically removed. Implementation of this regime would require some development of new technology and rules. Train controllers have resisted implementation of this regime. The conclusion we drew from our study was that their underlying reason for doing this is that it would undermine their central control of the network and would make their decisions about train running, especially in unplanned situations, dependent on others outside their direct control. Because this regime is hardly ever used, most maintenance work is done with lookouts warning the maintenance workers to clear the track at the approach of the train. Studies in different countries (e.g., Hale, 1989; Itoh et al., 2004) have shown that this system is subject to many opportunities for errors and violations, often in the name of getting the work done according to the tight maintenance schedules. Under this regime the casualties among workers are relatively high. Track capacity is bought here at the cost of safety.
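
The principle behind the lance can be sketched in a few lines of Python. This is a deliberately simplified toy with hypothetical names (`TrackCircuitBlock`, `add_shunt`, `upstream_signal`); it assumes only what is described above, namely that the upstream signal cannot distinguish a short circuit caused by a train from one caused by a lance.

```python
class TrackCircuitBlock:
    """Toy model of a block whose signal reacts to a short circuit."""

    def __init__(self):
        self.shunts = set()  # things currently short-circuiting the rails

    def add_shunt(self, source):
        self.shunts.add(source)

    def remove_shunt(self, source):
        self.shunts.discard(source)

    def upstream_signal(self):
        # Any short circuit, whether from a train or from a maintenance
        # lance, looks the same to the signal: the block is occupied.
        return "red" if self.shunts else "green"


block = TrackCircuitBlock()
block.add_shunt("maintenance lance")      # team takes control of the block
while_working = block.upstream_signal()
block.remove_shunt("maintenance lance")   # team has cleared the track
after_clearing = block.upstream_signal()
```

In this model the decision to release the block rests with whoever holds the lance, not with the central controller, which is the shift in control that the regime implies.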

Trading Risks. As a coda to this point it is interesting to relate that the Dutch labour inspectorate has begun in the last few years to insist on the closure of tracks, and even of adjacent tracks, to protect maintenance workers (its constituents). This has brought it into conflict not only with the infrastructure manager, which is responsible for track integrity but also for allocating (selling) capacity to the train operators, but also with the railway inspectorate, which is concerned with passenger safety. From calculations made for the latter (Heijer & Hale, 2002), it is clear that, if the tracks were to be closed completely for maintenance and the trains cancelled, the increase in risk to passengers transferred from rail to bus or car to cover the same journey would far outweigh the extra safety gained for the maintenance workers by closing the parallel line. This type of calculation was something new in the debate between the different parties, which had, up to then, argued purely from the point of view of the interests of their own constituents.
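
The form of such a comparison can be sketched as follows. This is not the calculation of Heijer & Hale (2002), whose inputs are not reproduced here; it is a minimal illustration that uses the per-kilometre rates of Table 9.1 together with purely hypothetical values for the displaced passenger kilometres and the modal split. The function name `extra_passenger_deaths` is likewise an assumption.

```python
# Deaths per 100 million person kilometres, taken from Table 9.1.
DEATHS_PER_100M_KM = {"rail": 0.035, "bus": 0.07, "car": 0.7}


def extra_passenger_deaths(displaced_person_km, share_by_mode):
    """Expected extra deaths when rail journeys move to other modes."""
    if abs(sum(share_by_mode.values()) - 1.0) > 1e-9:
        raise ValueError("modal shares must sum to 1")
    units = displaced_person_km / 1e8  # number of 100-million-km units
    substitute_rate = sum(
        share * DEATHS_PER_100M_KM[mode]
        for mode, share in share_by_mode.items()
    )
    return units * (substitute_rate - DEATHS_PER_100M_KM["rail"])


# Hypothetical inputs, chosen only to make the arithmetic visible.
extra = extra_passenger_deaths(
    displaced_person_km=500e6,
    share_by_mode={"car": 0.8, "bus": 0.2},
)
```

With these assumed inputs the result is about 2.7 extra passenger deaths. A full comparison would set this against the expected reduction in maintenance worker deaths from closing the line, a figure that would have to be supplied from other data.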

Assessing Resilience

The observations made above are by no means an exhaustive set of relevant observations about the resilience of the railway system. The data were not collected explicitly for that reason, which means there are many gaps in what we can say from them. It would be necessary at least to discover how risk controls are developed and used over time in order to be more certain about those aspects of how the system adapts itself to change. However, we can use these insights as a preliminary set of data and conclusions to suggest questions for further study and debate. What follows is, therefore, in many respects speculative.

Woods (2003) listed five characteristics as indicating lack of resilience in organisations. These were expanded on during the workshop. We have compiled the following list of topics from those two sources, on which to assess the railways, in so far as our data allow.

•  Defences erode under production pressure.

•  Past good performance is taken as a reason for future confidence (complacency) about risk control.

•  Fragmented problem-solving clouds the big picture – mindfulness is not based on a shared risk picture.

•  There is a failure to revise risk assessments appropriately as new evidence accumulates.

•  Breakdown at boundaries impedes communication and coordination, which do not have sufficient richness and redundancy.

•  The organisation cannot respond flexibly to (rapidly) changing demands and is not able to cope with unexpected situations.

•  There is not a high enough ‘devotion’ to safety above or alongside other system goals.

•  Safety is not built as inherently as possible into the system and the way it operates, by default.

We might wish to question whether we need the term ‘resilience’ to define these characteristics, since many have already been discussed extensively in the literature on safety management and culture, which we have referenced above. However, we will take resilience as a working title for them. In the next sections we compare the information we collected from the railway industry with these characteristics and come to the conclusion that the industry scores relatively poorly. Hence we conclude that the current good passenger safety performance seems not to be achieved by resilience.

Defences Erode under Production Pressure

We have indicated above that we believe that the relatively poor record on track maintenance worker safety is related to the sacrifice of safety for the train timetable. There is strong resistance to the sort of track and line closures that would be needed to make a significant improvement in maintenance safety. There are also too few mechanisms for considering the needs of maintenance safety at the design stage of new track. In contrast, train and passenger safety shows the opposite trade-off at present, with delays in the timetable being accepted as the price of the stop-restart philosophy, which is needed to guarantee passenger safety. Currently the railways are resilient in respect of passenger safety according to this criterion, but not in respect of track maintenance worker safety.

It is important to ask whether this current resistance to production pressure in relation to passenger safety will be maintained. There has been intensive pressure over the last five years on the Dutch railway system to improve its punctuality. Heavy fines have been imposed by government under performance contracts for falling below the targets set in relation to late running and cancellation of trains. At the same time, the need for an extensive programme to recover from a long period of too limited track maintenance has been recognised and launched. This has resulted from and led to the discovery of many cracked rails and damaged track bedding. This requires very major increases in the maintenance programme, to achieve its primary aim of sustaining and improving primary process safety. However, the capacity of railway contractors is limited and large infrastructure projects like the high speed line from Amsterdam to Brussels and the Betuwe Route (a dedicated new freight route from the Rotterdam harbours to Germany) occupy a large part of that capacity (and the available funds). Train controllers have been given targets linked more closely to these punctuality targets, as their task has been modified to give more emphasis to their function as allocators of network capacity and not just as guardians of its safety. This has led, according to our interviews, to increasing conflicts with train drivers reporting safety-related problems, which would require delays. All of the signs are therefore present for erosion of safety margins to take place. Our data cannot indicate whether or not such erosion is taking place. That would require a much more extensive analysis of subordinate indicators of safety, such as maintenance backlogs, violations of manning rules, or other signs of compromises between safety and the timetable being increasingly decided in favour of the timetable. The jury is therefore still out on this issue.

Past (Good) Performance is Taken as a Reason for Future Confidence – Complacency

The signs from our interviews are less clear here. The interviews showed that the personnel are proud of their passenger safety record and talk about that aspect of performance to the relative exclusion of the other less favourable indicators, such as the track maintenance worker safety, or issues such as level crossing safety. However, there is also widespread concern about safety performance and about the likelihood that it will deteriorate in the face of the privatisation and decentralisation, which are far advanced. So we did not find any widespread complacency, certainly about passenger safety. Track worker safety and level crossing safety are also priority areas identified in the annual report of the Dutch infrastructure manager (Prorail, 2003).

What we do find in documents and workshops (Dijkstra & v. Poortvliet, 2004) is extensive argumentation that still more demands for investment in passenger safety for rail transport would be counterproductive. They would be inclined to increase the prices of rail tickets, restrict the number and punctuality of trains, and hence discourage people from making the modal transfer from the road to the rail systems. We do not, however, interpret this as complacency about the continued achievement of the high levels of passenger safety. It seems to fit more into an active concern with system boundaries, deriving from just the sorts of analysis of relative safety between modes, which we indicated in the section on track maintenance safety above. In general, the railways would appear to be resilient according to this criterion.

Fragmented Problem-Solving Clouds the Big Picture. No Shared Risk Picture

We consider a number of topics under this heading: the presence of an explicit risk model for the railway system, whether an integrated view is taken of all types of safety (passengers, track workers, drivers at level crossings, trespassers, etc.) and the issue of privatisation and decentralisation as threats to an integrated picture of risk control.

No Explicit Risk Model. As indicated in the preceding discussion of models of accident scenarios and safety measures, the Dutch railway system does not seem to have an explicit big picture defined in relation to its risk control task. The requirement of a system controller that it has a good model and overview of the system to be controlled is certainly not explicitly met. When we looked at the implicit picture its managers and workforce have, as seen through the scenarios we defined and the control measures we found in place to prevent, recognise and recover from, or mitigate the scenarios, we found that pretty well all those we defined during normal operation, maintenance and emergency situations were covered by rules or built-in barriers, but that those during the transitions between those phases were not. For example, a large number of the serious accidents to maintenance personnel occur in the phase of getting to or from their workplace, which is relatively poorly covered by procedures and safety measures. We also found the same sort of controversy raging about safety regimes, which we have described in the discussion of track maintenance safety. This represents a different view of how safety should be achieved, between the traffic controllers and the contractors required to do the maintenance.

An Integrated View of all Types of Safety? A further indicator of a lack of a clear and agreed risk picture is a split between the departments responsible for the different types of safety in the system. This split used to be far deeper in railways (Hale, 1989), with completely different departments in the old monopoly company dealing with train and passenger safety and with the working safety of staff. Within the infrastructure manager this split has now been considerably reduced. The safety departments have been fused. Safety indicators are brought together and presented in annual reports (Prorail, 2003) under one heading. There has also been a move in the last few years to relate and trade off the different risk numbers quite explicitly, implying an overarching risk picture at the level of goals.

When we look at the translation of this increasing unity of goals into specific risk control measures, we see a different picture. The technical departments dealing with each type of risk are still relatively separate, as our example, in the section on track maintenance safety, of the lack of trade-off possible between track separation and maintenance costing shows. The consideration of different types of safety measures is also uneven and unintegrated. Relatively recent risk assessment studies of tunnel safety (Vernez et al., 1999) still show a relative overkill in the use of technical safety measures, with far too little integrated consideration of the concomitant organisational and human factors issues of clear responsibilities, ergonomic design, communication and (emergency) organisation. Our study of safety rule management (Larsen et al., 2004) showed that the railway system is still a technocratic one, with a mechanistic view of human behaviour and the role of rules in shaping that behaviour. There is little recognition of the need for online safety to be managed flexibly by those ‘at the sharp end’, which implies that rules and plans must be modifiable on-line to cater for local circumstances and situations. Instead rules are seen as carved in stone, whilst rule management systems often lack clear monitoring and modification steps, particularly those that would involve ‘actors’ now working in different organisations, but whose actions interact in achieving railway safety. We can characterise these safety measures as relatively rigid and lacking in resilience.

Decentralisation and Privatisation. With the privatisation and splitting up of the system which has taken place over the last few years, there is now extensive fragmentation of the system, both vertically in the trajectory from the regulator down through the railway undertakings to the contractors, and horizontally in the splitting of the old monopolies into separate infrastructure, capacity management and train operation companies. The old unity, which was created in the monopoly railway undertakings through the traditions of employment from father to son, through life-long employment and restricted use of external contracting, has now largely disappeared in the Netherlands. This unity resulted in a relatively common, if technocratic and rigid, picture of how safety was achieved. As we have indicated, that picture was largely implicit. With new train operating companies, maintenance contractors and sub-contractors now being employed, and with the increasing pressure for the opening of the national markets to foreign concessionaries, the basis for this common risk picture and practice is being rapidly eroded. National differences in safety practices are also being made apparent, as cross-border operations have increased and as the European drive towards inter-operability of railway systems has gathered momentum. Many of these are specific and different solutions to inherent and common problems, such as the form and content of signals and signs, the way in which rule violations are dealt with, or how control philosophies are translated into specific hardware requirements. The European TSIs (Technical Specifications for Interoperability), agreed at European level between representatives of national railways, are gradually reimposing a common risk picture, but this process will take a long time, as each country is required to change at least some of its current methods. Such a changeover period always produces the opportunity for misinterpretation and error.

Conclusion. The balance according to this criterion is that the old resilience, built into the system through its strong safety culture and nurtured by a stable workforce, is being rapidly eroded. Because the risk picture was never made explicit, it is particularly vulnerable to such cultural erosion. On the positive side, the railway industry has become increasingly open to the outside world over the last decade. This has meant that there is now much more attention to the need for a clear and explicit safety philosophy and safety management system. The SAMRAIL project is a manifestation of this. In that project there was explicit attention to the question of what the railway industry could learn from other high technology, high hazard industries, such as chemical process, nuclear and aviation.

Failure to Revise Risk Assessments as New Evidence Accumulates

Here our study does not offer any direct evidence. We have not penetrated deeply enough into the detailed workings of the system in our studies to follow the sort of risk assessment processes over time that would be necessary to draw conclusions on this point. Our concerns about the still relatively watertight compartments within which decisions about risk are made lead us to suspect that there is a certain rigidity in dealing with risks. Comparison of risks across domains is still a relatively new activity, meaning that priorities are not yet clearly open to risk-based thinking. However, the examples of normalisation of deviance analysed by Vaughan (1996) for Challenger and in the Columbia report (Gehman, 2003), which both used risk-based reasoning to justify risk acceptance, do not make us optimistic that such risk-based priorities will necessarily be safer ones. We already see risk comparisons being used to justify not spending money on low risk problems (Dijkstra & v. Poortvliet, 2004) rather than to justify spending money on higher risks.

Breakdown at Boundaries Impeding Communication and Coordination

The break-up of the old rail monopolies, mentioned in the discussion of fragmented problem solving above, has led to a significant decrease in formal communication and coordination. Our interviews showed that many existing formal communication and joint problem-solving meetings across what have become external organisational boundaries, rather than interdepartmental ones, had been discontinued; those which still existed had become poorly attended, with members complaining that their bosses had told them to stick to core business and not waste time on such meetings. Some channels had been diverted, e.g., a previous direct reporting channel to the train controllers for train drivers to report things that worried them during the trip (such as trespassers or objects on or near the line, rough track, anomalies in signals or level crossings) had been diverted to a ‘Helpdesk’, which seemed to function in the manner too often found with such ‘animals’, i.e., as a barrier and delaying mechanism, rather than as a problem-solving organ. We found that informal communication channels were still working, based on people who had previously worked together, and who still contacted each other to resolve these communication issues, even though they were now working for different organisations. This often occurred despite a lack of support from, and sometimes against the express instructions of, their bosses. These channels are, of course, vulnerable to these individuals leaving their current jobs or retiring.

We did, however, find evidence of the development of new coordinating mechanisms, seemingly as a result of the falling away of the old ones. For example the large maintenance contractors, despite their competition with each other, had recently formed a group to write harmonised safety rules and procedures for their own employees and sub-contractors, since they no longer got those in an operational form from the infrastructure manager. This group was also starting to put pressure on the other players to restore the necessary communication to cope with on-line rule modification and to discuss the issue of regimes under which maintenance would take place (see the earlier discussions on track maintenance safety and on the fragmented risk picture).

If we add to these points the inherently difficult nature of communication and coordination in a highly distributed system, spread across the whole country, with staff constantly on the move, the verdict according to this criterion must be that railways are currently not very resilient and are threatening to become even less so.

The Organisation Cannot Respond Flexibly to Changing Demands and Unexpected Situations

We have discussed the issues relevant to this criterion in the section on operating philosophy above. The railway system scores very poorly on this point. Our thesis is that it can achieve passenger safety only through its ‘stop and restart’ philosophy, and fails to achieve track maintenance safety because it does not adopt the same philosophy there.

Another prerequisite for flexibility is that many people can do the same tasks and so take over from colleagues who are unable for any reason to perform them. Railways are generally poor at this. The reasons are partly structural – there is a lot of single manning, such as for train drivers, making substitution problematical without incurring very high duplication costs. This is demonstrated when a driver becomes incapacitated or emotionally stressed by, for example, a suicide jumping in front of the train.

Not a Great Enough Devotion to Safety

The conclusion from our study of safety rules (Larsen et al., 2004) was that there was still a very strong concern for passenger safety at all levels in the organisation, sustained by the still strong culture we have described above. Our thesis was that this enabled the railway system to retain its high passenger safety performance, despite the other shortcomings we have described. Our interviews revealed a less fanatical devotion to maintenance safety, where accidents were regarded rather more as a fact of life. However, in recent years and under the growing influence of the Labour Inspectorate, increasing attention has been devoted to track worker safety (e.g., the Safe Working on Infrastructure project). This has resulted, amongst other things, in instructions to perform most maintenance during train-free periods and only stopping a train service if necessary.

Our concern for the devotion to passenger safety was one directed to the future. If the fragmentation and internationalisation of the railway industry waters down this strong culture, and if the increasing performance pressures we have discussed above erode the determination of senior managers to place safety first, we can see that this strong safety culture could disappear, perhaps even at a rapid pace. Reorganisations, of which the system has had many in many countries in the last few years, are quite capable of destroying a good safety culture in a relatively short time, if turnover is high and job security low.

Against this trend we can identify the increasing importance of passenger safety as a core marketing value for railways. In an on-going project for a high-speed line operator, we see this being placed centrally at the highest level in the management system goals and performance criteria. However, we can also note that the overall responsibility for safety in the reorganised railway industry has been a subject of considerable lack of clarity, at least in the Netherlands. The position of the rail regulator within the Ministry, and/or within the infrastructure company, was long unclear, undermining the quality of the regulatory process (Hale et al., 2002). The shifting priorities of the traffic controllers, from pure safety to more concern for punctuality, also chip away at one requirement identified in resilient companies, the presence of powerful gatekeepers whose only responsibility is the achievement of high levels of safety.

The large infrastructure projects that are currently being carried out in the Netherlands (e.g., the high speed line to Paris and the Betuwe goods line from the Rotterdam port to Germany), on the other hand, do have a much higher level of thinking about safety, which would bring those new systems to a higher, more comprehensive, level. There is some evidence that those insights will also influence safety considerations in the old railway system. In time this may well result in considerably higher resilience on this point.

Safety not Built Inherently into the System

Our study has not penetrated into sufficient depth to conduct a thorough analysis of this criterion for the operation of railways. The long history of trial and error learning discussed by such books as that by Holt (1955) suggests that the traditional operations have built in this safety very well over time. The very high performance in relation to current passenger safety in Europe is testament to this. The current phase of major technological and organisational development (high-speed cross-border lines, new technical train management and safety system concepts, dynamic train management, mixed concepts of heavy and light rail, etc.) will provide a new opportunity to increase inherent safety, but also a new opportunity to make wrong choices. Whereas change management, particularly of technological change, is an integral part of safety management in the chemical industry, in order to guide change to the positive side, the railway industry has still to introduce that aspect into its safety management. We do not, therefore, draw any conclusions in this area in respect of resilience.

In the area of track maintenance work, however, we do draw a negative conclusion. The examples of failure to implement inherently safer maintenance regimes and to take account of maintenance safety at the design and layout stage discussed earlier are clear indications that the railway system is not resilient in respect of this aspect.

Discussion and Conclusions

Railways, on this assessment, would seem to be examples of poor, or at best mixed, resilience, which can, however, still achieve high levels of safety, at least in certain areas of their operations. Table 9.2 summarises the verdict on the eight criteria discussed in the previous section.

Table 9.2: Summary assessment of resilience of railways

Criterion                                           Conclusion (+ resilient, − not)
1. Defences erode with production pressure          +/−. Mixed evidence & gaps
2. Past performance leads to complacency            +
3. No shared risk picture                           Was +, is now becoming −
4. Risk assessment not revised with new evidence    ? No data from our study
5. Boundary breakdown impedes communication         − and getting worse
6. No flexible response to change/unexpected        −
7. Not high enough devotion to safety               +, but under threat
8. Safety not inherent enough in design of system   −, but gaps in our data

Passenger safety appears to be achieved by defining very clearly in advance what are the necessary prerequisites of safe operation and forbidding operation outside them. This is an extreme version of a command and control strategy, which has been defined by the protagonists of high reliability organisations (Weick, 1995) as the opposite pole to the flexible, mindful organisation called in this workshop the ‘resilient organisation’. When the system moves outside this clearly defined safe envelope, the railway system stops, regroups and restarts only when the necessary operating conditions have been reestablished. Hence safety is achieved by sacrificing other goals, namely traffic volume and punctuality. The system does not achieve all its goals simultaneously and flexibly and is not resilient. Yet it achieves a very high safety performance on this measure. We have argued that this level of performance may be under threat, because we see the current level of resilience tending to be reduced rather than increased.

On track maintenance worker safety, the system has a poorer performance. Here we can conclude that the lack of resilience on the criteria mentioned provides a reasonable summary of the reasons for this safety performance failure. The trade-off works here in the other direction. Punctuality of the trains and traffic volume are bought at the price of safety. The system is not resilient and performs relatively poorly. We fear, here also, a decline in performance as the resilience is reduced even further.

If our analysis is accepted, we have shown that safety performance and resilience are, at least here, relatively independent of each other. This complements the questions raised in our brief intervention (Chapter 3), in which we argued that road traffic could be called resilient in some senses, but performs in absolute terms (accidents per passenger kilometre) poorly. We said there that the performance was, however, quite good considering the exposure to risk.

We would appear to have, with railways, a system that is ultra-safe on one safety dimension, but not a good performer on several others. We could argue further along two lines:

•  If railways were to work on improving their resilience, they would be able to become much better safety performers on other safety dimensions, without sacrificing other system goals. The large new projects like the HSL and the Betuwe route provide an opportunity for the introduction of such improvements. We would then need to study if and how more resilience could be built into the conventional system and what that would cost.

•  Resilience is only one strategy for achieving very high levels of safety. There are others, but these have other costs. We would then need to define when resilience is a suitable strategy for a company or system and when other strategies are ‘better’.

We prefer to leave both of those lines of development open for further discussion, research and analysis.

Systems Are Never Perfect

Yushi Fujita

Engineers may well accomplish self-consistent design, but its outcome, the artifact, can never be perfect in operation. Neither humans working at the front end (e.g., operators, maintenance persons), nor humans working at the back end (e.g., administrators, regulators) are perfect. The system (i.e., the combination of artifact and humans with various responsibilities) therefore cannot be perfect. It is not only the variability of human performance but also the human propensity for pursuing perfection with prescriptive rules, while ceaselessly trying to change the system for the better, that makes the system incomplete and imperfect as time passes.
