Socio-technically based risk assessment and management

Andrew Hale, Barry Kirwan, 19 Sep 2014

Dealing with personal risks is often straightforward, even instinctive. If a fire breaks out you move away from it; if a car swerves towards your car, you take avoiding action or brake or both. But managing risk in large organisations can be far more complex; risks are not always obvious and can manifest themselves in many ways, some of them unexpected.

Know your risks

It is the job of the safety director or safety manager to oversee the assessment of risks in complex organisations. The audience for that assessment is in the first instance the CEO and the executive board of the organisation, who need to understand the risks being managed, and broadly how they are being managed. They can then take the decisions necessary to set up and operate an effective safety management system which copes with the uncertainties and complexities.

Risks vary according to the organisation’s function and processes. In a steel mill, risk management may focus on avoiding burns and far worse, eg falling into an open furnace. For a transportation system, risk management will often focus on collisions of any type. For a nuclear power plant there are many risks, but the primary one is loss of containment resulting in a release of radioactivity into the atmosphere. All of these organisations also have many other types of risk, which all need managing, or they may catch you out with a fatality or serious injury. As Amalberti and Vincent say in their accompanying essay, the model of safety may differ between these industries. In what follows we address mainly their ’ultra-safe systems model’, but incorporate aspects of their ’high reliability organizations’ model. Risk assessment is also important in their ’ultra-resilient model’, but there it is likely to be done less explicitly than in the other two models and to be concentrated on novices absorbing knowledge about the risks of the occupation at the feet of experienced operators.

If an organisation is to manage its risks competently there must be a common understanding in the organisation of what risks it faces, and what actions and activities it can deploy to keep those risks under control. This may not be easy because different professions value and handle complex quantitative and qualitative information differently and therefore also interpret risk assessments differently. An organisation should have a comprehensive risk picture that encompasses all its safety risks, including how they can interact to make a bad situation worse. Risk assessments have to consider safety together with threats to the survival of the company. In this way the organisation can be prepared to face its daily challenges.

Hazards and risks

The general definition of a hazard is a situation that poses a threat to life, health, property or the environment. Most hazards in this general sense are considered ’dormant’, unless they become ’active’, in which case their threat is realised. Generic examples are kinetic energy from moving or flying objects, potential energy from working at heights where a fall of a person or object converts the potential into kinetic energy before impact, etc. The formal definition of hazard when carrying out risk assessment and management is more precise. A hazard is any biological, chemical, mechanical, environmental or physical agent that is reasonably likely to cause harm to humans or damage to the system or the organisation, in the absence of control. This means that the organisation must determine its hazards and establish controls for them.

The risk associated with the hazard is then the probability of loss of control of the hazard, combined with the resultant consequences. This definition of risk encompasses a wide range of possible consequences, ranging across injury, disease, theft, physical damage, production loss, poor product quality and safety, environmental pollution, business interruption, bankruptcy, cybercrime, etc.

A company needs to decide which of these risks it needs to manage and how. If the organisation is regulated for safety, then some or all of these risks must be managed by law in order to maintain an operating license. Ultimately, at the level of the Board and the CEO all risks have to be managed and trade-offs decided where necessary between them and their controls. However, at the level of line and staff departments and advisers, the responsibility for ensuring that the organisation controls its different risks may be allocated to different parts of the organisation. In some companies there is, for example, a separate quality manager, health and safety manager and environmental manager, whilst in others all of these areas may fall under one manager or director. The CEO and Board must decide how this allocation of responsibility will be made for their organisation. This can be based on the similarities and differences in the causes or controls for different types of risks (process and workplace), or according to organisational levels. Ultimately these different ’risk managers’ can only be advisers to and monitors of the line management and Board, where the ultimate responsibility lies.

How safe do you need to be?

Risk is first and foremost concerned with an unwanted event, usually considered as an ’accident’, eg an air crash, a nuclear meltdown, a factory fire, a ship sinking, a train derailment, a man falling to his death from a ladder or a gantry, or being struck by a vehicle. Many industries have formal risk targets imposed and monitored by a regulatory authority. For example, a nuclear power plant core meltdown should have a probability of less than one in a million for each year that a plant is operating. This sounds comforting, until a quick analysis based on, say, three hundred reactors worldwide and fifty operating years each (15,000 reactor-years in total) shows that we should have seen no meltdowns, whereas in fact there have been three (Three Mile Island, Chernobyl, and Fukushima). For commercial aviation (carrying passengers and freight), air crashes generally occur at a rate of one in ten million flights. This also sounds good, but because we fly a lot the number of fatal crashes annually worldwide is typically in double figures.
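The reactor arithmetic above can be checked in a few lines. This is only a sketch: it takes the target of one in a million per reactor-year and the round figures of three hundred reactors and fifty years from the text, and it additionally assumes that meltdowns are independent rare events, so that their count follows a Poisson distribution. That independence assumption and the function names are ours, not the authors’.

```python
import math


def expected_events(rate_per_unit: float, units: float) -> float:
    """Expected number of events for a constant rate over a given exposure."""
    return rate_per_unit * units


def prob_at_least(k: int, mean: float) -> float:
    """P(X >= k) for a Poisson-distributed count with the given mean."""
    p_fewer = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - p_fewer


# Figures from the text: 300 reactors, 50 operating years each.
reactor_years = 300 * 50
# Regulatory target from the text: one core meltdown per million reactor-years.
expected_meltdowns = expected_events(1e-6, reactor_years)
# Chance of seeing three or more meltdowns if the target were really being met
# (Poisson assumption, see lead-in).
p_three_or_more = prob_at_least(3, expected_meltdowns)

print(f"reactor-years: {reactor_years}")
print(f"expected meltdowns at target rate: {expected_meltdowns:.3f}")
print(f"P(3 or more meltdowns): {p_three_or_more:.2e}")
```

Under these assumptions three observed meltdowns would be extremely unlikely if the target were being achieved, which is the point the paragraph makes informally.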

The consequences part of the risk concept used to concern just the number of fatalities, eg one person killed, several people killed, up to hundreds killed. If an organisation kills many people it is unlikely to survive. However, today this has become more complex. Damage to the reputation of a company may be a consequence for an organisation, eg where there is loss of confidence in the company resulting in loss of revenue and eventual bankruptcy. This element of risk is of serious concern for many organisations, especially given the rising influence of social media networks such as Facebook and Twitter.

How do you know your ’level’ of risk?

There are formal methods for calculating safety risk. They provide answers to the following questions that the CEO of an organization should be asking:

• What are my hazards?

• What can go wrong with the controls in place?

• How likely is it to go wrong?

• Can I see these risks all together, qualitatively or quantitatively?

• What is my overall risk, and what are my top risks?

• How sure am I of the answers?

• Are we meeting the safety target?

• What can we do to reduce risks?
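The questions about overall risk and top risks can be illustrated with a very small risk register. The sketch below is not one of the formal methods referred to above. It assumes that each risk is scored as the annual probability of losing control of the hazard multiplied by a consequence score, which is one common convention among several, and every hazard name and number in it is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    hazard: str
    annual_probability: float  # chance of losing control in a year (invented)
    consequence: float         # severity on an arbitrary scale (invented)

    @property
    def score(self) -> float:
        # Assumed convention: risk = probability x consequence.
        return self.annual_probability * self.consequence


def top_risks(register, n=3):
    """Return the n highest-scoring risks, highest first."""
    return sorted(register, key=lambda r: r.score, reverse=True)[:n]


def overall_risk(register):
    """Sum of all scores, a crude single-number risk picture."""
    return sum(r.score for r in register)


register = [
    Risk("fall from height", 0.02, 100.0),
    Risk("vehicle collision on site", 0.05, 60.0),
    Risk("furnace burn", 0.10, 15.0),
    Risk("chemical release", 0.001, 1000.0),
]

for risk in top_risks(register, n=2):
    print(f"{risk.hazard}: {risk.score:.2f}")
print(f"overall: {overall_risk(register):.2f}")
```

A single score hides uncertainty, so a register like this answers ’what are my top risks?’ but not ’how sure am I of the answers?’.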

Below is an example of one approach, called a fault tree (Figure 1). This particular extract comes from the field of aviation, looking at the risk of a mid-air collision; such events are very rare, thanks to multiple independent safety systems in the air and on the ground, involving both automation and the pilots and controllers. The probability (Q) figures are derived from experience or databases, feeding up from the base events or failures (the circular icons), through either ’and-gates’ (icons with straight undersides) or ’or-gates’ (icons with concave undersides) to the top event with its thus-calculated probability.

Figure 1: Example of a fault tree
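The way probabilities feed up through the gates can be sketched as follows. The code assumes that the base events are statistically independent, which real fault tree analyses have to justify or correct for, and the tree structure and Q values are invented; they are not those of Figure 1.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if ALL inputs occur (independent inputs assumed)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if ANY input occurs (independent inputs assumed)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Invented base-event probabilities (Q) for a hypothetical three-branch tree.
q_controller_misses_conflict = 1e-3
q_ground_alert_fails = 2e-3
q_airborne_warning_fails = 1e-2
q_pilots_fail_to_see_and_avoid = 5e-2

# Ground-based protection is lost if either ground-side event occurs.
q_ground_protection_lost = or_gate([q_controller_misses_conflict, q_ground_alert_fails])

# The top event needs every layer of protection to fail together.
q_top = and_gate([
    q_ground_protection_lost,
    q_airborne_warning_fails,
    q_pilots_fail_to_see_and_avoid,
])

print(f"Q(ground protection lost) = {q_ground_protection_lost:.6f}")
print(f"Q(top event) = {q_top:.3e}")
```

The and-gate is why multiple independent safety systems make the top event so rare: each additional layer multiplies the probability by a small number.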

Tools such as fault trees help with the difficult part of ’determining what controls are or should be in place and how they might fail’ and of ’putting the risks together’ so that an overall risk picture can be gained, and total risk calculated. Typically the events in the fault trees lead up to the eventual loss of control of the hazard, and another ’tree’, called an ’Event Tree’, determines the range of consequences that are likely to occur. A generic illustration of this, called the ’bow tie diagram’, is given below (Figure 2), with ’business upset’ as the central event, which in safety terms usually equates to loss of control of a hazard.

Figure 2: Example of a bow tie diagram
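The right-hand side of the bow tie, the event tree, can be sketched in the same spirit. Starting from the frequency of the central event, each mitigating barrier either works or fails, and each combination leads to a different outcome. The barrier names, failure probabilities and initiating frequency below are invented, and the barriers are assumed to fail independently of one another.

```python
from itertools import product


def event_tree(initiating_frequency, barriers):
    """Enumerate every success/failure path through the mitigating barriers.

    barriers is a list of (name, failure_probability) pairs.
    Returns a dict mapping the tuple of failed barrier names to the
    frequency of that path. Independent barriers are assumed.
    """
    outcomes = {}
    for path in product([False, True], repeat=len(barriers)):
        frequency = initiating_frequency
        for failed, (_, p_fail) in zip(path, barriers):
            frequency *= p_fail if failed else (1.0 - p_fail)
        failed_names = tuple(name for failed, (name, _) in zip(path, barriers) if failed)
        outcomes[failed_names] = frequency
    return outcomes


# Invented example: loss of control of a fire hazard once per thousand years.
outcomes = event_tree(
    initiating_frequency=1e-3,
    barriers=[("alarm", 0.10), ("sprinkler", 0.05)],
)

for failed_names, frequency in outcomes.items():
    label = " and ".join(failed_names) + " failed" if failed_names else "all barriers worked"
    print(f"{label}: {frequency:.2e} per year")
```

The path frequencies always add up to the initiating frequency, which is a useful check that no consequence has been left out of the tree.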


Manage your risk controls

The output of risk assessments consists of a description of risk controls which will have to be implemented to prevent and mitigate the unwanted consequences of the risk scenarios. Risk control mostly happens via barriers that keep the hazard under control or mitigate its consequences once control is lost. Barriers may be physical (machinery guards, edge protection on roofs, chemical bunds around storage tanks, ear defenders, safety goggles, pressure relief valves, sprinkler systems, fire extinguishers, etc.) or behavioural (skilled fire fighters, skill with a boning knife in an abattoir, keeping away from moving machinery, evacuation before an encroaching fire, etc.), or a combination of both (competent drivers of vehicles, activating a fire alarm, diagnosing an equipment failure and taking remedial action, etc.). Preventive barriers stop the scenario before the loss of control; mitigating barriers intervene afterwards to lessen the seriousness of the consequences. Risk management entails keeping the barriers on either side of the loss of control event effective, and looking for ways they could fail. In many activities there needs to be ’defence in depth’, with many barriers controlling a given serious risk, so that if one fails there are others still working.

The concept of barriers is crucial also for one of the most influential safety models of the past three decades, the so-called ’Swiss Cheese’ model promoted by James Reason, as illustrated in Figure 3 below. The barriers are likened to blocks of Swiss cheese which have holes in them, meaning there are gaps which have appeared in the system’s defences; the barriers are not working effectively. If the holes ’line up’, meaning that a number of defects or deviations come together in an unpredicted way, an accident occurs. The successive layers may be technical barriers, eg in aviation there is a system aboard most aircraft to detect another aircraft on an intercept course; or they can be organisational, eg the training and selection processes that deliver a safe and competent train driver. The model has been particularly useful in demonstrating the systemic nature of accidents.

Successful management of risk controls, once they have been decided on, consists of managing the life cycle of those technical and behavioural elements making up the risk control; making sure that they are developed, fit for purpose, installed, used and maintained. In the descriptions below we start from the point where the required risk control and its hardware, software and behavioural elements have already been specified to the best of the ability of the organisation. However, even the best risk analysis can never anticipate and control all future risks. We are not prescient or inventive enough to guarantee that. Hence management of the risk controls must include monitoring and improvement to respond to learning opportunities when new risks or new ways for the risk controls to fail are discovered.

Technical elements

  • Purchase/construct. Is a suitable high quality hardware/software risk control available on the market, or would it be better to fabricate it in-company? In the first case the procurement function needs to specify the requirements and find suitable suppliers meeting the design specification and dependability requirements. In the second case the construction function does that work and may need to call in adequate expertise on dependability.
  • Install/commission. This work may be done by a company department or subcontracted to a supplier. It requires competence, coordination and monitoring.
  • Use. This is the link to the management of the behavioural elements of the risk control.
  • Inspect. The functioning of the hardware requires monitoring either continuously, or at planned intervals.
  • Maintain. If inspection shows deterioration of its functionality the hardware needs maintaining.
  • Monitor/modify/improve. If the hardware fails to live up to its planned performance, or fails to control all risk scenarios experienced it may need modifying or replacing.

Behavioural elements

  • Specify procedures. The behaviour in using the hardware/software controls, and the additional required behaviour forming part of the risk controls themselves need to be captured in procedures which can be communicated to the users. If they are procedures for using technology, the technology and the procedures need to be designed together to maximise their usability.
  • Select/train: manage competence. Suitably qualified people need to be recruited and trained in the procedures until they test as competent. For unexpected risk scenarios, which cannot be captured in procedures, a more general competence to improvise may need to be trained.
  • Provide manpower & communication. Being competent is not enough to ensure that the right behaviour is shown. The organisation must plan its manpower to be available when that behaviour is needed and must make sure that different individuals communicate and collaborate when the risk control depends on more than one person working in unison, such as at shift handovers or where risk depends on control room staff and field operators (maintenance engineers, pilots, train drivers, etc.) collaborating effectively.
  • Motivate. The organisation must also go beyond just ensuring competence. It must motivate people to choose the correct risk control behaviour over conflicting behaviour aimed at production, quality, individual effort/comfort, etc., which may seem more attractive at the time.
  • Maintain. As with hardware, behaviour may degrade or deviate over time, requiring inspection and feedback (behavioural safety), refresher training and behavioural campaigns.
  • Modify/improve. If the behaviour fails to live up to its planned performance, or fails to control all risk scenarios experienced it may need modifying or replacing.

Organizing safety

The detailed management of the performance of the risk controls which emerge from risk assessment has been dealt with above. The overall safety management system of an organisation needs to provide and coordinate those management processes. At a higher level of abstraction safety management consists of organising two interlocking management cycles (Figure 3):

• The first is the operational cycle (red arrows) that carries out the risk assessment, decides on risk controls, implements and monitors them and feeds back to modify the risk assessment and control.

• The second is the policy cycle (blue arrows) which sets the strategy, provides resources and allocates responsibilities to run that risk assessment and control cycle, monitors how it works and proposes and manages changes to achieve continuous improvement.

Figure 3: Two cycles for organizing safety

The second cycle needs to be driven in detail by the CEO and the Board of Directors because CEO buy-in into safety is key to risk-related decision-making in the organization. Moreover, risk and safety are always about trade-offs which need to be endorsed by the CEO and for which the CEO has to be held accountable. The first cycle is more the realm of the safety manager, and it is this cycle that feeds and supports board decisions. The setting up and monitoring of that cycle can be delegated to the safety manager, relying on his or her competence to collaborate with the line and staff in its implementation. The Board and CEO then monitor that first cycle and ensure it keeps on taking place. The CEO and Board also need to assess and review the competence of the safety manager to fulfil this role.

Board level management of safety

The main role of the CEO with respect to safety is in asking questions, for example:

• What are our top safety risk concerns?

• Are they increasing/decreasing/stable?

• What do the latest quarterly safety trend analyses show?

• Are we as safe, or safer, than our competitors?

These questions send the message to the other board members that safety is important. This does not need to be ’heavy-handed’, but should simply reflect a genuine concern and understanding that safety is key to business health. However, it does need to be authentic; otherwise it amounts to ’lip service’, which people see as insincere.

The CEO should also appoint a director who is the safety ’champion’ (the safety manager will report directly to this person). This Safety Director may also be the Director of Quality, and/or Environment, and/or Security. There remains debate about whether such a director should have only safety as his or her responsibility: sole responsibility allows clear focus and fewer conflicts, whereas joint responsibilities may enable better integration of safety into business models and decisions. The CEO should be able to challenge the Safety Director (eg over facts, figures and assumptions) to avoid other directors feeling that the Safety Director can ’play the safety card’ all the time. Similarly, if the Safety Director is going to raise something significant at board level, the CEO should be able to expect that the Safety Director has ’done his/her homework’, both in ensuring that there is enough evidence and/or concern to warrant raising the issue, and in talking with other relevant Directors, so that they are not surprised at the meeting. The second aspect means that the CEO should pick someone with a degree of ’political acumen’. As some CEOs have put it, they do not want a ’safety clerk’ at board level.

When safety issues are raised at board level and determined to be important, the CEO should galvanise the board to explore them and put in place actions (not only on the Safety Director, but engaging other directors as appropriate), and demand progress on those actions at each successive Board meeting until the potential safety ’threat’ has been reduced or eliminated.

The CEO should also speak to line and staff managers about safety, and what is being done about safety. This can be done directly or indirectly, whether by email or, more commonly, by blog, video, etc. These CEO-staff communications, aside from representing good safety culture, also affect the other board members. Other directors will be more likely to ’follow suit’ and talk more openly with their subordinate managers and staff about safety. In fact this will be to some degree expected, as otherwise it looks odd to staff that the organisation’s leader appears to care about safety but the directors do not. Put simply, this requires leadership by example.

One crucial prerequisite for board action on safety issues is adequate information. Besides the informal communication channels just mentioned, which are very important as the people ’at the sharp end’ often have an acute insight into safety, there are two main sources of safety information. The first will be events that have happened: incidents or accidents. For these there will be formal procedures and probably associated reporting requirements to a regulatory body, as well as analysis mechanisms looking for causes and contributory factors. Such systems offer relatively reliable indicators (called ’lagging indicators’, since they relate to the past and are ’forensic’ in nature) on how safety is doing. Secondly, there will be sources of safety information from safety cases, safety audits and surveys (including safety culture surveys), evaluations of Safety Management Systems, and steps to achieve safety certification levels associated with systems or projects. Such information can be seen as indicating present and future performance, and so indicators derived from such examinations are often called ’leading indicators’.

Safety Dashboards are an integration of key safety metrics that can be used by the CEO and Board to gain a quick overview of the safety health of their organisation and its operations. They contain both lagging and leading indications (ie data from past events, and information on current and future performance) of safety from metrics relevant to the organisation’s operational and regulatory context. Sometimes they also contain indications of business volume or operational ’level’ (whether it is up or down) since such degrees of ’output’ are often (but not always) correlated with safety risk (eg in air traffic, if the amount of traffic rises, generally so do the safety risks). The dashboard should contain what needs to be there, and not just what is easy to measure and has ready-made statistics. Similarly it is important not to get into a pure ’target-chasing’ approach by focusing solely on the numbers in an effort to increase or decrease certain statistics. Such an approach may lead to suppression of true statistics or to reducing risk in one area by ’exporting’ it to another. The Dashboard is a tool to be used in understanding and improving safety, and only represents the ’top tier’ of safety information, much of which will be qualitative rather than quantitative.
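Because business volume and safety risk often move together, a dashboard figure is easier to interpret when a lagging count is shown alongside the volume it relates to. The sketch below normalises reported incidents per 100,000 traffic movements. The choice of indicator, the normalisation base and all the numbers are invented for illustration.

```python
def rate_per_100k(events: int, volume: int) -> float:
    """Events per 100,000 units of operational volume."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return events * 100_000 / volume


# (quarter, reported incidents, traffic movements) - all figures invented.
quarters = [
    ("Q1", 12, 400_000),
    ("Q2", 15, 500_000),
    ("Q3", 18, 450_000),
]

dashboard = {
    label: rate_per_100k(events, volume) for label, events, volume in quarters
}

for label, events, volume in quarters:
    print(f"{label}: {events} incidents, {dashboard[label]:.1f} per 100,000 movements")
```

In this made-up series the raw count rises every quarter, but the rate only rises in Q3, when incidents went up while traffic went down. A normalised rate is still only one number among the qualitative information the paragraph above calls for.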

One great worry: ’Drift into danger’

Drift into danger is like the proverbial frog who fails to jump out of a pan full of water that is brought to the boil slowly. A clear example of drift into danger is when, under economic pressure, resources for safety are slowly eroded, including operational safety personnel (through staff cutbacks or non-replaced staff losses) and equipment, as well as resources to carry out safety work. The problem is that, initially, nothing goes wrong, and all seems well. People adjust so that the new less-resourced system becomes the norm. Staff begin to adapt procedures and do things differently: they get the job done, but there are perhaps fewer safeguards. And then one day an accident happens, and it probably also escalates because there is less equipment and there are fewer trained staff to deal with it.

It is not easy to detect that drift into danger is happening. It requires that the Board and Safety Manager look differently at safety indicators, with a longer term view, alert to trends, an attitude of ’creative mistrust’, questioning in particular when all the indicators appear to be indicating that all is well. If there are good safety indicators these may detect it, although this may not happen if the system has ’normalised’ around its new parameters of operation. Older operational staff are more likely to perceive the drift than younger and newer staff, and independent and external observers may also see the risk more clearly as they are ’outsiders’. Safety culture surveys can often pick up signals from comments or during workshops, and observational surveys or audits may similarly raise questions about whether safety margins have eroded or not.
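One simple way of taking the longer term view is to fit a trend line over several years of an indicator instead of comparing each period only with the one before. The sketch below uses an ordinary least-squares slope. The indicator (safety inspections completed per quarter) and its values are invented, and a falling slope is only a prompt for questions, not proof of drift.

```python
def trend_slope(values):
    """Least-squares slope of values against their position 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Invented data: inspections completed per quarter over three years.
inspections = [40, 40, 39, 39, 38, 38, 37, 37, 36, 36, 35, 35]

# Short view: compare the latest quarter with the one before it.
recent_change = inspections[-1] - inspections[-2]
# Long view: average change per quarter across the whole window.
slope = trend_slope(inspections)

print(f"change since last quarter: {recent_change}")
print(f"trend over {len(inspections)} quarters: {slope:.2f} per quarter")
```

Here the quarter-on-quarter comparison shows no change, which is how a ’normalised’ system looks from the inside, while the three-year trend shows a steady erosion of about half an inspection per quarter.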

Further reading

Andrews, J.D. & Moss, T.R. (2002). Reliability and risk assessment (2nd ed.). London: MechE Professional Engineering Publishing.

Haddon, W. (1973). Energy damage and 10 countermeasure strategies. Human Factors, 15, 355-366.

Hale, A.R., Ale, B.J.M., Goossens, L.H.J., Heijer, T., Bellamy, L.J., Mud, M.L., Roelen, A., Baksteen, H., Post, J., Papazoglou, I.A., Bloemhoff, A. & Oh, J.I.H. (2007). Modeling accidents for prioritizing prevention. Reliability Engineering & System Safety, 92, 1701-1715.

Hale, A.R., Goossens, L.H.J., Ale, B.J.M., Bellamy, L.A., Post, J., Oh, J.I.H. & Papazoglou, I.A. (2004). Managing safety barriers and controls at the workplace. In C. Spitzer, U. Schmocker & V.N. Dang (eds.), Probabilistic Safety Assessment & Management (pp. 608-613). Berlin: Springer.

Health and Safety Executive (2011). Five steps to risk assessment. http://www.hse.gov.uk/pubns/indg163.pdf

Maguire, R. (2006). Safety cases and safety reports. Aldershot, UK: Ashgate.

Reason, J.T. (1997). Managing the risks of organisational accidents. Aldershot, UK: Ashgate.

Safety Intelligence for CEOs – A White Paper. http://www.eurocontrol.int/sites/default/files/content/documents/nm/safety/safety_intelligence_white_paper_2013.pdf

Safe Work Australia (2012). Guide for major hazard facilities - safety assessment. http://www.safeworkaustralia.gov.au/sites/SWA/about/Publications/Documents/669/Safety%20Assessment.pdf

 


Authors

Andrew Hale

Andrew Hale was Professor of Safety Science at the Delft University of Technology in the Netherlands, full time from 1984-2007 and part-time from 2007 until his retirement in 2009. He is currently chairman of the consultancy HASTAM in the UK. He has worked in the area of safety and health since 1966, initially on accident investigation and human behaviour in occupational safety, and later on professionalisation of the field of safety, on safety management and regulation, not only in occupational, but also transport safety. He has broad experience as a member of policy, advisory and evaluation committees for safety research institutions in several European countries and as chairman and member of a number of government advisory committees in the Netherlands. He was chief editor of Safety Science from 1993 until the end of 2008 and has been on the board of some half dozen other scientific journals. He has examined PhDs in the UK, the Netherlands, Norway, Sweden, Denmark, France and Australia. He was until recently chair of the Certification Committee of the European Network of Occupational Safety and Health Professional Organisations. He was honoured with a knighthood (Ridder) in the Order of the Netherlands Lion for his services to safety in the Netherlands in 2006.

Barry Kirwan

Barry Kirwan has degrees in Psychology, Human Factors and Human Reliability Assessment. He has worked in the nuclear, chemical, petrochemical, marine and air traffic sectors of industry, and lectured at the University of Birmingham in Human Factors. He was formerly Head of Human Reliability at BNFL in the UK nuclear industry, and Head of Human Factors at National Air Traffic Services (UK). For the past fourteen years he has been working for EUROCONTROL, managing a team of safety researchers and safety culture specialists at the EUROCONTROL Experimental Centre in Bretigny, near Paris. He has published four books and around 230 articles. He is also a visiting Professor of Human Reliability & Safety at Nottingham University in the UK. He currently leads the European Safety Culture Programme for Air Traffic Management, dealing with more than thirty countries in Europe, as well as collaborating with the Federal Aviation Administration in the US, and advises UK rail and nuclear power industries on Human Reliability Assessment.
