Tag: Incidents

Should we perform an Incident Investigation for Hail Damage?

The Issue: An industry friend recently reached out with a question that I thought was worth sharing. Some fierce storms had rolled through their area, bringing tennis-ball sized hail. This hail caused some insulation damage, but didn’t cause any ammonia release. Here are some pictures of the type of damage they experienced.

Hail Damage pictures

The question is “Would this require an Incident Investigation?”

 

The Law: As always, first we look at the law.

OSHA 29CFR1910.119(m)(1): The employer shall investigate each incident which resulted in, or could reasonably have resulted in a catastrophic release of highly hazardous chemical in the workplace.

EPA 40CFR68.81: The owner or operator shall investigate each incident which resulted in, or could reasonably have resulted in a catastrophic release.

While there is obvious damage to the protective jacketing and vapor barrier, you could make a defensible argument that this is not something that could “reasonably have resulted in a catastrophic release of highly hazardous chemical.” That’s not to say there isn’t any value to such an investigation, but that there most likely is not a requirement to investigate this incident based solely on the PSM/RMP rules. But, the rules aren’t the only guidance available to us, so let’s look further.

 

RAGAGEP and Written Programs: In my opinion, the best RAGAGEP available on the topic is the CCPS book Guidelines for Investigating Chemical Process Incidents, 2nd Edition, which is what inspired the approach we take in our Incident Investigation element Written Plan. Similarly, the IIAR’s publication PSM & RMP Guidelines makes roughly the same types of arguments and includes an EPA suggestion that any damage of $50,000 or more should be investigated. If you’ve priced insulation recently, you know we’re likely to hit that threshold.

Here’s the relevant part of our Incident Investigation element Written Plan which incorporates the CCPS guidance:

An Incident is an unusual or unexpected occurrence, which either resulted in, or had the potential to result in:

  • Serious injury to personnel
  • Significant damage to property
  • Adverse environmental impacts
  • A major disruption of process operations

That definition implies three types or levels of incidents:

Accident – An occurrence where property damage, material loss, detrimental environmental impact or human injury occurs. (off-site Ammonia release, product in freezer exposed to ammonia, personnel injury, etc.)

Near Miss – An occurrence when an accident could have happened if the circumstances were slightly different. We sometimes call these incidents “An Accident where something went right”. (Forklift strikes an air unit causing only cosmetic damage and no Ammonia is released, an activation of an automatic shutdown, etc.)

Process Upset / Interruption – An occurrence where the process was interrupted. (Vessel high-level alarm, a nuisance ammonia odor report, ice buildup on an air unit preventing it from cooling properly, failing to conduct required PSM activities as scheduled, etc.) Many Process Interruptions are fixed before the event leads to a shutdown. If the equipment was shut down manually or automatically in response to an unexpected occurrence, then the incident is to be investigated as a Near Miss.

This storm damage would seem to trigger the “Significant damage to property” part of the Incident definition and classify it as an Accident due to “property damage.” In accordance with the relevant RAGAGEP and our element Written Plan, we’d expect you to conduct an Incident Investigation despite the defensible argument that the PSM/RMP rules do not require one.
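The three-level definition above can be read as a simple decision order: check for Accident criteria first, then Near Miss, then Process Upset. Here is a minimal sketch of that reading in Python. The class, field, and function names are all hypothetical and are not part of the Written Plan; they only mirror the wording quoted above.

```python
from dataclasses import dataclass
from enum import Enum


class IncidentType(Enum):
    ACCIDENT = "Accident"
    NEAR_MISS = "Near Miss"
    PROCESS_UPSET = "Process Upset / Interruption"


@dataclass
class Occurrence:
    # Hypothetical flags that mirror the definitions quoted above.
    property_damage: bool = False
    material_loss: bool = False
    environmental_impact: bool = False
    human_injury: bool = False
    could_have_been_accident: bool = False  # "if the circumstances were slightly different"
    equipment_shut_down: bool = False       # manual or automatic shutdown
    process_interrupted: bool = False


def classify(o: Occurrence):
    """Return the most serious incident level that applies, or None."""
    if o.property_damage or o.material_loss or o.environmental_impact or o.human_injury:
        return IncidentType.ACCIDENT
    # A shutdown in response to an unexpected occurrence is investigated as a Near Miss.
    if o.could_have_been_accident or o.equipment_shut_down:
        return IncidentType.NEAR_MISS
    if o.process_interrupted:
        return IncidentType.PROCESS_UPSET
    return None


# The hail example: insulation damage, no ammonia release.
hail = Occurrence(property_damage=True)
print(classify(hail).value)
```

A flag-based check like this is only a screening aid; whether damage is "significant" is still a judgment call for the investigation team.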

 

What we accomplish with an Incident Investigation: With a formal assessment of the incident, we’re hoping to:

  1. Document that the safeguards in place were adequate such that an ammonia release did not occur;
  2. Document that the damage was investigated and found to be largely cosmetic with no significant effect on integrity, and limited effect on efficiency;
  3. Provide a documented recommendation to address the damage, both in the long term (replacement/repair) and short-term (sealing up any vapor barrier tears with caulking for example);
  4. Provide a method of tracking the identified corrective actions to closure.

Conclusion: While it seems pretty clear the PSM/RMP rules themselves wouldn’t require an Incident Investigation, RAGAGEP would and there’s much to be gained from one.

Powered Industrial Trucks in Machine Rooms

Powered Industrial Trucks (PIT) in Machine Rooms are a known struck-by hazard.  What most people don’t realize is how serious the results of a PIT impact in a Machinery Room can be.

For example, a forklift / scissor lift impact that shears a 3″ TSS (ThermoSyphon Supply) or HPL (High Pressure Liquid) operating at a typical head pressure of 160PSIG results in a release rate of over 18,500 pounds per minute.
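To get a feel for where a number of that size comes from, here is a rough sketch using the textbook incompressible-liquid orifice equation, m = Cd · A · sqrt(2 · ρ · ΔP). This is not necessarily the method behind the figure above. The bore (3.068 in, a Schedule 40 3″ pipe), the liquid ammonia density (about 595 kg/m³ near 30°C), and the discharge coefficient are all assumptions, and the estimate ignores flashing, which changes real-world flow.

```python
import math

PSI_TO_PA = 6894.757
IN_TO_M = 0.0254
KG_TO_LB = 2.20462


def liquid_release_rate_lb_per_min(bore_in, pressure_psig, density_kg_m3, discharge_coeff):
    """Incompressible-liquid orifice estimate: m = Cd * A * sqrt(2 * rho * dP)."""
    area_m2 = math.pi / 4.0 * (bore_in * IN_TO_M) ** 2
    dp_pa = pressure_psig * PSI_TO_PA  # gauge pressure is the driving pressure to atmosphere
    kg_per_s = discharge_coeff * area_m2 * math.sqrt(2.0 * density_kg_m3 * dp_pa)
    return kg_per_s * KG_TO_LB * 60.0


# Assumed inputs (not from the post): Sch 40 bore, ~30 C liquid density,
# and two illustrative discharge coefficients.
for cd in (0.61, 0.8):
    rate = liquid_release_rate_lb_per_min(3.068, 160.0, 595.0, cd)
    print(f"Cd={cd}: roughly {rate:,.0f} lb/min")
```

With these assumptions the result lands in the same order of magnitude as the figure quoted above, which is the point: a sheared liquid line empties a system in minutes, not hours.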

Many facilities attempt to establish a ban on PIT in their machinery rooms, but while the needs for PIT in machine rooms are very limited, there are situations where they are necessary. An outright ban won’t likely survive prolonged contact with reality.

To address this issue in a PHA, we usually recommend a Written Machine Room PIT policy as an administrative control. For years we’ve discussed the content of that policy informally with people. Recently a PSM coordinator shared her written policy & permit with us and after some alterations and formatting, we’re adding it to the SOP Templates section.

Front of the Permit:

Back of the Permit with additional explanations:

 

As always, you can find this on the Google Shared template drive.

CSB’s NEW Chemical Incident Reporting Rule is FINAL

“U.S. Chemical Safety and Hazard Investigation Board (CSB) has approved a final rule on accidental release reporting. The CSB has posted a prepublication version of the final rule… The official version should be published early next week in the Federal Register.

The rule requires prompt reports to the CSB from owners or operators of facilities that experience an accidental release of a regulated substance or extremely hazardous substance that results in a death, serious injury or substantial property damage. The CSB anticipates that these reports will provide the agency with key information important to the CSB in making prompt deployment decisions…

The rule is required by the CSB’s enabling legislation but was not issued during the first 20 years of CSB operations. Last year, a court ordered the CSB to finalize a rule within a year.”

What it means: If the incident resulted in Death, Serious Injury or Substantial Property Damage ($1,000,000 or more) then you have to report the incident to the CSB (via phone 202-261-7600 or email report@csb.gov) within 8 hours. The report must include:

1604.4 Information required in an accidental release report submitted to the CSB
1604.4 The report required under §1604.3(c) must include the following information regarding an accidental release as applicable:
1604.4(a) The name of, and contact information for, the owner/operator;
1604.4(b) The name of, and contact information for, the person making the report;
1604.4(c) The location information and facility identifier;
1604.4(d) The approximate time of the accidental release;
1604.4(e) A brief description of the accidental release;
1604.4(f) An indication whether one or more of the following has occurred: (1) fire; (2) explosion; (3) death; (4) serious injury; or (5) property damage.
1604.4(g) The name of the material(s) involved in the accidental release, the Chemical Abstract Service (CAS) number(s), or other appropriate identifiers;
1604.4(h) If known, the amount of the release;
1604.4(i) If known, the number of fatalities;
1604.4(j) If known, the number of serious injuries;
1604.4(k) Estimated property damage at or outside the stationary source;
1604.4(l) Whether the accidental release has resulted in an evacuation order impacting members of the general public and others, and, if known:
1604.4(l)(1) the number of persons evacuated;
1604.4(l)(2) approximate radius of the evacuation zone;
1604.4(l)(3) the type of person subject to the evacuation order (i.e., employees, members of the general public, or both).

The good news is that if you have to report the incident to the NRC then you can skip reporting all the above data and simply report the NRC case number you’re given during the NRC call.
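If you want to build this into a checklist or an internal form, the trigger and the required items can be sketched as below. This is a minimal illustration only: the function names and dictionary keys are hypothetical, the "required" list covers just the 1604.4 items that are not qualified by "if known", and the rule's own "as applicable" language still governs.

```python
# Threshold for "substantial property damage" as cited in this post.
SUBSTANTIAL_PROPERTY_DAMAGE_USD = 1_000_000

# Hypothetical keys for the 1604.4 items not qualified by "if known".
REQUIRED_FIELDS = [
    "owner_operator_contact",     # (a)
    "reporter_contact",           # (b)
    "location_and_facility_id",   # (c)
    "approximate_time",           # (d)
    "description",                # (e)
    "event_indicators",           # (f) fire / explosion / death / serious injury / property damage
    "materials_involved",         # (g)
    "estimated_property_damage",  # (k)
    "evacuation_ordered",         # (l)
]


def csb_report_required(deaths, serious_injuries, property_damage_usd):
    """True if any of the three reporting triggers is met."""
    return (deaths > 0
            or serious_injuries > 0
            or property_damage_usd >= SUBSTANTIAL_PROPERTY_DAMAGE_USD)


def build_csb_report(details, nrc_case_number=None):
    """Return the report content; an NRC case number replaces the full data set."""
    if nrc_case_number:
        return {"nrc_case_number": nrc_case_number}
    missing = [key for key in REQUIRED_FIELDS if key not in details]
    if missing:
        raise ValueError("Missing report items: " + ", ".join(missing))
    return dict(details)


print(csb_report_required(deaths=0, serious_injuries=0, property_damage_usd=250_000))
print(build_csb_report({}, nrc_case_number="EXAMPLE-NRC-NUMBER"))
```

The NRC shortcut is checked first on purpose, since that is the path most ammonia facilities with a reportable release will already be on.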

This new requirement takes effect 30 days from the posting in the Federal Register so ACT NOW. It’s important that you update your program because there are enforcement penalties associated with not following this new rule…

1604.5(b) Violation of this part is subject to enforcement pursuant to the authorities of 42 U.S.C. 7413 and 42 U.S.C. 7414, which may include
1604.5(b)(1) Administrative penalties;
1604.5(b)(2) Civil action; or
1604.5(b)(3) Criminal action.

 

What should I do? 

If you use the template program, the hard work has already been done FOR YOU. Just open up the template directory on Google Drive and follow these steps for your program:

  • In \Reference\ add new directory \Reference\CSB\ and place “CSB Reporting Accidental Releases – prepublicationcopy 020320.pdf” in it. You can get it from the templates directory or from the EPA link.
  • In \Reference\CFR\ add “40CFR1604 – Hazardous substances Reporting and recordkeeping requirements.doc” from the templates directory.
  • Update the Incident Investigation element Written Plan to the 020720 version from the templates directory.
  • Update the \01 – EPA RMP\ Definitions file to the 020720 version from the templates directory.
  • Train all Responsible Persons and affected management on the new policies.
  • Document the changes in your DOC-Cert in accordance with the Implementation Policy: Managing Procedure / Document Changes found in the MOC/PSSR element Written Plan.

Note: If you have instructions for Agency Notifications somewhere outside your Incident Investigation plan, you’ll need to update them to include the CSB contact information there too. Feel free to use the text in the Incident Investigation element Written Plan, Implementation Policy: Agency Notifications.

 

A little help can go a long way!

Sometimes a little extra can go a long way to improve the effectiveness of your compliance efforts. I would like to show you how we used two simple, inexpensive laminated cards to improve the effectiveness of our APR inspections and Incident reporting / reactions.

APR Card

First, the APR issue: 1910.134 has some requirements on inspections, cleaning, fit-check, etc. We require our service technicians to wear APRs during Line-Opening. I created a small laminated card (about 5″x8″) that fits in their APR bag. With the included permanent marker, we can track the APR inspections for a year. The card also provides convenient information on the “Fit-Check” and “Monthly Inspection” procedures. Here’s the WORD document if you want to modify it for your use.

 

Leak Investigation / Incident Reporting

Our technicians are often called to look into reported ammonia odors. We’ve established a policy on doing this in compliance with 1910.119(n) concerning “handling small releases.” We also conduct Incident Investigations to meet the requirements of 1910.119(m). Again, I created a small laminated card (about 5″x8″) that fits in their APR bag.  It provides a quick-reference to the investigation procedure, as well as reminders of the information we’ll be asking them for. Contact numbers for company safety/compliance resources are also included. Here’s the WORD document if you want to modify it for your use.

 

Little items like this can reinforce your training. The easier “being compliant” is, the more likely it is to happen in the field! 

p.s. The Word documents are meant to be printed double-sided. I use 32# paper, trim, then seal with 5mil clear laminating envelopes. 

Why use the “buddy system” during Line Openings?

Most LEO (Line & Equipment Opening) policies, a.k.a. “Line Break” policies, require a second person away from the work but in the immediate area. It is reasonable to ask why the procedure demands this.

Put as simply as possible:

  1. PSM/RMP and IIAR 7 require procedures for Line & Equipment Openings. (or IIAR 7 alone if you have under 10k pounds)
  2. The PHA asks questions that identify hazards which result in administrative controls aka procedures. Those procedures will have to control the unique hazards identified in the PHA.
  3. RAGAGEP for procedures (such as IIAR 7) requires the buddy system be addressed in Line & Equipment Opening procedures.
  4. HazMat & Firefighting history shows it is useful.
  5. Human Nature tells us that people tend to hold each other accountable.

 

Let’s work through this step-by-step

1. PSM/RMP requires us to have a procedure:

1910.119(f)(4) The employer shall develop and implement safe work practices to provide for the control of hazards during operations such as lockout/tagout; confined space entry; opening process equipment or piping; and control over entrance into a facility by maintenance, contractor, laboratory, or other support personnel. These safe work practices shall apply to employees and contractor employees.

Put another way: We have to develop a written procedure on Line & Equipment Openings which everyone must follow.

 

2. Hazards identified during a PHA are often controlled with Administrative controls, such as SOPs. SOP content therefore must address the hazards identified in the PHA. Some examples:

…the Ammonia exposure increases while the operator is using an APR/SCBA? (II.8) This is what makes us mandate the use of a personal NH3 detector during line openings and leak investigations.

…there is inadequate isolation prior to maintenance? (HF.3) …the Ammonia pump-out for a length of piping or for a piece of equipment is incomplete? (PO.1) This is why SOPs include a pressure check to confirm pumpdown. This is also why the LEO procedure (and permit) require a written SOP & permit to check the effectiveness of the procedure.

…an injured worker is unable to summon assistance? (HF.56) This (among other reasons) is why we require a Buddy System. The LEO policy, in the General Precautions section, states “A buddy-system is used for all LEO procedures. The second person must be trained to initiate emergency action and must be stationed close enough to observe the activity but far enough away to ensure that they would not be endangered by an accidental release.”

 

3. The RAGAGEP for procedures IIAR 7-2019 has this requirement:

4.4.2 Buddy System. Operating procedures shall indicate when the buddy system shall be practiced in performing work on the ammonia refrigeration system.

A4.4.2-The buddy system should be practiced for operations where there is the potential that ammonia could be released, for example, operations which involve opening ammonia refrigeration equipment or piping. The buddy system should also be practiced during emergency operations involving ammonia releases.

 

4. HazMat & Firefighting history: Hazardous Materials teams and Firefighters have long used a 2-person team for increased safety. To some degree, this is enshrined in OSHA rules in 1910.134(g)(3)…

1910.134(g)(3) Procedures for IDLH atmospheres. For all IDLH atmospheres, the employer shall ensure that:

1910.134(g)(3)(i) One employee or, when needed, more than one employee is located outside the IDLH atmosphere;

1910.134(g)(3)(ii) Visual, voice, or signal line communication is maintained between the employee(s) in the IDLH atmosphere and the employee(s) located outside the IDLH atmosphere;

While we don’t INTEND to work inside an IDLH atmosphere during a LEO procedure, the possibility certainly exists if something goes wrong. The “buddy system” allows the person performing the LEO to focus on the work while the second person remains in the area, situationally aware and ready to respond in the event that the situation changes or something goes wrong.

 

5. Human Nature: The LEO policy is written around accountability. The policy requires that we demonstrate to a second person that we’ve followed the policy and adequately prepared for the work before the LEO occurs.  The “buddy system” tends to keep the actions “in-line” during the actual work.

Note: While it’s certainly possible  – from a regulatory view – that you could have certain specific LEO procedures that did not require a “buddy,” you would have to be able to document how you managed to address all of the issues outlined above without the second person.

Thanks to Bryan Haywood of SaftEng.net and Gary Smith of ASTI (Ammonia Safety Training Institute) for their time and thoughts in helping review this post.

PSM is a Thief!

The view that PSM is a time-sink.

A common push-back from facilities that are covered under the OSHA PSM and EPA RMP regulations is the sheer amount of resources these programs require to successfully design, implement, and maintain.

One phrase, seared into my memory, is from a frustrated and over-burdened maintenance manager: “PSM is a thief!”

He was referring to the fact that he had to task high-performing, highly trained and highly compensated personnel to perform Process Safety tasks. Time spent on Process Safety is obviously time that isn’t spent elsewhere.

My counterpoint at the time was “Safety isn’t earned – it is rented. And the rent is due every damned day.”

After an experience I had last week, I think there’s a better way to respond. I’d like to share my new response with you, but first let’s talk about the experience that made me see a new way of approaching this issue.

 

The experience

During the recent RETA conference the guest speaker was Jóse Matta. Jóse suffered ammonia burns over 40+ percent of his body when a condenser failed in an overpressure event. The event involved a portable ammonia refrigeration system. Before transport the system is drained of ammonia. In this incident, the driver placed a cap on the relief valve outlet due to DOT concerns. However, once the unit arrived onsite, the capped relief valve wasn’t noticed. Eventually this led to an overpressure event once the unit was charged and started.

Jose Matta barely survived his exposure. He nearly died in the hospital. His wife was brought into the burn unit to say her final goodbyes to her husband – the father of their children. When he was lucky enough to survive, he had to endure multiple surgeries. He no longer has a sense of smell and can barely taste food. He no longer has the ability to sweat and has to constantly monitor his condition when it’s hot out to avoid heat-stress or heat-stroke.

 

What does Jose’s experience have to do with “PSM as a thief?”

Post-incident, several failures of the PSM program were noted:

  • Pre-Startup Safety Review failed to identify the capped relief.
  • SOPs and Training on startup either weren’t adequate to control the hazards, or weren’t followed.
  • Setup time, tight scheduling, and the location of safety showers weren’t adequately addressed in the PHA.
  • The MI program didn’t ensure that the high-discharge-pressure interlock worked.
  • The technician and contractors at the site weren’t familiar enough with it to know there was a safety shower located in a nearby building.
  • The EAP didn’t provide adequate information to the facility or responders, leading to them delaying effective treatment.
  • There was no command system in place. Nobody called 911. Nobody took charge. Nobody met the responders when they arrived to explain what was going on.

If the Process Safety items above were properly in place, the incident either wouldn’t have happened, or the outcome would have been significantly better for Jóse.

You see, when I pushed back on the “PSM is a Thief” argument before, I was wrong. I should have agreed with that statement.

 

PSM *is* a thief. Yes, it takes resources, but it can also take a LOT more from you!

PSM can steal from you: the opportunity to nearly die in a chemical release.

PSM can steal from your family: the opportunity for tearful goodbyes.

PSM can steal from you: years of surgeries, painful rehabilitation, and diminished health.

 

Yeah, PSM is a thief. I’m plenty happy to have these experiences stolen from me and the people I work with.

Without Process Safety, people are taking risks without knowing they are taking them. NOBODY should have to do that.

If you want your Process Safety program to steal these experiences from your facility, your coworkers, your neighbors, and YOU, we can help!

Using the Hierarchy of Controls as a tool for Incident Investigations

The issue: Poor Incident Investigations and how to improve them

Often members of the Incident Investigation team miss some fairly obvious opportunities to improve their process safety. One trick is to use the Hierarchy of Controls as a brainstorming tool when coming up with causes and recommendations.

 

What is the Hierarchy of Controls and How can I use it as a tool during Incident Investigations?

The premise of the Hierarchy of Controls is that while hazards can be controlled in various ways, certain types of controls are inherently better than others. The hazard controls in the hierarchy are, in order of decreasing effectiveness: Elimination, Substitution, Engineering Controls, Administrative Controls, and Personal Protective Equipment (PPE).

Let’s take an example of an Incident Investigation concerning an unexpected employee NH3 exposure during an oil drain. While you will have to address any unique issues relating to the incident, here are some questions that the Hierarchy of Controls can provide for any oil drain incident:

Elimination: Physically removing the hazard. For example, when analyzing the risk of a valve packing leak in a process room, moving that valve to the roof would eliminate the hazard from the production room. Elimination is usually considered the most effective hazard control.

Substitution: Replacing the hazard with something that does not produce a hazard or something that produces a much smaller hazard. A common example of this is removing the hazard of NH3 in product chiller areas with the use of a secondary refrigerant such as CO2 or Glycol. Note that in some instances this results in simply relocating a hazard to another area with lesser consequences.

Note: We usually combine these two methods because if we don’t, we tend to spend more time arguing whether or not a control is an elimination or a substitution.

  • Can we avoid, or reduce the frequency of, the oil drains? Better coalescers, higher minimum head pressure to reduce oil blow-by, installation of an oil still to minimize oil draining from the system, etc.
  • Can we eliminate / reduce the NH3 involved in the oil drain? Pumpout of the oil pot and re-pressurization with shop air, conversion to a gravity drain oil pot, lower pressure suction during pumpout, etc.

 

Engineering Controls: These controls do not eliminate hazards but tend to attempt to control them or give notice when the process is approaching an unsafe state. Examples include NH3 sensors, Interlocks, High-Level Floats, Pressure and Temperature transducers, etc.

  • Is the equipment properly configured for a safe oil drain? Oil pot, “Dead-Man” valve, safe access, easy egress routes, etc.
  • Can we improve the ventilation in the area? Portable fans, local exhaust ventilation, manual use of existing Machine Room fans, etc.
  • Can we improve the hazard awareness? Local / Personal NH3 detector rather than relying on a fixed detector, pressure gauge installed during the pump-down, etc.

 

Administrative Controls: These controls are changes in the way the work is performed on or around the process. Training, Procedures, Signs and Warning labels are all administrative controls.

  • Can we improve the SOP? Better steps to address the hazards, mandating more oversight, required use of PPE, more effective use of ventilation, etc.
  • Can we improve the training? Better understanding of the hazards, procedures, PPE, tools, etc.

 

Personal Protective Equipment: PPE such as gloves, respirators, etc. is generally considered the last resort of hazard control.

  • Can we improve the PPE available? Can we make certain PPE mandatory? Improved gloves, smocks, respirators, etc.

 

Using the Hierarchy of Controls can be a great brainstorming tool to help you look at your possible causes, and your possible corrections from some new angles.
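If your team keeps investigation notes electronically, the same idea can be captured as an ordered checklist. The sketch below is only an illustration: the names are hypothetical, Elimination and Substitution are combined as described above, and the prompts are condensed from the oil drain example.

```python
from enum import IntEnum


class Control(IntEnum):
    # Lower value = generally more effective control.
    ELIMINATION_SUBSTITUTION = 1
    ENGINEERING = 2
    ADMINISTRATIVE = 3
    PPE = 4


# Brainstorming prompts condensed from the oil drain example.
PROMPTS = {
    Control.ELIMINATION_SUBSTITUTION: [
        "Can we avoid, or reduce the frequency of, the oil drains?",
        "Can we eliminate / reduce the NH3 involved in the oil drain?",
    ],
    Control.ENGINEERING: [
        "Is the equipment properly configured for a safe oil drain?",
        "Can we improve the ventilation in the area?",
        "Can we improve the hazard awareness?",
    ],
    Control.ADMINISTRATIVE: [
        "Can we improve the SOP?",
        "Can we improve the training?",
    ],
    Control.PPE: [
        "Can we improve the PPE available?",
    ],
}


def rank_recommendations(recommendations):
    """Sort (Control, text) pairs so the most effective control types come first."""
    return sorted(recommendations, key=lambda rec: rec[0])


# Walk the prompts from most to least effective control type.
for level in sorted(PROMPTS):
    print(level.name)
    for question in PROMPTS[level]:
        print("  -", question)

# Hypothetical recommendations from a brainstorming session.
ideas = [
    (Control.PPE, "Make face shields mandatory for oil drains"),
    (Control.ELIMINATION_SUBSTITUTION, "Install an oil still"),
    (Control.ADMINISTRATIVE, "Revise the oil drain SOP"),
]
for level, text in rank_recommendations(ideas):
    print(level.name, "->", text)
```

Sorting recommendations this way keeps the team from jumping straight to "more PPE" before the higher-value options have been discussed.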

What can we learn from the Fernie Ammonia fatalities?

The October 17th, 2017 Ammonia release in Fernie, BC resulted in three fatalities:

On October 16, 2017, the curling brine chiller at the Fernie Memorial Arena was put back into operation after a seasonal shutdown. During the shutdown and seasonal maintenance, ammonia had been detected in the curling brine system, indicating that the curling brine chiller was leaking… A total of three people were found deceased in the mechanical room: the director of leisure services, the refrigeration operator, and a refrigeration contractor mechanic.

 

Three people died in a completely avoidable incident. If you want to know the particulars of the incident, I’d recommend you go read the Incident Report itself. While we can’t go back in time and avoid this particular incident, we can extract some valuable lessons from it to prevent a similar incident in the future.

There’s a lot that went wrong, but we’re going to focus on a few key failures in Mechanical Integrity, Process Safety, and Release / Incident Response. We’ll briefly discuss each failure and provide ten opportunities for improving your current Process Safety system.

Note: While this incident occurred in Canada, which does not have robust Process Safety regulation, we’re going to provide our analysis as if it were a PSM/RMP plant. Even if this incident had occurred in the US, the total system inventory was estimated at less than 1,000 pounds, placing it in the General Duty category. Most operators of these General Duty systems do not choose to implement a PSM system – hopefully this incident will cause them to re-evaluate that choice.

 

Equipment Age and installation: In 2011, the facility received a recommendation from their mechanical contractor to replace the chiller due to its age. It had been in service for about 24 years and had a life expectancy of 20-25yrs. (At the time of failure the chiller had been in service for approximately 31yrs.) The facility actually budgeted for this replacement, deferred it, and then dropped the idea altogether. The report (and appendices) detail this decision making and indicate that the people making these decisions didn’t understand the underlying safety issues or the possible repercussions of these decisions. In part this was due to management turnover – the people who received the initial recommendation no longer worked at the facility when those recommendations were due to be implemented. Additionally, post-release, it was determined that the failed coupling was not properly supported.

Possible PSM citations: 1910.119(d)(3)(ii) for not installing the coupling per the manufacturer’s recommendations. 1910.119(d)(3)(ii) for equipment operating outside manufacturer’s recommended lifespan. 1910.119(e)(1) for the PHA not analyzing the hazards associated with operating outside the manufacturer’s recommended lifespan. 1910.119(j)(5) for operating the equipment with a known (service life) deficiency without assuring safe operation. 1910.119(m)(5) for not addressing and resolving a recommendation (if the recommendation was made due to an indication of NH3 in the brine).

Opportunity #1: When a piece of equipment has a stated service life, you need to either replace the equipment per the recommendation or support your decision to keep it in service with a suitable engineering rationale.

Opportunity #2: When operators & contractors make recommendations, they need to provide CLEAR and defensible reasons for those recommendations.

Opportunity #3: When recommendations are delayed, deferred, or not completed, the operators & contractors need to ensure that the decision makers understand the implications of their decisions.

Opportunity #4: A Pre-Startup Safety Review (PSSR) and ongoing MI tasks need to ensure that equipment is installed correctly and maintained in a safe manner / arrangement.

 

Signs of Failure and Deficiency Response: The facility detected NH3 in the brine (by scent) in April of 2017 and then followed it up with a lab test of the brine showing over 3,000ppm of NH3 in June. The facility decided to continue operating the chiller and “monitor” it. A second test in August showed an NH3 concentration near 2,000ppm. Again, the facility decided to keep “monitoring” the situation. The report indicated that the personnel performing the tests and receiving the results didn’t understand the safety implications of them. Even after receiving the tests showing the chiller had failed, the facility decided to keep operating it. According to the report, there was no evidence the facility understood the hazards associated with a leaking chiller.

Furthermore, due to a miscommunication, the contractor believed the facility had taken the chiller out-of-service and they were preparing a bid to replace the leaking unit. The contractor’s recommendation to “monitor” the unit was likely meant to monitor it to see if the valves were leaking by, but the facility interpreted it as a go-ahead to continue operating the defective chiller until it could be replaced as long as they “monitored” it.

The contractor had no policy or procedure in place to deal with a failed chiller outside the usual troubleshooting, repair and replace activities. The investigators concluded that none of the people involved with the decision to continue operating the chiller had training or qualifications involving condition/risk assessment.

Possible PSM citations: 1910.119(j)(5) for operating the equipment with a known (integrity) deficiency without assuring safe operation. 1910.119(m)(5) for not addressing and resolving a recommendation. 1910.119(g)(1)(i) for not training personnel on the hazards associated with a leaking chiller.

Opportunity #5: Personnel reviewing test results need to understand the meaning of the test results and the safety implication of those test results.

Opportunity #6: When test results are provided to decision-makers, these results need to provide adequate information so that the decision-makers understand them and their safety implications.

Opportunity #7: When contractors are called to deal with deficient equipment, they will almost always provide guidance / estimates on how to repair / replace the equipment, but facilities should demand a risk assessment on continued operation of the equipment if they intend to continue its operation while planning and preparing for the repair / replacement.

From Appendix V of the report: “In the majority of instances, owner/operators relied heavily on the refrigeration contractor’s assessment of the equipment and evaluation of the NH3 indication in the brine samples. The owner is accountable for the safe condition and operation of the equipment but in some instances, deferment to the refrigeration contractor’s assessment and recommendations for the equipment was observed.”

Opportunity #8: When a facility outsources maintenance work, they often erroneously think that they are outsourcing the responsibility as well. It is important for a facility to understand that this remains their process and their responsibility. Ask tough questions of your contractors to ensure that you understand the condition of your system.

 

Facility and Contractor Incident Release Response: On the day of the release at 03:53 the machine room NH3 alarm registered 300ppm. Responding facility personnel observed the brine expansion tank shaking and spilling brine. At 04:30, the facility personnel shut down the system and closed the chiller suction valve, observing that the shaking in the brine tank stopped. This should have indicated to the facility personnel that the separation between the brine and NH3 sides was completely compromised and that the brine loop was now full of ammonia. At 05:18 the facility personnel called the contractor to come in and re-configure the system to operate without the brine chiller.

At some point during the work, the personnel isolated the brine chiller, trapping the ammonia-laden brine in the chiller with no outlet available for it. As this ammonia-laden brine warmed up, the pressure inside the brine chiller rose and, at an estimated pressure of 30-150psig, a coupling on the brine side of the brine chiller failed, releasing the contents into the machine room and onto the personnel in the room. The estimated total NH3 release was 22 pounds (9lbs immediately vaporizing), resulting in an immediate concentration in the area of 20,000ppm, which dissipated to about 5,000ppm over a period of 5 minutes.

The report uses electricity demand to conclude that the personnel did not attempt a pump-out of the brine chiller. Unlike a CSB report, the report does not go into the fatalities. We have no idea where the personnel were positioned in the room, or what – if any – PPE they were wearing at the time of the release. It can reasonably be surmised that they weren’t wearing any respiratory PPE at all.

Possible PSM citations: 1910.119(g)(1)(i) for not training personnel on the hazards associated with NH3-contaminated brine and the hazards of trapping it. 1910.119(h)(3)(ii) for the contractor not being trained in the hazards associated with NH3-contaminated brine and 1910.119(h)(2)(v) for the facility not ensuring this training occurred. 1910.119(n) for not providing “procedures to handle small releases.” 1910.119(f)(1)(i)(D) for not providing an emergency shutdown procedure. 1910.119(f)(1)(i)(E) for not providing an emergency operations procedure.

Opportunity #9: While we often train on the dangers associated with trapping NH3, the dangers of trapping NH3 contamination in a secondary loop are rarely discussed. Operator training in facilities that utilize secondary cooling loops must address contamination and its possible safety implications.

Opportunity #10: While it’s not possible to know for sure, it is extremely likely that all three of these fatalities could have been avoided if the personnel were wearing full-face APRs at the time of the release. Note: They would have to have been wearing them, not have them “near-by.” APRs aren’t magic.

 

090618 Update: Full WorkSafeBC Incident Report

Learning from Failure

“Failure is only opportunity to begin again. Only this time, more wisely.” –Henry Ford

We often push PSM practitioners to perform Incident Investigations for fairly minor events in the hopes that the lessons learned from those minor incidents will stop the larger incidents from happening. This is, in part, due to CCPS (Center for Chemical Process Safety) guidance that, for every single catastrophic accident, there are typically about 9,900 minor issues / process upsets and 99 near misses.

So, if you only investigate the catastrophic incidents, then you are only acting on 0.010% of the opportunities available to you to improve your control over the process.
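The arithmetic behind that 0.010% figure is easy to check. Here is a minimal sketch in Python that uses only the ratio quoted above (1 catastrophic accident, 99 near misses, 9,900 minor issues); the variable and function names are illustrative assumptions, not anything published by CCPS or OSHA.

```python
# Event counts per one catastrophic accident, as quoted in the text above.
catastrophic = 1
near_misses = 99
minor_upsets = 9_900

# Every event is a potential learning opportunity.
total_events = catastrophic + near_misses + minor_upsets


def share_investigated(investigated: int, total: int) -> float:
    """Return the percentage of learning opportunities acted on."""
    return 100.0 * investigated / total


# Investigating only the catastrophic event.
only_catastrophic = share_investigated(catastrophic, total_events)

# Investigating the catastrophic event plus every near miss.
with_near_misses = share_investigated(catastrophic + near_misses, total_events)

print(f"Total events: {total_events}")
print(f"Catastrophic only: {only_catastrophic:.3f}%")
print(f"Catastrophic + near misses: {with_near_misses:.3f}%")
```

Under the same assumed ratio, adding near misses to your investigation scope raises the share of opportunities acted on a hundredfold, and the minor issues / process upsets account for the rest.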

OSHA has promoted this idea as far back as a decade ago…

OSHA and industry have found that when major incidents have occurred, most of these incidents have included precursor incidents. Additionally, OSHA and industry (See CCPS [Ref. 41], Section 5, “Reporting and Investigating Near Misses”) have concluded based on past investigations, that if employers had properly responded to precursor incidents, later major incidents might not have occurred. Consequently, anytime an employer has an “opportunity” to investigate a near-miss/precursor incident (i.e., an incident that could reasonably have resulted in a catastrophic release) it is important that the required investigation is conducted and that the findings and recommendations are resolved, communicated, and integrated into other PSM elements/systems so a later major incident at the facility is prevented. …It is RAGAGEP to investigate incidents involving system upsets or abnormal operations which result in operating parameters which exceed operating limits or when layers of protection have been activated such as relief valves. (An example RAGAGEP for investigating incidents, including near-miss incidents is CCPS [Guidelines for Investigating Chemical Process Incidents, 2nd Ed.], this document presents some common examples of near-miss incidents). (OSHA, Refinery PSM NEP, 2007)

Going a step further, it’s often true that you can learn something about managing complex operations from businesses in entirely different fields. One field that I like to follow – in part because it’s endlessly re-inventing itself – is information technology.

Google recently published an article on their Post-Mortem culture, with a farcical worked example that includes the movie “Back to the Future” and a newly discovered sonnet by Shakespeare. The practice of learning from their failures is actually part of their Site Reliability Engineering handbook and you can read the entire chapter if it appeals to you.

“Failures are an inevitable part of innovation and can provide great data to make products, services, and organizations better. Google uses ‘postmortems’ to capture and share the lessons of failure…

… For us, it’s not about pointing fingers at any given person or team, but about using what we’ve learned to build resilience and prepare for future issues that may arise along the way. By discussing our failures in public and working together to investigate their root causes, everyone gets the opportunity to learn from each incident and to be involved with any next steps. Documentation of this process provides our team and future teams with a lasting resource that they can turn to whenever necessary.

And while our team has used postmortems primarily to understand engineering problems, organizations everywhere — tech and non-tech — can benefit from postmortems as a critical analysis tool after any event, crisis, or launch. We believe a postmortem’s influence extends beyond that of any document and singular team, and into the organization’s culture itself.”

Google’s Pre-Mortem Tool – Anticipating what can go wrong.

Google’s Post-Mortem Tool – Dealing with what actually went wrong.

Pencil-Whipping can Kill

What is it? Pencil-whipping is when you complete a form, record, or document without having performed the implied work or without supporting data or evidence.

Here are some common examples in NH3 refrigeration:

  • Completing “work orders” without conducting the work.
  • “Signing off” on SOP reviews or PHA revalidations without actually reviewing or revalidating the documents.
  • Certifying training – or signing training attendance forms – without the training actually occurring.

Why take it seriously? There are several reasons, but here are some obvious ones:

  • You can be prosecuted for false statements resulting in fines and/or jail time.
  • There is significant legal liability if the action leads to an incident.
  • You can be fired for false statements.
  • There can be significant safety repercussions to documenting work that wasn’t done.

I want to briefly focus on the last one – what can happen when you document that work was done when it actually wasn’t. If you are being assigned a task, we have to assume that the performance of that task is important to the system as a whole.

Imagine your job was to inspect some equipment that was prone to long-term wear – equipment that was relied upon for normal function. Now imagine that you didn’t conduct those inspections, leading the users of that equipment to believe it was in proper working order. They are relying for their safety on YOUR lie!

Here’s what that can lead to:

And here’s what can happen when people investigate the incident:

Thursday morning, the General Manager and CEO of the Board Safety Commission released a statement regarding the firings: “…I want the Board, our employees and our customers to know that this review revealed a disturbing level of indifference, lack of accountability, and flagrant misconduct in a portion of Metro’s track department which is completely intolerable. Further, it is reprehensible that any supervisor or mid-level manager would tolerate or encourage this behavior, or seek to retaliate against those who objected. It is also entirely unacceptable to me that any employee went along with this activity, rather than exercise a safety challenge, or any of the multiple avenues available to protect themselves, their coworkers, and the riding public.

Since the derailment occurred, we have either taken action or are in the process of taking disciplinary actions involving 28 individuals. This represents nearly half of the track inspection department and includes BOTH management and frontline track employees.

  • Six employees have been terminated, including 4 track inspectors and 2 supervisors
  • Six more track inspectors are pending termination or unpaid suspension; and 10 more are pending possible discipline pending the outcome of the administrative process
  • Another supervisor termination is underway; and two more supervisors are pending the outcome of the administrative process
  • One Superintendent was demoted to Supervisor
  • One Assistant General Superintendent was demoted to Superintendent
  • One assistant superintendent separated from Metro before the review concluded”

In closing: Pencil-Whipping is immoral, illegal and just plain wrong. Don’t do it.