
Can long Lead Times impact Process Safety?


“Time Waits for No one…”

The Issue at hand

When I first started in NH3 refrigeration, you could pick up the phone, talk to your parts-guy, and get a replacement valve quickly: sometimes the same day, and usually within a business day or two. While you were waiting for the part, you either operated the equipment manually (requiring a temporary SOP / MOC) or shut the equipment down during the wait. We call the time between when you order something and when it arrives the lead-time.

Because lead-times *were* short, parts inventory at most facilities was kept fairly low – usually limited to parts whose failure would stop production. If you could get what you needed in a day or two, why keep it on the shelf, unless you were losing 20k+ an hour in downtime?

The situation has changed around us, and I’m not sure we’ve all thought through the implications of the current supply-chain issues. Lead-times have grown substantially in 2021 and, while relief is promised in the second half of 2022, these long wait times for equipment and components have the potential to adversely affect our Process Safety.

 

Current Lead Time estimates

Equipment / Component          Lead-Time (weeks)*
Valves, Shutoff and Control    14-24
Valves, Relief                 12-20
Vessels                        14-24
Condensers                     14-16
Compressors                    16
Air Unit / Evaporators         36
Heat Exchangers                14

*Typical for NH3 components. Varies by brand. Some halocarbon lead-times are even longer.

 

How can this affect Process Safety?

When you don’t have a critical spare part, and won’t have one for several months, production demands are likely to force you to operate your equipment in “temporary” modes. Here are a few thoughts:

  • Temporary is a vague term, but we don’t normally think of temporary as “weeks” or “months”. Put this in more personal terms: if your city tells you they are working on a water main and you will be without water for 8hrs, your response is quite different than if you are going to be without water for 3 months. Not having running water for a few hours means you pour some in a pitcher and delay some dish washing / bathing. Not having water for months requires an entirely different approach.
  • It is very likely that the PSM element where you “identify hazards and control them” (the PHA) was based on customary lead-times, not the ones we are now facing!
  • If operating in a temporary mode has negative safety implications, it is far easier to make the argument that we should forgo production for a few hours, or that we can handle manual operations for a few hours, than it is to justify these “temporary” arrangements for several weeks or months. As an example, if an automatic makeup feed valve to a vessel fails, it’s not terribly difficult to manage that manually for a few hours while your parts provider gets you the new valve. It’s far more difficult to manage that issue around the clock for several weeks or months!

 

What should I do?

“The first responsibility of a leader is to define reality…” –Max DePree

Well, the first step is to start a discussion with your skilled technicians and make sure they understand the environment we’re all working in. Here are some points for discussion, and further actions to take:

  • Get a good understanding of the lead-times for the replacement components in your system.
  • With that understanding, talk with your skilled technicians about what is critical for operations.
  • Discuss what you’ve learned so far with operations / production management so you can understand what they “need” and they have a better understanding of these issues.
  • Review your PHA (with all the above in mind) to see what (if any) changes you might need to make to your spare parts/equipment inventory. You might want to bias towards a higher on-site inventory, and (especially if you are using a CMMS or other parts control software) increase your “minimum” inventory to 1 or 2, rather than zero.
  • Order what you need.
  • Consider pre-writing some “temporary” SOPs for operations when failures of critical components occur.
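The inventory review described in the list above can be sketched in a few lines of Python. This is only a minimal illustration, not a CMMS feature: the `SparePart` fields, the two-week "long lead" threshold, and the "Gauge" entry are assumptions made for the example. The valve lead-times are the upper ends of the ranges in the table above.

```python
from dataclasses import dataclass


@dataclass
class SparePart:
    name: str
    lead_time_weeks: int  # worst-case quoted lead time
    critical: bool        # judged critical by your technicians / PHA review
    on_hand: int          # quantity currently on the shelf
    minimum: int          # current CMMS "minimum" setting


def suggested_minimum(part: SparePart, long_lead_weeks: int = 2) -> int:
    """Suggest a minimum of at least 1 for critical parts with long lead times."""
    if part.critical and part.lead_time_weeks > long_lead_weeks:
        return max(part.minimum, 1)
    return part.minimum


def parts_to_order(parts: list[SparePart]) -> list[tuple[str, int]]:
    """Return (name, quantity) for parts stocked below the suggested minimum."""
    orders = []
    for part in parts:
        shortfall = suggested_minimum(part) - part.on_hand
        if shortfall > 0:
            orders.append((part.name, shortfall))
    return orders


parts = [
    SparePart("Valve, control", lead_time_weeks=24, critical=True, on_hand=0, minimum=0),
    SparePart("Valve, relief", lead_time_weeks=20, critical=True, on_hand=1, minimum=0),
    SparePart("Gauge", lead_time_weeks=1, critical=False, on_hand=0, minimum=0),  # hypothetical entry
]
print(parts_to_order(parts))  # [('Valve, control', 1)]
```

Which parts count as "critical" is the judgment call that comes out of the technician discussion and PHA review; the code only makes the resulting rule repeatable.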

RC&E can assist you with your parts and spares. Click Here for our Line Card. Call Dennis Vaught 817-210-1957 or email him at dvaught@rce-nh3.com

Ammonia Process Safety below 10,000 pounds

One of the most frequent misconceptions we’ve been dealing with in our industry is a belief that being below the PSM/RMP threshold means you are in some sort of wild-west no-man’s-land where there are no rules. Previously we’ve dealt with that issue in a post called “General Duty vs. PSM/RMP: Is there a benefit to dropping below the 10,000lb threshold?” But that post was really written to people that were considering lowering their NH3 inventory to avoid regulation.

We thought it would be useful to put together an article that dealt with those systems that were already under the PSM/RMP threshold so they better understood the Safety & Regulatory landscape. To that end, we’ve put the information in an executive-level 3-page pdf that is easy to email: Ammonia Process Safety below 10k

Email or call us today to have RC&E assist you with all your PSM/RM Program needs! info@RCE-Chill.com    (888) 357-COOL (2665)

Updated IIAR 2-2021 Standard Released

IIAR 2 – 2021 Standard for Design of Safe Closed-Circuit Ammonia Refrigeration Systems has been released by the IIAR and is now available for purchase on their website.

The updated standard has several new requirements which resulted in some changes in the PSM/RMP program templates. Here are some of the highlights:

  1. The definitions file was updated with new IIAR 2 definitions. Added in-document headings to skip around the document easily.
  2. The PHA Checklist Template was updated to the new IIAR 2:
    • Added a note that “Provisions for plugs or caps required under IIAR 2 5.9.3.3” on all oil draining plug/cap questions.
    • Added a note that “IIAR 2-2021 5.12.2 requires a check valve during charging” in relevant Charging SOP section.
    • Added a question on Provisions for Pumpout per IIAR 2-2021 5.12.6 on PV1 subsection and all Equipment Subsections.
    • Added/Modified questions on RC1 section about low ambient temperature, and VFD resonance.
    • Added a question on EV2 (Liquid Heat Exchanger) equipment subsection regarding secondary coolant side pressure ratings.
    • Added a question on MR.C Checklist for Classified Space signage.
    • Added a question on PV1 (Piping & Valves) on new MOPD & MSSPD requirements for valves leading to atmosphere.
    • Added a question on PV1 (Piping & Valves) on requirements for unique identification for Emergency Shutoff valves.
    • Modified existing .PSV equipment sub-subsections to include the IIAR 2-2021 15.2.6 requirement that liquid relieving reliefs relieve back into the system.
    • Added a note to all small-bore piping / tubing questions that IIAR 2-2021 13.2.3.1.1 limits carbon steel tubing and carbon steel compression fittings to valve sensing pilots, compressors, compressor packages, and packaged systems.
    • Updated MR.C section for new requirements regarding NH3 detection.
    • Updated various checklists (VENT, DET, PSV, DT) to match current IIAR 2-2021 text.
    • Added a new equipment subsection NMR.C for IIAR 2 Equipment located outside of Machinery rooms.
    • Added a new equipment subsection PKG.C for IIAR 2 Packaged Systems and Equipment.
    • Added a new equipment subsection IAC.C for IIAR 2 Instrumentation Controls.
  3. Updated “Contractor Door Sign” to meet new IIAR 2 – 2021 [5.14.1.1] information standards and some ASHRAE 15 – 2019 [11.2.1] standards.

Comments about the changes and the required steps to implement the document changes are present in the “Change Log and Reference” document at 08/02/21.

…Read on further in this post if you want to know about the changes in the new IIAR 2…

 


Justifying the Ammonia delivery?

The issue: A facility with an ammonia refrigeration system notes that their HPR level is rather low, and they are considering ordering some ammonia to get back to the levels they “used to have.” The thinking is that they need to add to the ammonia charge to make up for ammonia that was lost over the years.

Before you go too far, a good question to ask is: Did I lose ammonia? Or is it just somewhere else in my system?

 

What if you didn’t lose it?

Did you add equipment without your MOC addressing if this required an inventory adjustment? Did you change recirculator vessel levels which make the HPR look low even though the ammonia is still out in the system? Has someone been mucking with the HXV’s or TXV’s, so you are “brining” coils? These are common issues, but the most likely culprit is seasonal variation.

If it’s August in Texas, it’s likely that your system is running about as hard as it will ever run. That means that the NH3 isn’t just hanging out in your vessels, but out in the various heat exchangers (and their piping) doing its job. The “good old boy” method of testing this was to wait until the cool of the night, shut down the liquid feed to your “load,” and check the vessel levels after the NH3 came back.

A more “modern” method is to use an inventory spreadsheet and adjust the levels in the heat exchangers to reflect the summer load. The intricacies of doing either of these are better dealt with in the real world rather than a blog-post, so let’s assume you have already checked this and you actually do need ammonia. (Note: if you need assistance with either of the above, we can certainly help; just give us a call.)

 

Ok, maybe we did lose it!

If you look into the situation and find out that you actually do need ammonia, there are a few considerations you should think of BEFORE you order that truck and start preparing for delivery.

  • Figure out how much. Use an inventory spreadsheet (or have an engineer do it for you) to figure out how much ammonia you need to get back to your “normal” level.
  • Review your charging SOP with the refrigeration team to make sure you are all on the same page. This isn’t something you should be doing often so take this opportunity to review, validate, and TRAIN on this procedure.
  • Document where you think that ammonia went! For most facilities this is just calculating your “leak” rate. This is your “justification” that you are replacing lost ammonia, not adding to your intended inventory level.

 

Justifying the charge

Assuming you didn’t have some sort of incident that clearly explains why you need ammonia, we should figure out how to justify the amount we’re adding. Most losses are easily justified by establishing a “loss rate” and comparing it to accepted norms. This acceptable loss would be caused by normal maintenance, auto-purgers, and fugitive emissions.

In my opinion, anything less than 5% is good. 2-3% is excellent. For what it’s worth, the IIAR has stated that up to 10% loss a year is “reasonable.”

A loss rate of 3% or less a year can easily be explained from normal maintenance, auto-purgers, and fugitive emissions.

This is easier to explain with a worked example from a friend. In this case, their inventory level is supposed to be 5,800lbs. When they updated their inventory sheet to reflect the actual conditions at the facility, they saw a calculated current charge of 5,000lbs reflecting an 800 pound loss. That loss occurred since their last charge 5 years ago.

That works out to 800 / 5,800 ≈ 13.8% over 5 years, or roughly 2.8% per year. This percentage is easily explainable from maintenance and other fugitive emissions, and it’s also quite reasonable.
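For anyone who wants to check the arithmetic, here is a minimal Python sketch of the loss-rate calculation using the worked example's numbers (5,800 lb intended, 5,000 lb measured, 5 years since the last charge). It assumes a simple, non-compounded average expressed as a percentage of the intended charge; the function name is made up for illustration and this is not the inventory template itself.

```python
def annual_loss_percent(intended_lb: float, measured_lb: float,
                        months_since_charge: float) -> float:
    """Simple (non-compounded) average loss per year, as a % of the intended charge."""
    if intended_lb <= 0 or months_since_charge <= 0:
        raise ValueError("intended charge and elapsed months must be positive")
    lost_lb = intended_lb - measured_lb
    years = months_since_charge / 12
    return lost_lb / intended_lb / years * 100


rate = annual_loss_percent(5800, 5000, 60)
print(f"{rate:.1f}% per year")  # 2.8% per year
```

Keep the inputs and the result with your charging records so the justification is documented alongside the delivery receipt.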

If you take the time to figure out the math above, and then document your calculations to justify your NH3 charge, it helps avoid unpleasant assumptions on the part of the EPA and OSHA in any future inspections.

If an auditor comes in and sees an ammonia delivery receipt, a documented rationale why the ammonia was needed, the SDS of the chemical charged, and you have a compliant charging procedure, it would be very unlikely that the charging process would be questioned further.

Of course, if your math shows a high leak rate, then you had better get an incident investigation going and figure out what’s wrong!

 

P.S. – To assist in this effort, Scott has updated the Ammonia Inventory example template to help automate this process. Just enter the old value, the newly measured value, and the time (in months) since the last charging, and you have a 1-page report on the % loss per year. We hope this helps. The file can be located at: \ PSM-RMP Program Templates \ 03 – Process Safety Information \ Optional Resources \

PHA Synergy: How to get more out of the PHA process

According to 1910.119(e) and 40CFR68.67(a) the purpose of a PHA is to “…identify, evaluate, and control the hazards involved in the process.” Since the mid-90’s the refrigeration industry has done this mainly through the IIAR’s “What-If” methodology as suggested in their Compliance Guidelines materials.

There have been many revisions of this material over the years, but they all have the same thing in common as you use them: You can see how each question / item:

  • Poses a failure scenario (sort of a lesson someone else has already learned)
  • Prods you to solve the issue through an existing RAGAGEP

For example, a question might ask something like “What if plugs, caps, or blind flanges are missing on purge or drain valves?” This should prod you to recall that both IIAR 2 and IIAR 4 require that these things be plugged, capped, etc. This should also prod you to ask how you are addressing this requirement in your Process Safety Program.

The issue we always came across is that you must KNOW or MEMORIZE what the RAGAGEP says in a very complete way, or you miss the connection between the “What If” scenario and the RAGAGEP. This is nearly impossible because it seems like RAGAGEP is multiplying at an alarming rate. Furthermore, this (at least two day) process often feels like a futile effort at figuring out what the “What-If” scenario questions are really getting at.

To improve this, years ago I started adding two things to the IIAR standard questions:

  1. References to the IIAR standards where appropriate. (For example, in our plug question, reference IIAR 2-2021 13.3.2.6 & IIAR 4-2020 10.4.5.4)
  2. Explicit checklists that allow you to compare your system to appropriate RAGAGEP outside of the “What If” scenarios.

 

It’s very easy to lose sight of evolving RAGAGEP over time. These checklists allow you to perform a forensic examination of your system compared to current RAGAGEP. In addition to the issue of improving RAGAGEP compliance, we also face other challenges.

  1. It is common to show up to perform a PHA and find the client lacks critical Process Safety Information and PSM elements & procedures making a compliant PHA extremely difficult or impossible.
  2. Incident Investigations are often in a state of disarray or incomplete making their inclusion in the PHA difficult at best, and almost meaningless at worst.
  3. IIAR 9 now requires an evaluation against its minimum requirements for all NH3 refrigeration systems at least every five years.
  4. In some regions the EPA has an almost absurd number of questions they “like” to see in your Facility Siting sections.
  5. The Emergency Action Plan is a critical safeguard in your program, and it is usually missing some basic items that aren’t apparent until you try and use it in an emergency.
  6. Finally, the IIAR has standards on Installation, Commissioning, and Decommissioning that are often overlooked.

 

This again leads us back to checklists. I created them for basic PSI & PSM items, Incident Investigations, IIAR 9, Facility Siting, EAP, IIAR 4, IIAR 6, and IIAR 8. Here’s what that looks like:

As you can see, that’s fairly comprehensive, but it’s also a lot more work! To adjust to all this, we usually perform PHA’s in a two-step process.

Step 1: Weeks in advance, we give the client the relevant checklists and have them fill them out to the best of their ability.

Step 2*: Once we’re on-site, we go over the checklists they’ve worked on to answer any questions, address discrepancies, etc. THEN we move on to the “What If” scenarios.

* Of course, if the client wants, we can always book another two or three days of our time helping them on-site with Step 1.

 

The result of this longer, more comprehensive process is:

  • A nearly point-by-point check of the facility (and their Process Safety program) against common RAGAGEP from a HAZARD perspective rather than a compliance one.
  • A much better understanding of the “What If” scenario questions when we get to them after the checklists.
  • Cleaner, more systemic recommendations that point to specific hazards and the RAGAGEP that most effectively addresses them.
  • At the end of the PHA process, facility team members have a much clearer understanding of where the requirements and recommendations are coming from.

You can learn more about our PHA offerings here. Email or call us today to have RC&E assist you with all your PSM/RM Program needs! info@RCE-Chill.com    (888) 357-COOL (2665)

The 2020 Christmas Update

Merry Christmas to our Ammonia Refrigeration Process Safety community!

 

Well, this year has been interesting, eh? The hits keep coming it seems, and it was no different for those of us in the Process Safety field. Behind the scenes, we’ve been working on a fairly major set of improvements to the PSM system. Originally scheduled for August, we’ve finally managed to push it across the finish line just in time for the Holidays!

Significant improvements were made to the core of the system (The SOPs and ITPMRs) through an unprecedented amount of end-user feedback. Remember, this system relies on the feedback of operators, technicians, service personnel, and Process Safety professionals to improve.

All updated documents have the 122520 date-code, but here’s a run-down:

  • Minor updates to definitions file
  • All element written plans:
    • Where it was appropriate, did a little harmonization with the newest IIAR Process Safety Management & Risk Management Program templates. (There isn’t really anything they cover we don’t, but there are some places we harmonized the phrasing where we cover the same ground)
    • Ensured all element Written Plans refer to the ROSOP QA – Document Quality Control section in the Document Management
    • Minor editing / formatting improvements
  • Minor change to Operator Training element to ensure that Initial Training on Incident Investigation includes a review of recent and routinely recurring incidents.
  • Improvements to the II element written plan’s “Incident Investigation Process Flowchart”
  • SOPs
    • Minor changes to the Implementation Policy: Review and Annual Certification to harmonize with the IIAR guidance
    • Annual SOP Certification letter improved to correlate with the SOP element Written Plan more closely
    • The SOP element Written Plan Implementation Policy: SOP Authoring / Generation section now provides “Best Practices” standard language for warnings, step comments, step instructions, etc.
    • ALL SOP Templates now:
      • Use the “Best Practices” language.
      • Include better language tying them to the ITPMRs
      • Reference ROSOP-PPE in the Safety considerations section
      • Include Additional Equipment Considerations to harmonize with the IIAR guidance
    • ROSOP PPE slightly improved with reference to LEO
    • ROSOP LOTO improved with improved language from end-users
    • Minor updates to ROSOP QA – Document Quality Control section.
    • ROSOP LEO streamlined and simplified with a good amount of end-user feedback
    • New ROSOP ITPM based on significant end-user operator input and feedback (See MI section below)
  • MI / ITPMRs
    • All ITPMRs now provided as PDF forms as well as Word documents
    • All ITPMRs have improved references including to the new ROSOP ITPM
    • All ITPMRs now have a space to record task hours
    • All frequency ITPMRs are now in a single document. For example, previously we would have a 30-day, 90-day, and 365-day ITPMR for condensers. Now we have a single ITPMR for condensers with all the items and you simply use the applicable sections. This allowed each step in the ITPMRs to have its own unique step code. This is important because….
    • A new SOP was created called ROSOP ITPM which includes additional information for less-skilled operators and technicians. This new ROSOP is also used as a repository of best-practices and collected knowledge from field operators. Relevant guidance from applicable IIAR standards was also included directly in the SOP where we thought it useful to those performing the MI work. A group of contractor service technicians and end-user operators contributed to the creation of this SOP, and we FULLY expect this SOP to grow and improve as we get even more field use and operator feedback.

 

To implement:

  • Written Plans: Follow the Implementation Policy: Managing Procedure / Document Changes. These should be straight-forward.
  • Definitions file: Replace with the new one
  • SOPs:
    1. For the new PPE and LOTO templates, either adopt them as-is or incorporate their changes into your existing PPE & LEO SOPs
    2. For all your equipment SOPs, consider updating them to the new language during your next scheduled revision / team review.
    3. For the NEW ROSOP-ITPM and PSSRs see the MI section below
  • MI: Replace the existing ITPMRs with the new ones, providing training that when the CMMS (or other scheduling system) calls for a frequency based ITPMR, just use the equipment specific ITPMR and fill it out to the appropriate frequency.
  • Provide training on the new ROSOP ITPM. Please collect feedback for improvements so we can all improve its performance.
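To make the "single ITPMR, fill it out to the appropriate frequency" idea concrete, here is a small Python sketch. Everything in it is hypothetical: the step-code format, the task text, and above all the assumption that a longer-interval call (for example 90-day) also includes the shorter-interval sections. Check your own ITPMRs and the ROSOP ITPM for the actual rule.

```python
# Hypothetical layout: one combined ITPMR per equipment type, with each step
# tagged by the frequency section (in days) it belongs to.
CONDENSER_ITPMR = [
    {"code": "COND-030-01", "every_days": 30, "task": "example 30-day step"},
    {"code": "COND-090-01", "every_days": 90, "task": "example 90-day step"},
    {"code": "COND-365-01", "every_days": 365, "task": "example 365-day step"},
]


def applicable_steps(itpmr, called_frequency_days):
    """Return the steps to complete when the scheduler calls for a given frequency.

    Assumption: a longer-interval inspection also covers the shorter-interval sections.
    """
    return [step for step in itpmr if step["every_days"] <= called_frequency_days]


codes = [step["code"] for step in applicable_steps(CONDENSER_ITPMR, 90)]
print(codes)  # ['COND-030-01', 'COND-090-01']
```

Because each step has its own unique code, a completed form can be traced back to exactly which steps were performed, whichever frequency the scheduler called for.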

How to respond to a Compliance Audit Report

Both PSM and RMP require a 3-year audit to “verify that the procedures and practices developed under the standard are adequate and are being followed.” While it is not required, this Compliance Audit is traditionally done through a 3rd party. A common failing I see in this element is end-users not understanding what to do with the Compliance Audit once they’ve received it. What follows are my thoughts on best-practices once you’ve received the Compliance Audit report.

  • Verify the Report
  • Certify the Report
  • Address the Findings / Recommendations
    1. Assess validity
    2. Decide on a solution to address valid recommendations
    3. Implement the solution including any needed interim solutions
    4. Document the resolution as closed

 

Verify the Report

You will want to ensure the report meets the requirements of the PSM/RMP rules as well as your internal Compliance Audit element Written Plan. First thing to do is to read through the report and any findings / recommendations to familiarize yourself with it. Your report may look different than the ones I deliver, but mine have five main parts:

  • An introduction letter describing the audit methodology and the report’s format
  • Closing meeting notes discussing highlights of the report and next steps.
  • An Audit Certification Page (discussed in the next section)
  • Statement of Qualifications: Qualifications of Company and PHA Facilitator / Compliance Auditor, Conflict of Interest Statement & Disclosure. This is basically a written answer to common “Who did this audit and why should we trust them” questions.
  • Compliance Audit worksheets & Findings / Recommendations

Once you understand the format of the report, decide if it met the goals of a Compliance Audit. I use the 3-levels of compliance as my performance basis.

Once you’ve established that the Compliance Audit report meets this performance basis, make sure it is:

  • Complete
  • Free of any copy-paste errors
  • Free of blank spaces / unanswered questions

If you have any questions or concerns, work with your auditor to address them at this stage, because once we go to the next step, this report is “set in stone.”

 

Certify the Report

Both PSM and RMP require that the employer/owner/operator certify the Compliance Audit report. I include a letter to be dated and signed. This step is often missed but it’s a very simple thing. You are not certifying that the report is 100% accurate, found every single thing wrong, etc. All you are certifying is that “you have evaluated compliance…to verify that the procedures and practices developed under the standard are adequate and are being followed.” In some sense, you’re really certifying that this collection of documents is your Compliance Audit, that you have received it, and that you believe it to be accurate.

 

Address the Findings / Recommendations

Each non-compliance finding will require some sort of action on your part. To assist in this endeavor, I personally rate the findings on a 4-level scale.

A simpler explanation of that rating system might be:

Green: All Good.

Yellow: It’s good, but there might be a better way.

Orange: This is wrong and can get you fined but probably won’t get anyone hurt in the short-term.

Red: This is wrong and can get someone hurt or even killed.

Below is the flowchart from our model PSM/RMP program on dealing with recommendations. Please see this longer post on the subject for more information: Properly Addressing PSM / RMP Findings & Recommendations

Recommendations will be considered “addressed” when a plan has been put in place to address them. In some cases, a recommendation will not be accepted. OSHA considers an employer to have resolved recommendations when the employer has either adopted the recommendations or justifiably declined to do so. According to OSHA, an employer can justifiably decline to adopt a recommendation where it can document that:

  • The recommendation contains material factual errors;
  • The recommendation is not necessary to protect the health of employees or contractors, the public or the environment;
  • An alternative measure would provide a sufficient level of protection; or,
  • The recommendation is not feasible.
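If you track recommendations in a spreadsheet or a small script, the resolution rule above can be sketched like this in Python. It is an illustration only: the class and field names and the sample recommendation text are hypothetical, and it is not the “Resolution of Recommendation” policy itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DeclineReason(Enum):
    FACTUAL_ERROR = "recommendation contains material factual errors"
    NOT_NECESSARY = "not necessary to protect employees, contractors, the public, or the environment"
    ALTERNATIVE = "an alternative measure provides a sufficient level of protection"
    NOT_FEASIBLE = "recommendation is not feasible"


@dataclass
class Recommendation:
    text: str
    adopted: bool = False
    decline_reason: Optional[DeclineReason] = None
    documentation: str = ""  # written reasoning and progress notes

    def is_resolved(self) -> bool:
        """Resolved = adopted, or declined for a recognized reason that is documented."""
        if self.adopted:
            return True
        return self.decline_reason is not None and bool(self.documentation.strip())


adopted = Recommendation("Install missing caps on oil drain valves", adopted=True)
declined = Recommendation("Relocate the vessel", decline_reason=DeclineReason.NOT_FEASIBLE)
print(adopted.is_resolved(), declined.is_resolved())  # True False

declined.documentation = "Engineering review attached explaining why relocation is not feasible."
print(declined.is_resolved())  # True
```

A declined recommendation only counts as resolved once the reasoning is written down, which is the point of the documentation requirement.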

Whether accepting or rejecting a recommendation, it is important that you document your reasoning for doing so and any progress you are making, or have made. In our system we rely on an Implementation Policy called “Resolution of Recommendation” to do this. Below is an example of a recommendation that was tracked to resolution. Note that since it is now complete, they have shaded it green.

Conclusion: While it’s time consuming and labor-intensive, dealing with Compliance Audit recommendations is a fairly straight-forward task. As always, feel free to Contact Us if you have any questions, and check out our Compliance Audit section if you would like us to perform your next Compliance Audit.

Note: Nearly everything in this article is equally true for reports and recommendations from PHA’s, independent Mechanical Integrity Audits, etc.

What can we learn from the tragedy in Beirut, Lebanon?

“Smart people learn from their mistakes. Wise people learn from the mistakes of others.”

Or, in PSM terms: Incident Investigation is how you become smart. Process Hazard Analysis is how you become wise.

Yesterday, a horrific explosion occurred in the port of Beirut, Lebanon. This morning it is being reported that over 100 are dead, over 4,000 are injured, and up to 300,000 are homeless. Estimates of the economic damage have been as high as five billion dollars.

Beirut, Lebanon Explosion 08/04/20

It is believed that the explosion was the result of 2,750 tons of ammonium nitrate stored at the port. The authorities will now have to try and piece together what happened to see what they can learn from this incident.

Beirut, Lebanon Explosion Aftermath 08/04/20

In PSM terms, this is where we implement the Incident Investigation element. Refer back to that earlier quote, “Incident Investigation is how you become smart.” One of my first mentors put it another way: “Wisdom is healed pain.” It is right and proper that we learn from the mistakes we make, but there is a better way: Learn from the mistakes of others so you don’t repeat them!

Al Jazeera is reporting that the chemical storage was known about for seven years, and while the port authorities asked for assistance in dealing with the dangerous situation SIX TIMES, they did not receive a response. It appears that the authorities in Beirut had the information they needed to KNOW they had hazards to address for many years.

The dangers of Ammonium Nitrate explosion are WELL KNOWN.  Check out this older article on the events in West, Texas – or check out the pictures I took there after the explosion. (Note, according to the Al Jazeera timeline, the improper storage of this chemical in Lebanon began right around the time of this incident in America.)

Ammonium Nitrate explosion damage in West, Texas (2013)

A proper PHA prevents incidents. In the PHA process, we Identify hazards, Evaluate those hazards, and then Control those hazards.

A timely Process Hazard Analysis would have shown OBVIOUS problems with Facility Siting, RAGAGEP compliance, and equipment / facility suitability. It appears that in Beirut, the port officials informally identified at least some of the hazards, and to some degree they analyzed them. Those responsible in Beirut had AMPLE opportunity to CONTROL the hazards but chose not to – for reasons we don’t yet know. 

Put another way, because they did not accept their responsibility to perform a Process Hazard Analysis, they now have to accept their somber duty to perform an Incident Investigation.

Incident Investigation is how you become smart. Process Hazard Analysis is how you become wise.

Are there any issues in your facility that you are aware of that you haven’t yet addressed? Consider this tragedy in Beirut as a reminder to take action on them. There’s no time like the present!

P.S. There are large Ammonium Nitrate stockpiles all over the world. When stored properly it is very, very safe. But storing it next to a fireworks warehouse in a vault that wasn’t designed for it is begging for a disaster.

 

— Update: The Times of Israel quotes Lebanese Prime Minister Hassan Diab as saying: “What happened today will not pass without accountability. Those responsible for this catastrophe will pay the price.” With respect, no, they won’t pay the price.

The people that died paid the price. The loved ones of the deceased, the people that were injured, and those who are now homeless are paying the price. The people responsible may pay a price, but it’s unlikely to be as severe as the one paid by those who had no part in the series of errors that led to this catastrophe.

Digging Yourself Out of a Hole

(What to do when you are suddenly responsible for years of Process Safety neglect.)

It’s a scene I come across time and time again: a newly assigned PSM/RMP coordinator staring at me with shock as we progress through their Compliance Audit, Process Hazard Analysis, or 5yr Independent Mechanical Integrity Inspection.

“I didn’t know things were this bad!” they’ll say under their breath, once the situation starts to become a little clearer to them. You can imagine them standing at the bottom of a deep, dark hole wondering how they’ll ever make it back to fresh air and bright sunshine they thought was all around them just a few hours ago. For those of you that have read my previous post on the “Stages of PSM Grief,” this is the moment they are breaking past the Denial stage.

It can be heartbreaking to watch the mixture of Anger, Bargaining and Depression, especially if you remember what it felt like to be there yourself.

Often, I will have to re-assure them that this is just the start of the process and the beginning isn’t going to be fun. Sometimes I’ll quote Winston Churchill.

 

 

What’s really important is that we understand there will be a way out if we remain calm and plan intelligently. Unsurprisingly, you need a process to address Process Safety issues.

So, let’s start planning our escape!  We’re going to move slowly at first, with ever-increasing confidence, and once we get rolling we’re going to start seeing daylight.

Here’s how our progression will look:

  • Assess the situation
  • Prioritize the issues
  • Formulate the plans & assign responsibility
  • Implement, Implement, Implement!

 

Part 1: Assess the situation

Obviously, if you are in the middle of (or have just gone through) an audit or inspection, you’re well on the way! If you are recently assigned to this coordinator role and you don’t have a recent compliance audit, PHA, and MI report, then these are good places to start.

Assessment is really two parts which can share the same ground.

  • Compliance: Where you are in relation to where you need to be.
  • Culture: Where you are in relation to where you want to be.

I can’t stress this enough – being compliant is not some lofty place. It is the bare minimum of safety allowed under the law. How far past “my company isn’t violating federal and state law” you want to go depends a lot on the culture of your organization. For example, companies with a brand to protect tend to aim a lot higher than those that don’t. Companies that are barely making ends meet tend not to have a lot of resources to bring to bear on things that aren’t strictly required.

Recently, based on a conversation with colleagues, I half-jokingly formulated what I called the Haywood / Chapin Process Safety performance scale as a visual tool. Note that you get a score of zero for being compliant because that’s the baseline. We’re not going to go around congratulating each other for not violating Federal and State laws. Additionally, we aren’t going to give ourselves any credit for trying – only for results: Safety & compliance aren’t kindergarten so we aren’t giving out participation trophies.

Note: It’s common at this point to try and figure out how the company got themselves in this hole, but there is usually very little of value that comes out of this conversation. If the same people, and the same processes are in place, don’t expect different results unless they are willing to change. Don’t get your hopes up just because people want to change. What matters is if they are willing to put in the work to change. If only wanting to change was enough to effect change, nobody (including me) would be carrying around a few extra pounds.

 

Part 2: Prioritize the issues

All right. Now you have collected all the deficiencies so you know the ground you need to cover to get where you want to be – or, in our analogy, how far it is to get out of the hole you are in. Now we need to figure out in what order we need to address these issues. Hopefully, your audits have given you some guidance here. For example, this is the color code I use for my compliance audits:

Obviously in this scheme, we’d focus our efforts on the red items, then the orange, etc. You will want to prioritize the actions you take based on the risk to your employees, your community and your business. I strive to get the “buy-in” from the audit team during the audit itself so this step is pretty much done for you. However, you may have a lot of findings & recommendations to deal with so further prioritization can be useful.
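As a rough illustration of that further prioritization, here is a minimal Python sketch that sorts findings by audit color first and then by risk within each color. Only red and orange come from the scheme described above; the remaining colors, the `risk_score` field, and the sample findings are all made-up assumptions for the example, not part of any audit template.

```python
from dataclasses import dataclass

# Rank for each audit color. Red and orange come from the text above;
# "yellow" and "green" are hypothetical placeholders for the rest of the scheme.
COLOR_RANK = {"red": 0, "orange": 1, "yellow": 2, "green": 3}


@dataclass
class Finding:
    description: str
    color: str       # audit color code
    risk_score: int  # hypothetical 1-10 team estimate; higher means more risk


def prioritize(findings):
    """Order findings by color first, then by descending risk within a color."""
    return sorted(findings, key=lambda f: (COLOR_RANK[f.color], -f.risk_score))


# Made-up findings, purely for illustration.
findings = [
    Finding("Pipe labels missing in engine room", "orange", 4),
    Finding("Relief valve past replacement date", "red", 9),
    Finding("SOP missing for oil draining", "red", 7),
    Finding("Training records incomplete", "orange", 6),
]

for f in prioritize(findings):
    print(f"{f.color:>6}  {f.risk_score}  {f.description}")
```

However you score risk, the point is the same: the color sets the broad order and your own judgment of risk to employees, community, and business breaks the ties.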

 

Part 3: Formulate the plans & assign responsibility

Formulating the plan(s) is one of the most difficult parts of the whole endeavor: How do we address all the issues we’ve found? We’re going to use a few strategies to help us formulate our plan:

  1. Group where appropriate
  2. Don’t reinvent the wheel
  3. Don’t make Perfect the enemy of the Good
  4. Leverage strengths & Avoid weaknesses

Grouping: One thing you may find is that a common root cause means you can group items. For example, if I have a poorly constructed MI inspection with 600 pipe label recommendations, I can view each of those as individual recommendations or I can decide that the root cause is that we don’t have a system to ensure adequate pipe labeling. For me, I’d rather put a system in place to address that widespread deficiency than rely on just fixing the issues someone else found, thus ensuring I’ll need them to find them next time too! For the example of pipe labeling, I would train my operating staff on the requirements of IIAR B114 and place a label check in the annual unit inspection work order. Properly implemented, that system ensures that the issue will be addressed in the next year and will continue to be addressed regularly thereafter.
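If your recommendations live in a spreadsheet or a list, the grouping idea can be sketched in a few lines of Python. This is only an illustration: the root-cause labels and the sample recommendations below are hypothetical, and deciding what the root cause actually is remains a judgment call for your team.

```python
from collections import defaultdict

# Illustrative recommendations as (root_cause, description) pairs.
# The root-cause labels are hypothetical; use whatever categories fit your audit.
recommendations = [
    ("no pipe-labeling system", "Label missing on liquid line at Evaporator 3"),
    ("no pipe-labeling system", "Flow arrow missing on suction line in engine room"),
    ("no pipe-labeling system", "Faded label on hot gas line at Condenser 1"),
    ("no valve-tagging system", "Valve tag missing on king valve"),
]


def group_by_root_cause(recs):
    """Collect individual recommendations under their shared root cause."""
    groups = defaultdict(list)
    for root_cause, description in recs:
        groups[root_cause].append(description)
    return dict(groups)


grouped = group_by_root_cause(recommendations)
for root_cause, items in grouped.items():
    print(f"{root_cause}: {len(items)} recommendation(s) -> one systemic fix")
```

Each group then becomes one plan item with one owner, instead of hundreds of line items to chase individually.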

Don’t reinvent: There are plenty of freely available templates for nearly all programs, procedures, work orders, etc. you may need. Don’t waste your time creating a policy or procedure from scratch when you can often use a pre-made one to address the issue with little or no change.

Don’t make Perfect the enemy of the Good: Sometimes altering a simple policy that solves the problem 99% of the time to one that solves it 100% of the time turns it into a lengthy and confusing mess. Policies and procedures aren’t meant to completely replace independent thought – they should be designed to guide it. We should bias our efforts towards “good enough” at first and strive towards perfection over time with continuous improvement.

Leverage Strengths & Avoid Weaknesses: Tasks should be assigned to people based on their competencies. For example,  if you have a good core competency in your staff for writing SOPs, then by all means go ahead and write them. But if you don’t have anyone with that experience, maybe outsource that issue so their time is spent on the things they are already good at. Using a stock template and the needed PSI, my personal average for SOPs is about 1.5 hours. I’ve seen relatively competent people take 10 hours or more on the same SOP. The difference is that I wrote the template and have used it thousands of times. On the other hand, I’ve been known to take 3x as long as a skilled operator to change oil filters on a compressor because I’ve only done it a handful of times.

Assigning Responsibility is crucial. What we want is to have someone own the solution. Even if you assign a task to an outside consultant or contractor, make sure someone in-house is assigned the responsibility for the task to ensure they keep that 3rd party in-line and on-schedule.

Also keep in mind that this is a great place in this process to manage expectations. Often a facility has been neglecting their PSM duties for decades but seems shocked that the newly assigned PSM coordinator can’t solve the problem in a few weeks. Let’s just say that if it took 10 years to dig the hole, it’s not realistic to expect anyone to dig you out of it quickly.

 

Part 4: Implement, Implement, Implement!

Prussian military commander Helmuth von Moltke is famous for saying that “No plan survives first contact with the enemy.” You are never going to get anywhere until you go out there and start implementing your plans. You can’t build a reputation on what you plan to do.

Don’t be hesitant to reassess and change the plan if things aren’t going well.

One of the most important things you can do during this part of the process is having regular PSM meetings. Make sure everyone assigned a task is asked about their progress. It may seem like a waste of time, but it’s also a good practice to go over the things you have already accomplished. I recommend this for two reasons:

  • It gives everyone a chance to confirm that the implemented solution to the issue worked
  • It reminds you that progress is being made and you will eventually get out of the hole if you keep on!

 

 

As always, if there is anything we can do to help, please contact us!

How Many Operators do we Need?

Disclaimer: This post is a collaboration between an industry friend and colleague, Victor Dearman, and me. The views expressed here do not necessarily represent the opinions of any entity whatsoever with which we have been, are now, or will be affiliated.

It’s a question we hear often – sometimes as part of a PHA or Compliance Audit, but more often with someone just struggling to justify their staffing requests. Unfortunately, there really isn’t a simple, definitive answer to the question. No controlling RAGAGEP exists and state / local laws on the topic are relatively rare. This sort of problem isn’t rare in PSM because it is a performance-based standard. Our performance basis is that we are staffed sufficiently to ensure the safety of the people within the building and the surrounding community.

We need to answer the “How many Operators do we need?” question in a way that we can support it, or as we like to say, “Build a defensible case for the answer we arrive at.” The answer itself will depend on many, many factors. So, let’s go on a journey and see how we can arrive at an answer we can feel confident in.

 

The road to an answer

The biggest factor for many is the design (age!?) of the system controls. A modern system with advanced controls requires less oversight on a day-to-day basis. If your system still relies on manual controls and people writing down pressures every hour, then that’s going to have a significant impact on your staffing needs. But once we get past that obvious issue, things get a bit more complicated.

Let’s be honest here: if things are running well (you have a good history with compliance audits, inspections, incident investigations, etc. and a low MI backlog), you’re probably not asking this question. If you are asking this question, it is probably due to an event related to a PSM/RMP element.

Let’s look at the kinds of element events that typically lead to this question.

  • Employee Participation
  • Mechanical Integrity
  • Incident Investigations
  • Management of Change / Pre-Startup Safety Review
  • Process Hazard Analysis
  • Emergency Action and Response Plans

 

Employee Participation: Look, everyone feels over-burdened at work, especially in the modern “Do MORE with LESS” era. But, if you pay attention to it, and look at these other elements, this employee feedback can provide valuable insights into the adequacy of your staffing.

 

Mechanical Integrity: What we’re looking for here is to understand if you have the skill sets and staffing to adequately maintain your refrigeration system. Whether you do everything in-house, or have a small in-house crew performing basic rounds and contract out all the rest of the maintenance, inspections, and tests, is it adequate?

Here’s some MI related questions you might ask to help you determine if your staffing is adequate:

    • Are we properly implementing our Line & Equipment Opening (sometimes known as line break) procedures?
    • Are we caught up on ITPMRs (Inspection, Test and Preventative Maintenance Reports) or work orders?
    • Is the documentation of ITPMRs, work orders, and oil logs adequate?
    • Are we performing our scheduled walk-throughs and documenting them properly?
    • Are we addressing MI recommendations in a timely manner?
    • Are there indications that maintenance of the facility and system are being conducted properly and required repairs aren’t being delayed?
    • Are there no indications in the written MI records, or in your observations, that the system is running outside the written operating limits?

 

Incident Investigations: A review of incident investigation history can tell us a lot if the facility has a good process safety culture. But if they don’t have the right culture, and/or they don’t have any documented incidents, you’re going to have to do a little detective work and interview plant employees to find out if incidents are occurring that aren’t being recorded. Remember to spread your net wide here because incidents can happen at any time, not just on day-shift: backshifts, weekends, holidays, etc. No time is immune to a possible incident.

You may also find indications of incidents occurring in walk-through logs, communications logs, ITPMRs, work orders, etc.

Here’s some II related questions you might ask to help you determine if your staffing is adequate:

    • Are incidents being reported, investigated, and documented? If not, is this a culture issue or a staffing issue?
    • Are incidents and incident report findings & recommendations being addressed, communicated, and followed to their conclusion?
    • Are there incidents that could have been avoided with proper staffing?
    • Are there incidents that would have their severity reduced with proper staffing?
    • Are there incident investigations with recommendations that could be addressed with proper staffing?

 

Management of Change / Pre-Startup Safety Review: Properly implementing the MOC and PSSR elements takes a lot of time! We often find that these two elements are amongst the first to “fall behind” in suboptimal staffing situations. Here’s some MOC/PSSR related questions you might ask to help you determine if your staffing is adequate:

    • Have MOCs / PSSRs been conducted when they were supposed to be?
    • Were the MOCs / PSSRs conducted adequately to properly manage the hazards related to the change and the new systems?
    • Is the documentation of MOCs / PSSRs complete?
    • Are there any open items or recommendations from MOCs, PSSRs, or project punch lists?

 

Process Hazard Analysis: The PHA and open PHA recommendations can also help us understand if our staffing levels are appropriate. There may also be indications in the PHA itself. There’s a portion of the PHA that deals with staffing directly, but we’ll deal with that in the Building a defensible case on staffing section of this article.

Here’s some PHA related questions you might ask to help you determine if your staffing is adequate:

    • Are the PHA recommendations being addressed, communicated, and followed to their conclusion?
    • Is the facility provided with modern controls, alarm systems and equipment? (Newer, modern facilities often have significantly lower staffing needs than older ones)
    • Has the PHA been updated / validated as required by MOC activities and the 5yr schedule?

 

Emergency Action and Response Plans: Whether we’re looking at the plan(s) themselves, or analyzing an after-action report, there can be a lot to learn here concerning proper staffing levels. Obviously, the required staffing levels for Emergency Response facilities are going to be higher, but that doesn’t mean there is no staffing requirement for Emergency Action plans.

Here’s some EAP/ERP related questions you might ask to help you determine if your staffing is adequate:

    • In the event of an incidental release of ammonia, do you have adequate staffing to investigate and respond?
    • In the event of an emergency response, even if you are not a “responding” facility, do you have adequate staffing to ensure that the equipment is properly shut down?
    • If you are a “responding” facility, do you have enough adequately trained personnel to staff your response team including an Incident Commander, safety officer, decontamination personnel, two entry teams, etc.?
    • If you bring key staff back on-site to deal with emergencies, are they close enough to respond in a timely manner, or do you need to increase the size of your trained response team?
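One way to pull the answers from all of these question lists together is a simple tally per element. The Python sketch below is only an illustration under stated assumptions: the `concern_found` flag is a hypothetical field you would fill in yourself, the questions are shortened, and the sample answers are invented. Note that some questions are worded so that “yes” is good and others so that “yes” is bad, which is why the sketch records whether a concern was found rather than a raw yes/no.

```python
from collections import Counter

# Each entry: (element, question, concern_found).
# concern_found is True when the answer points toward a staffing problem,
# whichever way the question happens to be worded. Sample answers are invented.
answers = [
    ("MI", "Are we caught up on ITPMRs or work orders?", True),
    ("MI", "Are we addressing MI recommendations in a timely manner?", True),
    ("II", "Are there incidents that could have been avoided with proper staffing?", False),
    ("MOC/PSSR", "Is the documentation of MOCs / PSSRs complete?", True),
    ("PHA", "Are the PHA recommendations being addressed?", False),
]


def concerns_by_element(rows):
    """Count how many answers per element point toward a staffing problem."""
    return Counter(element for element, _question, concern in rows if concern)


summary = concerns_by_element(answers)
for element, count in summary.most_common():
    print(f"{element}: {count} concern(s)")
```

A tally like this doesn’t answer the staffing question by itself, but it shows where the concerns cluster, which is useful evidence when you document your conclusion.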

 

Building a defensible case on staffing

Ok, we’ve answered our questions and gathered a good impression of where we stand – and where we should stand. Maybe we need to adjust our staffing levels and/or increase the amount of services we ask contractors to complete for us. Where should we document this? In our opinion, the place in the system where the facility already had the opportunity to address this issue was in the PHA. So, let’s go back to the PHA, and see if our results match the PHA team’s.

You are going to be looking for the following two questions (or their equivalents) from the standard Human Factors section of the IIAR What-If /Checklist worksheets:

HF14.37 – What if an employee is stressed due to shift work and overtime schedules?

HF14.38 – What if there are not sufficient employees to properly operate the system and respond to system upsets?

During the PHA, the facility should have answered those in a way that says they have adequate staffing or recommended that staffing be increased. Let’s say you decided that you had adequate staffing based on your answers to the questions above. If that’s the case, we’d expect to see something like the following:

 

If, however, we found some areas for improvement, we might expect something like this:

 

Closing Thoughts: We hope you didn’t start reading this hoping for an easy answer, but we’re fairly certain – now that you understand the full scope of the question being asked – that the answer doesn’t need to be easy, it needs to be correct and defensible.

You can build a much better understanding of your staffing needs by looking at the existing elements in your Process Safety program. Any decent Compliance Audit would cover this same ground. If staffing is an area of concern for you, make sure to bring it up.

P.S.  from Victor: Some facilities might try to get more value from a security guard on off shifts or holiday coverage by having them make roving patrols and report abnormal conditions and alarms. Sure, but that also means that guard has to be trained to identify what the alarms mean, how to identify an abnormal condition, and what to do to either immediately correct the deviation or immediately contact someone that can (on-call techs or service providers). By the time you have invested this much into a guard, you could have paid for a well-qualified operator.

My advice to any organization when making these decisions is to evaluate the above and take into consideration that attracting well-rounded operators with the skill sets and experience often sought happens more often through word of mouth about how the organization projects its Process Safety culture.

How to hire operators? Well, that sounds like a good subject for a future article!
