Dear Colleagues,

Every successful engineering career involves troubleshooting and fixing at some time. Perhaps mainly remedying your colleagues’ mistakes? The trick, I believe, is to keep your mind completely open when tackling the problem and to avoid preconceived ideas, as these can throw you off track.

The suggested steps for general engineering troubleshooting are as follows:

1. Identify the exact issue
When someone reports a problem to you, you can bet your bottom dollar that it is not the actual problem. Seen through the eyes of a user, the report of the situation may not reflect engineering reality. Ensure you get a careful explanation and, if possible, a demonstration of the problem. It is your job to ascertain what the real problem is in real engineering terms. Often a problem presents intermittently. Don’t walk away from it, however, presuming it has gone forever; it hasn’t.

Recently, when trying to tune a process control loop that the operators had complained was sluggish, I unexpectedly found that I was actually dealing with high-frequency signals (an aliasing problem). It wasn’t a tuning problem after all, but a filtering one.
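
A minimal sketch of how aliasing can disguise a fast signal as a slow one. The frequencies (a 9 Hz disturbance sampled at 10 Hz) and the function names are illustrative assumptions, not values taken from the loop described above.

```python
import math


def alias_frequency(signal_hz: float, sample_hz: float) -> float:
    """Apparent frequency of a sine at signal_hz when sampled at sample_hz."""
    if sample_hz <= 0:
        raise ValueError("sample_hz must be positive")
    # Fold the true frequency into the band 0 .. sample_hz / 2.
    return abs(signal_hz - sample_hz * round(signal_hz / sample_hz))


def sample_sine(freq_hz: float, sample_hz: float, count: int) -> list:
    """Return `count` samples of a unit sine wave taken at sample_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_hz) for n in range(count)]


# Hypothetical numbers: a 9 Hz disturbance seen through a 10 Hz sampler.
apparent_hz = alias_frequency(9.0, 10.0)
fast = sample_sine(9.0, 10.0, 20)
slow = sample_sine(1.0, 10.0, 20)
# The sampled 9 Hz wave matches an inverted 1 Hz wave, sample for sample.
indistinguishable = all(math.isclose(f, -s, abs_tol=1e-9) for f, s in zip(fast, slow))
```

Once sampled, the fast disturbance cannot be told apart from a slow one, which is why retuning the loop does not help. The usual remedy is to filter out the high frequencies before sampling, or to sample faster.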

2. Reproduce the problem
It is best to reproduce the problem where possible. You can then observe the full sequence of events, view the error messages and analyse other variables that may be affecting it.

3. Localise, isolate and home in
Now you have to home in on the equipment or software module that is responsible for the problem. Penetrate the thicket of equipment and find the precise element causing it. Remember that seemingly unrelated elements can cause problems. It is also vitally important to identify exactly what happened before the problem occurred. Was a card changed out, for example, and the IP address not updated on the server?

4. Make a plan
Ensure that you assess carefully what is required. As one of my regular correspondents remarked: beware the law of unexpected consequences. The process of fixing something may cause other unexpected problems (a colleague of mine located and remedied severe harmonic problems in a plant network, but blew up three of my precious variable speed drives with overvoltage in the process). When going through your plan step by step to find the best remedy, you may find that other issues appear that you hadn’t considered.

5. Trace your steps
Ensure that when you fix the problem, you know exactly what you have done in case you need to retrace your steps later to put the equipment back into its original state.
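
One possible way to keep such a record is a simple change log that can be read back in reverse order. This is only a sketch: the class names, setting names and values below are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    setting: str
    old_value: str
    new_value: str


@dataclass
class ChangeLog:
    changes: list = field(default_factory=list)

    def record(self, setting: str, old_value: str, new_value: str) -> None:
        """Note a change at the moment it is made, including the old value."""
        self.changes.append(Change(setting, old_value, new_value))

    def rollback_plan(self) -> list:
        """List (setting, value to restore), most recent change first."""
        return [(c.setting, c.old_value) for c in reversed(self.changes)]


# Hypothetical example entries.
log = ChangeLog()
log.record("card_ip_address", "10.0.0.5", "10.0.0.9")
log.record("input_filter_hz", "none", "2")
plan = log.rollback_plan()
```

Undoing the most recent change first returns the equipment to its original state along the same path by which it left it.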

6. Test and retest
Test and retest over a period of time before accepting that the problem has been fixed. If there is any doubt about whether the problem has been fixed or not, there is no doubt. It is, most probably, still a problem.
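
The same rule can be sketched in a few lines for an automated check: the fix counts as confirmed only if every repeated run passes. The function name and the `check` callable are assumptions for illustration, and in practice the runs would be spread over time rather than run back to back.

```python
from typing import Callable


def confirmed_fixed(check: Callable[[], bool], runs: int) -> bool:
    """Return True only if `check` passes on every one of `runs` attempts."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Run every attempt, even after a failure, so all results are observed.
    results = [check() for _ in range(runs)]
    # A single failure means there is doubt, so the problem is not fixed.
    return all(results)
```

An intermittent fault that fails one run in ten still fails this test, which is the point of retesting.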

7. Document for an absolute moron
People who come after you may not be aware of what you have done and how you have solved the problem. The problem may reappear, or something similar may happen to another piece of equipment. So, document for someone who may have no knowledge of what you have done.

8. Communicate with the client or user
Often the user is not convinced the problem has been fixed. Your job is to communicate honestly what you have done and why the problem has been fixed. Don’t treat the user as a complete idiot, but as a real partner in operating your facility. This is important for your credibility (and for the engineering profession).

I like Anthony J. D'Angelo’s take on fixing things: ‘Become a fixer, not just a fixture’.

Yours in engineering learning

Steve