How to Add Error Checking to Your Shell Scripts

Working out why a shell script is failing or not working as expected is frustrating. This article shows how to add error checking to highlight problems you might otherwise miss.

Access the shell on Mac via the Terminal.app

Unlike more recently designed languages, shell script does not have an easy answer for error handling. There are no common exception handling routines or ways of wrapping up large blocks of script and asking for errors to fall through to a provided subroutine.

Instead shell script asks you, the author, to check individual program exit codes and branch as needed in case of an error. In practice, this means your once short script is going to get a little longer and a little more involved.

Unix Exit Status

Each program you run through a shell script returns an exit status. This numeric status value tells the calling script if the program completed successfully or if an error was encountered.

The exit status is not visible on the command line. This makes it difficult to tell if something went wrong just by looking at the textual output of a shell script. It is possible - even common - for scripts to print nothing and yet encounter multiple errors.

Let’s consider this simple script:

#!/bin/sh

cp /Volumes/Documents/criticalfile.txt /Volumes/BackUp/.

This script does one thing; it copies a single file from one volume to another using the cp program.

Thankfully the cp program is chatty and will print a message if an error is encountered. That is great for us reviewing the output visually, but for the shell running our script the error will go completely unnoticed.

This is a problem if our script goes on to do more work, or if we want the script to robustly deal with errors.

Let’s add error checking to this simple script.

#!/bin/sh

cp /Volumes/Documents/criticalfile.txt /Volumes/BackUp/.
if [ "$?" != "0" ]; then
    echo "[Error] copy failed!" 1>&2
    exit 1
fi

We have added an if/fi block below the cp line. The new block checks the special variable $? to see if it equals 0 or not.

Unix programs should return 0 if they completed successfully. Any other value means something went wrong. The exact meaning of the returned value is frequently documented in the program’s man page.

If an error is detected in our script’s if/fi block, then a message is printed and the script immediately exits also reporting an error.

Why report another error? If our script does not explicitly say exit 1 then the script is assumed to have completed successfully. The error from cp does not matter unless we explicitly make it matter by passing it to our script’s caller.

William Shotts, Jr’s article Errors and Signals and Traps (Oh My!) is a good next step for learning more about how to approach error handling in shell scripts.