The Friendly Coder

On software development and technology

BASH error handling

I have been spending a lot more time lately working in Linux environments, which has afforded me an opportunity to learn more about the Bourne Again Shell and its scripting environment. One thing I found somewhat tedious and difficult to figure out was how to do BASH error handling effectively in shell scripts. I’ve summarized several gotchas I encountered as well as several patterns that seem to work quite well.

In the beginning there was ‘exit’

For most basic shell scripts, you will probably just want to use the built-in exit shell command. This works well enough for scripts that you typically run directly from the command line, expecting them to carry out a specific operation or function in isolation and then return control back to the shell.

#!/bin/bash
function MyFunction {
    echo "Doing something special"
    if [ "$error" == "true" ]; then exit 1; fi
}

MyFunction
echo "Do more stuff"
if [ "$error" == "true" ]; then exit 2; fi

In these cases, if the script reaches the end of the file without encountering an exit command, it will simply return control to the caller and set the return code to 0. It even works well when called from within functions as shown above, effectively unravelling your call stack and terminating the shell script.
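As a quick sanity check, running the script above with $error unset should look something like this:

$ ./MyScript.sh
Doing something special
Do more stuff
$ echo $?
0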

So, if exit works so well, what else do we need to discuss? As you might have guessed, there are cases when exit doesn’t quite work as you might expect.

Sourced scripts

The first problem you may come across relates to sourcing your scripts. See, when you just execute your script from the command line using ./MyScript.sh, the script runs in its own sub-process, meaning that any changes made to the environment from within the script don’t affect the calling environment (unless they are explicitly exported of course, but I’ll leave that discussion for another day).

Sometimes you want to write a script whose purpose is to configure the current shell environment in some way: setting up environment variables, customizing shell options, etc. In these cases it’s much easier to source your script using the source built-in command or its alias, the dot character ‘.’, as in source ./MyScript.sh. Doing so causes the script to run in the current shell process, applying all changes to the current environment.
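For example, a minimal environment-setup script might look something like this (the file name env-setup.sh and the values shown are just hypothetical):

#!/bin/bash
# env-setup.sh: meant to be sourced, not executed
export PROJECT_HOME="$HOME/projects"   # hypothetical variable, for illustration
alias ll='ls -l'                       # aliases only stick if the script is sourced

After source ./env-setup.sh, both PROJECT_HOME and the ll alias stick around in your current shell; after ./env-setup.sh they vanish along with the sub-process.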

One side effect of sourcing your script, as opposed to executing it in a sub-process, is that if your script executes an exit command, it will inadvertently close the host shell session. An example will help illustrate the behavior:

#!/bin/bash
exit 1

Compare the behavior of executing this script as ./MyScript.sh and source MyScript.sh. The former just returns control back to the shell you launched it from, whereas the latter closes the shell you launched it from. Don’t believe me? Try it for yourself.

It turns out there is a simple fix for this. When a script is sourced you can use the return command to return control back to the caller, much the same way we do with functions. However, the problem is we can’t use return when running a script in a sub-process. The key to a robust solution lies in finding a way to detect whether our script is running as a sub-process or being sourced. The easiest way I have found to check this is the not-so-obvious shell array BASH_SOURCE, something like this:

#!/bin/bash
if [ "$0" == "${BASH_SOURCE[0]}" ]; then
    exit 1
fi
return 1

See, $0 will typically be set to the name of the currently running script, except when the script is being sourced, in which case $0 will be set to the name of the shell executable, “bash”. Conversely, the first element of the BASH_SOURCE array will always be set to the name of the currently running script regardless of how it was executed. So if these two values are the same we know we’re running in a sub-process and it is safe to use exit; otherwise we’re being sourced and we must use return.
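If you want to see the difference for yourself, a tiny probe script (call it probe.sh) makes it obvious:

#!/bin/bash
# Run this both ways and compare the output
echo "\$0 is:            $0"
echo "BASH_SOURCE[0] is: ${BASH_SOURCE[0]}"

Running ./probe.sh should print the script name for both values, whereas source ./probe.sh should print the shell’s name (something like bash) for $0.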

So now we have a perfect solution… or do we?

Errors in functions

So time passes, your scripts evolve and grow, and you start making use of functions. So what happens when an error occurs in a function? Well, if you were using functions in simple scripts like my first example above, you probably just used exit as you would anyplace else in your script and were done with it, since it effectively unrolls the entire call stack automatically. But unfortunately, using exit in a function has the same effect as using it from your main script code: it terminates the parent process, which is the active shell session when sourced.

To make matters worse, you can’t easily use return to abort your scripts because return will simply exit the function and return control to the calling code, which will be another function or the main script code.
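A short sketch illustrates the problem (the function names are just placeholders):

#!/bin/bash
Inner() {
    return 1    # only exits Inner; it does not abort the script
}

Outer() {
    Inner
    echo "Outer keeps running even though Inner reported an error"
}

Outer
echo "So does the main script"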

In my travels I’ve come across several forums that suggest using the kill built-in command to abort a script from within a deep call stack, passing it the process ID of the currently running script, something like:

#!/bin/bash
# script1.sh
MyFunction() {
    if [ "$error" == "true" ]; then kill -INT $$; fi
}
MyFunction

If you try running this in even moderately complex scripts you will find that it does in fact work. However, there is yet another subtle edge case to be aware of with this suggestion. If this script is, in turn, sourced by another script in a cascading style, kill will unravel the calling script too. Consider the following:

#!/bin/bash
# script2.sh
source ./script1.sh
echo "Still working"

If you attempt to run ./script2.sh you will see that the echo statement is never executed: when script1.sh is sourced and the kill statement gets executed, both scripts are terminated because they share the same process ID. This is not likely what you’d want to happen in this case.

So, how do we robustly handle errors within functions? Answer: with simple return codes.

#!/bin/bash
MyFn() {
    return 1
}

MyFn
result=$?
if [ $result -ne 0 ]; then exit $result; fi

This has the benefit of working robustly regardless of the calling context, and keeps your error handling consistent throughout your scripts, whether calling a function or a command line tool, by examining the $? return code. The downside is that you end up having to define return codes and check their values throughout your scripts, which can be very tedious to retrofit into any existing scripts you may have, but such is life.
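Propagating a code up through several layers then looks something like this rough sketch (placeholder names again):

#!/bin/bash
Low() {
    return 42    # some failure code
}

Mid() {
    Low
    result=$?
    if [ $result -ne 0 ]; then return $result; fi
    echo "only runs if Low succeeded"
}

Mid
echo "Mid returned $?"    # should print 42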

A quick note about ‘test’

Having error codes propagated from function to function can be tedious in the best of situations. One thing that helps minimize the verbosity is the test built-in shell command, something like:

#!/bin/bash
MyFunction() {
    echo "Doing something good"    # stands in for an operation that may fail
    test $? -ne 0 && return 1
}

However, while combining the test utility with the use of return in functions, I discovered a few subtle, hard-to-notice gotchas you may not expect. Consider the case when you want to display an error message and return an error code, like the examples below:

# Gotcha #1
test $? -ne 0 && (echo "My error message here"; return $?)

# Gotcha #2
test $? -ne 0 && echo "My error message here"; return $?

# ...which is functionally equivalent to:
test $? -ne 0 && echo "My error message here"
return $?

# Gotcha #3
test $? -ne 0 && { echo "My error message here"; return $?; }

Gotcha #1 looks harmless at first glance, but unfortunately does not work correctly. When grouping commands using round brackets you are telling BASH to spawn a new sub-shell to execute the commands within. When this happens, the return only exits the sub-shell, and control returns to the line following the test statement.
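You can verify this with a little throwaway function:

#!/bin/bash
Demo() {
    false    # force a failure so $? is non-zero
    test $? -ne 0 && (echo "error detected"; return 1)
    echo "still here: the sub-shell's return did not exit Demo"
}

Demo
echo "Demo returned $?"    # prints 0, not 1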

Once you discover this fact you may try to just remove the round brackets so the commands execute in the current context, something like gotcha #2. To understand the problem here, you first need to know that the && and || operators in this context are not logical operators but rather list operators, which chain single commands together to run in sequence so long as the preceding command returns a zero or non-zero value, respectively. Similarly, the semicolon list separator also separates one command from the next, but it does so regardless of the return code of the preceding command. Confusing syntax aside, the real difficulty lies in the realization that these operators chain single commands together. So the one-line statement in gotcha #2 is functionally equivalent to the two statements shown below it: the return executes unconditionally, whether or not the test failed, which is almost certainly not the behavior you want.

So, if you are the diligent script-kiddie I know you are, you probably hit Google again, continue your search, and come across another grouping operator: the curly brackets. They serve the same purpose as the round brackets but have the added benefit of running in the current shell context rather than a sub-shell. Which brings us to gotcha #3. So what’s the problem here? I’m glad you asked!

Gotcha #3 will successfully catch errors, display the relevant error message, and return from the script; however, it will not return the correct error code. This is because the $? special variable contains the status code of the last executed statement. In this example, the return code set by the original operation is overwritten first by the test command and then again by the echo command, so the return always reports echo’s status of 0.
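Once again a throwaway function shows the effect:

#!/bin/bash
Demo() {
    false    # sets $? to 1
    test $? -ne 0 && { echo "error detected"; return $?; }
}

Demo
echo "Demo returned $?"    # prints 0: by the time return runs, $? holds echo's status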

So, with all of these not-so-functional solutions, are there any patterns that will serve our purposes? I believe so:

# Solution #1
# some operation
result=$?
test $result -ne 0 && { echo "Error running project 1"; return $result; }
# NOTE: the semicolon before the closing curly bracket is mandatory
# NOTE: the spaces just inside the curly brackets are mandatory


# Solution #2
# some operation
test $? -ne 0 && { echo "My error message here"; return 1; }

If it is important to preserve an error code, solution #1 should do nicely. We cache the error code from our operation in a temporary variable for later reference within our test condition. Be aware that you must perform this assignment before doing anything else, including assigning values to other variables or making calls to echo or other built-in commands; even certain flow control statements and operators may inadvertently reset the value. If it’s not important to preserve the actual error code, you can further simplify the code as shown in solution #2, returning a static error code instead of caching the original value.
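To see how easily the original code gets clobbered, consider this sketch:

#!/bin/bash
false                 # $? is now 1
msg="saving later"    # a plain assignment succeeds, resetting $? to 0
result=$?
echo "result is $result"    # prints 0: the original failure code is gone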

Summary

For error handling in simple shell scripts that are only ever run from the command line, it’s probably easiest to just use exit statements in most situations. However, to make sure your scripts scale easily and support different execution contexts, there are a few good patterns that should save you some time and headache, both now and in the future. Here is an example script that summarizes these patterns:

#!/bin/bash
MyFn1() {
    return 123
}

MyFn2() {
    MyFn1
    result=$?
    test $result -ne 0 && { echo "Error running project 1"; return $result; }

    # some other operation
    test $? -ne 0 && { echo "Error performing operation"; return 1; }
}

#mainline
MyFn2
result=$?
test "$0" == "${BASH_SOURCE[0]}" && exit $result || return $result

In short, for the best error handling results I’ve found the following rules of thumb to be most effective:

  • make sure you never use exit in your functions; always use return to report errors. It can be slightly tedious for smaller scripts but it could save you time and headache down the road.
  • use curly brackets when grouping the operations that run as the result of a test.
  • those areas that must report error conditions back to the caller should check the BASH_SOURCE array to decide whether exit or return should be used.
  • if you aren’t going to use an operation’s return code immediately after performing the operation, store it in a new variable for later reference, even if you think it’s safe not to.
