Handling Errors
One of the trickiest subjects in programming is the proper handling of errors. What do you do when things go wrong? Some errors are predictable, so you can plan for them. Some are predictable, but you still won’t plan for them. And a third category of horrible situations consists of circumstances you could never have guessed would occur.
In order to properly manage errors, you first need to identify what kind of errors you’re dealing with. The best software deals with all errors, but deals with different kinds of errors in different ways. I divide errors into four different categories:
- Full crashes.
- Programming errors.
- Exceptional circumstances.
- User errors.
Depending on where in the list you are (the worst kind of error is the first one), you’ll want to take a different approach to handling the error.
Crashes
The outright crash is the worst kind of programming error you’ll find. The code is malfunctioning in such a bad way that uninitialized memory is accessed, memory is overwritten, null pointers are dereferenced, unaligned memory is read or something similar.
In native code, crashes normally cause messages like General Protection Fault, Segmentation Fault (segfault) or Bus Error (misaligned memory access). In bytecode-compiled languages (managed code), crashes are usually raised as exceptions. For all languages (worth mentioning) it’s possible to handle outright crashes. In managed languages like Java or C#, you can catch the exception and do something with it (regardless of how significant). In C or C++, you can install error handling code for these occurrences, such as signal handlers on POSIX systems or structured exception handling on Windows.
Regardless of how your language deals with crashes, you should be treating them as separate issues from other exceptions. In general, there are two things to consider when dealing with crashes:
- Crash early. This is one of the vital tips from the book The Pragmatic Programmer by Andrew Hunt and David Thomas. If you crash as soon as possible when there’s an error, you avoid carrying on with trashed data while trying to salvage the situation. The probable result of trying to recover from a crash is that you’ll run with data that is broken, and save that data somewhere: maybe you’ll overwrite the user’s settings with gibberish, or trash data in a vital database.
- Don’t crash. This may seem a bit contradictory to the above rule, but in essence this is about what you expose to your user. Jeff Atwood calls it crashing responsibly, but in my world it’s about not putting the user through the experience of a crash. You reduce the crash to a normal application failure (which is better) by showing the user your own explanatory text, preferably with an apology and some way for them to know that you’re working on fixing that crash (you are, aren’t you?). This means you’ve got to automatically report and track all crashes. Don’t leave it up to your users to report crashes to you. They most probably won’t, since they’ll be too busy either getting your application restarted so they can finish what they were doing, or looking at your competitor to find an application that doesn’t crash.
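As a rough illustration of combining the two rules, here is a minimal Python sketch of a last-resort handler that reports the crash automatically and shows the user an apology instead of a stack trace. The callbacks `send_report` and `show_message` are hypothetical stand-ins for whatever crash uploader and dialog your application has; only `sys.excepthook` and the `traceback` module are real.

```python
import sys
import traceback


def make_crash_hook(send_report, show_message):
    """Build a last-resort handler for exceptions nothing else caught.

    send_report and show_message are hypothetical callbacks supplied by the
    application (for example, a crash uploader and a dialog box).
    """
    def crash_hook(exc_type, exc_value, exc_traceback):
        # Collect the details the programmers need to track the crash.
        details = "".join(
            traceback.format_exception(exc_type, exc_value, exc_traceback)
        )
        try:
            send_report(details)
        except Exception:
            # A failing reporter must not stop the user from being told.
            pass
        # The user sees an explanation and an apology, not a raw stack trace.
        show_message(
            "Sorry, something went wrong and the application has to close. "
            "The problem has been reported so that we can fix it."
        )

    return crash_hook


def install_crash_hook(send_report, show_message):
    # sys.excepthook runs for uncaught exceptions just before the
    # interpreter exits, so the program still stops early.
    sys.excepthook = make_crash_hook(send_report, show_message)
```

Calling `install_crash_hook` once at startup is enough. The handler does not try to keep the program running, so the "crash early" rule still holds.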
Programming Errors
A programming error is an error caused by the code failing to abide by the rules set forth by other parts of the code: violating contracts, failing to follow the documented restrictions of an interface, or similar. Normally, you’ll use asserts to catch programming errors. In languages that don’t have asserts, you’ll cry for a while, spend a few minutes contemplating switching to a better language, and then probably do the check and throw an exception.
There are a few common mistakes with regard to asserts and programming errors:
- Using asserts for other things than programming errors: asserts should be used only to check things in called code that the calling code could have and should have checked before making the call. You can think of an assertion as a statement of something that should never happen.
- Allowing asserts to be ignored: assertion failures should be treated just like crashes when it comes to handling. An assertion is an unconditional error in the code, something that should be fixed immediately, and if you ended up getting an assertion failure you have lost track of the well-being of the system. Crash, automatically report, and fix the problem for your next release. This is again good advice from The Pragmatic Programmer. Switching assertions off when you ship an application indicates that you think you’ve fixed all bugs. This is a rather naïve attitude, and you’ll quickly learn it doesn’t hold true. The only difference between debug and release might be how you handle your assertion failures. Assertions make it easier for you to find and fix the errors than the crash you might otherwise get, even after you’ve shipped.
- Switching asserts off for release: asserts are nearly always switched off for release builds. The built-in assert macro of C and C++ compiles to nothing whenever NDEBUG is defined, which is the usual release configuration, but building your own assert is not as hard as junior (and even some senior) programmers tend to think it is. Sometimes, you may need to switch some assertions off for release, when performance concerns are addressed. This should be a conscious, well-considered decision on specific asserts however, not a default.
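To show how little it takes to build your own assert, here is a minimal sketch in Python, where the built-in `assert` statement has the same weakness: it is removed when the interpreter runs with `-O`. The names `ContractViolation`, `require` and `average` are made up for this example.

```python
class ContractViolation(AssertionError):
    """Raised when calling code breaks a documented rule of an interface."""


def require(condition, message):
    # Unlike the built-in assert statement, which Python removes when run
    # with -O, this check stays active in every build.
    if not condition:
        raise ContractViolation(message)


def average(values):
    # Callers are expected to pass at least one value. An empty list is a
    # bug in the calling code, not something to recover from here.
    require(len(values) > 0, "average() called with no values")
    return sum(values) / len(values)
```

A `ContractViolation` is meant to travel all the way up to the crash handling, not to be caught and ignored along the way.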
Exceptional Circumstances
Exceptions are the somewhat mangled used-for-everything error handlers of most object-oriented languages. Be careful of how you use exceptions: they should only be used for exceptional circumstances, meaning unexpected but detectable problems.
Note that the one thing you should never do is exception checking. Say for instance that you’re reading user settings from a file, but the file may not exist since the user may not have started the application before. The wrong thing to do here is to try to open the file, catch any exceptions and move on. The right thing to do is to check whether the file exists before trying to open it.
Remember — exceptions are supposed to be used when something unexpected happens — if you already know the file may not exist, it’s not unexpected that it doesn’t. However, if when saving the settings file it won’t open because it’s write-protected, that’s a good place for an exception.
To summarize this as a simple rule: “Never use exceptions as a control structure”. There are several reasons for this, but with exceptions representing something gone wrong, it should be reasonably easy to understand that things should not continuously go wrong during normal execution. Practically, a program that only throws exceptions when things go wrong is much easier to debug than one that throws exceptions here and there.
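The settings file example could look roughly like this in Python. This is only a sketch of the rule as stated here: the file format (JSON), the default values and the name `load_settings` are all assumptions made for illustration.

```python
import json
from pathlib import Path

# Hypothetical defaults used the first time the application starts.
DEFAULT_SETTINGS = {"theme": "light", "autosave": True}


def load_settings(path):
    path = Path(path)
    # A missing file is the expected first-run case, so it is handled with
    # an ordinary branch rather than a try/except.
    if not path.exists():
        return dict(DEFAULT_SETTINGS)
    # Anything that fails past this point (an unreadable or corrupt file)
    # is genuinely unexpected and is left to raise.
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

The normal first-run path never throws, so any exception that does come out of `load_settings` means something has really gone wrong.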
Another thing to think about with exceptions is that in general, they are specific to the context in which they’re thrown. For instance, a FileAccessException makes sense when the file can’t be opened in the above example, but as little as one or two steps up the call stack, a FileAccessException makes no sense at all. Usually it’s a good idea to catch the exceptions and convert them to a type that makes sense in the current context. This makes it easier to decide where it’s appropriate to handle them.
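A minimal Python sketch of that conversion follows. `SettingsError` and `save_settings` are hypothetical names, while `OSError` is the real base class Python uses for file access failures.

```python
class SettingsError(Exception):
    """Hypothetical application-level error: settings could not be saved."""


def save_settings(path, text):
    try:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
    except OSError as error:
        # Translate the low-level file error into one that makes sense to
        # callers, keeping the original attached as __cause__ for debugging.
        raise SettingsError(f"could not save settings to {path}") from error
```

Code further up the call stack now only needs to know about `SettingsError`, and the original file error is still available to whoever reads the crash report.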
User Errors
The final category of errors is user errors. These are things that aren’t even (or shouldn’t be) unexpected, and certainly not exceptional. Always assume that your user will input something wrong in any kind of input form. The file name you asked for wasn’t a valid file name, you asked for a number but got the string “ten” — your imagination will not be capable of coming up with all the wickedly “stupid” ways your users will try to use your application (stupid in this context is a programmer-view of the world — to programmers many of the natural ways people communicate seem stupid when applied to computers, but the stupidity is generally on the side of the computer, not of the user).
A user error should never manifest itself as any of the above kinds of errors — you should always be checking and validating user input before letting it propagate into the system. Failing to do so will likely cause unexpected and weird behaviour from your application — which in turn is a programming error, not a user error. The fault was yours for not validating the input, not the user’s for trying to use your application.
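As a sketch of validating at the boundary, the following Python function handles the “ten” case without letting anything propagate into the rest of the system. The function name, the limits and the message wording are invented for this example.

```python
def parse_quantity(text, minimum=1, maximum=99):
    """Return (value, None) for valid input or (None, message) otherwise.

    Assumes minimum and maximum are non-negative whole numbers.
    """
    range_message = f"Please enter a number between {minimum} and {maximum}."
    cleaned = text.strip()
    # Bad input is expected, so it is reported through the return value
    # instead of being allowed to raise deeper in the program.
    if not cleaned.isascii() or not cleaned.isdigit():
        return None, f"Please enter a whole number, for example {minimum}."
    significant = cleaned.lstrip("0") or "0"
    # More significant digits than the maximum has means out of range, and
    # checking first avoids converting absurdly long digit strings.
    if len(significant) > len(str(maximum)):
        return None, range_message
    value = int(significant)
    if not minimum <= value <= maximum:
        return None, range_message
    return value, None
```

The caller shows the message next to the input field and asks again. Nothing here is an exception, because nothing here is exceptional.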
Error Handling Code
Once you start working on properly handling errors, you’ll inevitably start producing lots of code. This code will be run very rarely, which means it’ll likely be less well tested than the rest of your code base. You only get occasional shots at fixing this code (when something else is broken), so fix error handling code first.
Another thing to note about error handlers is that you’re usually limited in what you do in them. Depending on the kind of error, you may have no possibility of allocating memory to deal with your error message (although applications that gracefully manage out of memory errors are a truly rare find).
I recently fixed a problem which caused a hard crash of our data building pipeline, which was getting stuck in an infinite loop trying to build a malformed shader. There were several things to fix here, in this order:
- Fix the hard crash: why wasn’t our error handling gracefully exiting the build? It turned out that our error handler attempted to read the callstack and print it. This is a good idea on a normal crash, since it could then be reported to the programmers, but it’s a very bad idea on a stack overflow. Not only is the stack extremely large and unlikely to yield much information to the programmers, but there’s no stack space left to deal with the stack overflow. The hard crash was the application’s error handler entering a function with a stack-allocated string to manage the callstack lines. Changing the handler to not try to list the callstack fixed the crash.
- Fix the programming error: the pipeline ending up in an infinite loop due to bad user data. We added a validation to ensure that the input graph was properly acyclic. Note that this is something you should do after you’ve fixed your stack overflow error handler, otherwise you can’t know if your fixes for the handler worked.
- Fix the user data: actually fix the faulty shader graph. As a bonus we added error checking code to our editor to prevent the error from ever occurring again.
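For the curious, an acyclicity check of the kind mentioned above can be sketched in a few lines of Python. This is not the pipeline’s actual code: the graph representation (a dictionary mapping each node to the nodes it feeds into) and the name `find_cycle_nodes` are assumptions made for the example. It uses Kahn’s algorithm, which works with a queue instead of recursion, so the validation itself cannot overflow the stack.

```python
from collections import deque


def find_cycle_nodes(graph):
    """Return the nodes that sit on a cycle or downstream of one.

    graph maps each node name to the list of nodes it feeds into. An empty
    result means the graph is acyclic.
    """
    # Count incoming edges for every node mentioned anywhere in the graph.
    incoming = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            incoming[target] = incoming.get(target, 0) + 1

    # Kahn's algorithm: repeatedly remove nodes with no remaining inputs.
    ready = deque(node for node, count in incoming.items() if count == 0)
    while ready:
        node = ready.popleft()
        del incoming[node]
        for target in graph.get(node, ()):
            incoming[target] -= 1
            if incoming[target] == 0:
                ready.append(target)

    # Whatever is left never ran out of inputs, so a cycle is feeding it.
    return sorted(incoming)
```

Returning the offending nodes rather than a plain yes or no gives the editor something concrete to show the person who built the graph.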
Errors can teach you much about the health of your code base if you listen to them. What have your errors taught you?
(Thanks to cdamian and Justin Marty on flickr for the images)
2 Comments
By Silent, Wednesday, May 20, 2009 @ 17:25
Your articles are great. I’ve been programming for a while and I grasp most of the concepts you cover (for example this and encapsulation a while ago), but you present it in such a clear manner that it still gives some “oh, right”-moments.
Cheers!