Handling Errors

One of the trickiest subjects in programming is the proper handling of errors. What do you do when things go wrong? Some errors are predictable, so you can plan for them. Some errors are predictable, but you still won’t plan for them. A third, horrible category is the circumstances you could never have guessed would occur.

In order to properly manage errors, you first need to identify what kind of errors you’re dealing with. The best software deals with all errors, but deals with different kinds of errors in different ways. I divide errors into four different categories:

  • Full crashes.
  • Programming errors.
  • Exceptional circumstances.
  • User errors.

Depending on where in the list you are (the worst kind of error is the first one), you’ll want to take a different approach to handling the error.

Crashes

The outright crash is the worst kind of programming error you’ll find. The code is malfunctioning in such a bad way that uninitialized memory is accessed, memory is overwritten, null pointers are dereferenced, unaligned memory is read or something similar.

In native code, crashes normally cause messages like General Protection Fault, Segmentation Fault (segfault) or Bus Error (misaligned memory access). In bytecode-compiled languages (managed code), crashes are usually raised as exceptions. For all languages (worth mentioning) it’s possible to handle outright crashes. In managed languages like Java or C#, you can catch the exception and do something with it (regardless of how significant). In C or C++, you can install error handling code for these occurrences.

Regardless of how your language deals with crashes, you should be treating them as separate issues from other exceptions. In general, there are two things to consider when dealing with crashes:

  1. Crash early. This is one of the vital tips from the book The Pragmatic Programmer by Andrew Hunt and David Thomas. If you crash as soon as possible when there’s an error, you avoid running with trashed data, trying to salvage the situation. The probable result of trying to recover from a crash is that you’ll run with data that is broken, and save that data somewhere — maybe you’ll overwrite the user’s settings with gibberish, or trash data in a vital database.
  2. Don’t crash. This may seem a bit contradictory to the above rule, but in essence this is about what you expose to your user. Jeff Atwood calls it crashing responsibly, but in my world it’s about not putting the user through the experience of a crash: you reduce the crash to a normal application failure (which is better) by showing the user your own explanatory text, preferably with an apology and some way for them to know that you’re working on fixing that crash (you are, aren’t you?).

    This means you’ve got to automatically report and track all crashes. Don’t leave it up to your users to report crashes to you. They most probably won’t, since they’ll be too busy either getting your application restarted so they can finish what they were doing, or looking at your competitor to find an application that doesn’t crash.
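As a rough illustration of the two rules together, here is a minimal Python sketch. The names run_guarded, report and notify_user are made up for this example, and the reporting callback stands in for whatever crash tracking you actually use. The program stops at the first unhandled error instead of trying to recover, but the user sees an apology rather than a raw crash, and the details are reported automatically.

```python
import traceback


def run_guarded(main, report, notify_user):
    """Run main(); on an unhandled error, report it, tell the user, and stop.

    report and notify_user are hypothetical callbacks: one sends the
    details to the developers, the other shows text to the user.
    """
    try:
        main()
    except Exception as exc:
        # Collect the full traceback so the report is useful to programmers.
        details = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
        report(details)
        notify_user(
            "Sorry, something went wrong. The problem has been reported "
            "and we are working on a fix."
        )
        # No attempt to salvage state: return a failure code and let the
        # caller exit.
        return 1
    return 0
```

A real application would typically call this once at the very top, for instance `sys.exit(run_guarded(main, send_report, show_dialog))`, where send_report and show_dialog are again placeholders.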

Programming Errors

A programming error is an error caused by the code failing to abide by the rules set forth by other parts of the code: violating contracts, failing to follow the documented restrictions of an interface, or similar. Normally, you’ll use asserts to catch programming errors. In languages that don’t have asserts, you’ll cry for a while, spend a few minutes contemplating switching to a better language, and then probably do the check and throw an exception.

There are a few common mistakes with regard to asserts and programming errors:

  • Using asserts for other things than programming errors – asserts should be used only to check things in called code that the calling code could have and should have checked before making the call. You can think of an assertion as a statement of something that should never happen.
  • Allowing asserts to be ignored – assertion failures should be treated just like crashes when it comes to handling. An assertion failure is an unconditional error in the code, something that should be fixed immediately, and if you ended up getting one you have lost track of the well-being of the system. Crash, automatically report, and fix the problem for your next release.

    This is again good advice from The Pragmatic Programmer. Switching assertions off when you ship an application indicates that you think you’ve fixed all bugs. This is a rather naïve attitude, and you’ll quickly learn it doesn’t hold true. The only difference between debug and release might be how you handle your assertion failures.

    Assertions make it easier for you to find and fix the errors than the crash you might otherwise get, even after you’ve shipped.
  • Switching asserts off for release – asserts are nearly always switched off for release builds. The built-in assertion mechanism of C and C++ compiles assert out whenever NDEBUG is defined, which release configurations typically do, but building your own assert is not as hard as junior (and even some senior) programmers tend to think it is.

    Sometimes, you may need to switch some assertions off for release to address performance concerns. This should be a conscious, well-considered decision on specific asserts, however, not a default.
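To show how little it takes to build your own assert, here is a small sketch in Python rather than C or C++. Python has the same trap: its built-in assert statement is skipped when the interpreter runs with -O. The names require, ContractViolation and average are invented for this example.

```python
class ContractViolation(AssertionError):
    """Raised when calling code breaks a documented rule (hypothetical name)."""


def require(condition, message):
    """An assert that stays on in every build, unlike the built-in statement."""
    if not condition:
        # Treat this like a crash: let it propagate to the top-level
        # handler that reports it, instead of catching it locally.
        raise ContractViolation(message)


def average(values):
    # The caller could and should have checked this before calling.
    require(len(values) > 0, "average() needs at least one value")
    return sum(values) / len(values)
```

Because require is an ordinary function, switching a specific check off for performance reasons becomes a deliberate edit at that call site rather than a build-wide default.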

Exceptional Circumstances

Exceptions are the somewhat mangled used-for-everything error handlers of most object oriented languages. Be careful of how you use exceptions — they should only be used for exceptional circumstances. Unexpected, but detectable, problems.

Note that the one thing you should never do is exception checking. Say for instance that you’re reading user settings from a file, but the file may not exist since the user may not have started the application before. The wrong thing to do here is to try to open the file, catch any exceptions and move on. The right thing to do is to check whether the file exists before trying to open it.

Remember — exceptions are supposed to be used when something unexpected happens — if you already know the file may not exist, it’s not unexpected that it doesn’t. However, if when saving the settings file it won’t open because it’s write-protected, that’s a good place for an exception.

To summarize this as a simple rule: “Never use exceptions as a control structure”. There are several reasons for this, but with exceptions representing something gone wrong, it should be reasonably easy to understand that things should not continuously go wrong during normal execution. Practically, a program that only throws exceptions when things go wrong is much easier to debug than one that throws exceptions here and there.
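The settings example could look roughly like this in Python. The JSON format, the default values and the function names are assumptions made for illustration only. A missing file is an expected situation and is handled with a plain check, while a failure to write is left to surface as an exception.

```python
import json
from pathlib import Path

# Hypothetical defaults used on first start, before any file exists.
DEFAULT_SETTINGS = {"volume": 5}


def load_settings(path):
    """Return saved settings, or the defaults if nothing was saved yet."""
    path = Path(path)
    # Expected case: first run, no file. Not an error, so no exception.
    if not path.exists():
        return dict(DEFAULT_SETTINGS)
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def save_settings(path, settings):
    """Write settings to disk; an unwritable file raises OSError."""
    # Unexpected case: if the file is write-protected, open() raises and
    # the exception propagates to whoever can deal with it.
    with Path(path).open("w", encoding="utf-8") as handle:
        json.dump(settings, handle)
```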

Another thing to think about with exceptions is that in general, they are specific to the context in which they’re thrown. For instance, a FileAccessException makes sense when the file can’t be opened in the above example, but as little as one or two steps up the call stack, a FileAccessException makes no sense at all. Usually it’s a good idea to catch the exceptions and convert them to a type that makes sense in the current context. This makes it easier to decide where it’s appropriate to handle them.
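Here is a minimal sketch of that conversion in Python, where the low-level error is OSError rather than a FileAccessException. SettingsError and save_settings are hypothetical names chosen for this example.

```python
class SettingsError(Exception):
    """Raised when settings cannot be saved (hypothetical type)."""


def save_settings(path, text):
    """Save settings text, translating file errors into SettingsError."""
    try:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
    except OSError as exc:
        # Callers care that the settings were not saved, not which file
        # operation failed. "from exc" keeps the original error attached
        # for debugging.
        raise SettingsError(f"could not save settings to {path}") from exc
```

Code a few steps up the call stack can now catch SettingsError and decide what to do, while the original file error is still available through the exception's `__cause__` attribute.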

User Errors

The final category of errors is user errors. These are things that aren’t even (or shouldn’t be) unexpected, and certainly not exceptional. Always assume that your user will input something wrong in any kind of input form. The file name you asked for wasn’t a valid file name, you asked for a number but got the string “ten” — your imagination will not be capable of coming up with all the wickedly “stupid” ways your users will try to use your application (stupid in this context is a programmer-view of the world — to programmers many of the natural ways people communicate seem stupid when applied to computers, but the stupidity is generally on the side of the computer, not of the user).

A user error should never manifest itself as any of the above kinds of errors — you should always be checking and validating user input before letting it propagate into the system. Failing to do so will likely cause unexpected and weird behaviour from your application — which in turn is a programming error, not a user error. The fault was yours for not validating the input, not the user’s for trying to use your application.
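As a sketch of validating at the boundary, the function below checks a number typed by the user and returns either a value or a message to show them. The name parse_quantity, the range limits and the wording of the messages are all assumptions for this example. Bad input is an expected outcome here, so it is reported through the return value and never raised.

```python
import re


def parse_quantity(text, minimum=1, maximum=100):
    """Validate user input; return (value, None) or (None, message)."""
    cleaned = text.strip()
    # Check the shape of the input before converting, so that "ten" or
    # an empty field never reaches int() at all.
    if not re.fullmatch(r"-?[0-9]+", cleaned):
        return None, "Please enter a whole number using digits, such as 10."
    value = int(cleaned)
    if not minimum <= value <= maximum:
        return None, f"Please enter a number between {minimum} and {maximum}."
    return value, None
```

The rest of the system only ever sees a value that has passed these checks, which is what keeps a typo in a form from turning into a programming error further in.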

Error Handling Code

Once you start working on properly handling errors, you’ll inevitably start producing lots of code. This code will be run very rarely, which means it’ll likely be less well tested than the rest of your code base. You only get occasional shots at fixing this code (when something else is broken), so fix error handling code first.

Another thing to note about error handlers is that you’re usually limited in what you do in them. Depending on the kind of error, you may have no possibility of allocating memory to deal with your error message (although applications that gracefully manage out of memory errors are a truly rare find).

I recently fixed a problem which caused a hard crash of our data building pipeline, which was getting stuck in an infinite loop trying to build a malformed shader. There are several things to fix here:

  • Fix the hard crash – why wasn’t our error handling gracefully exiting the build? It turned out that our error handler attempted to read the callstack and print it. This is a good idea on a normal crash, since it could then be reported to the programmers, but it’s a very bad idea on a stack overflow. Not only is the stack extremely large and unlikely to yield much information to the programmers, but there’s no stack space left to deal with the stack overflow.

    The hard crash was the application’s error handler entering a function with a stack-allocated string to manage the callstack lines. Changing the handler to not try to list the callstack fixed the crash.
  • Fix the programming error – the pipeline ending up in an infinite loop due to bad user data. We added a validation to ensure that the input graph was properly acyclic. Note that this is something you should do after you’ve fixed your stack overflow error handler, since otherwise you can’t know if your fixes for the handler worked.
  • Fix the user data – actually fix the faulty shader graph. As a bonus we added error checking code to our editor to prevent the error from ever occurring again.
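The acyclic check in the second step could be sketched like this. This is not the pipeline's actual validation, which isn't shown here; it is a generic cycle test on a graph stored as a dictionary from node name to the names it depends on, and has_cycle is a made-up name. It deliberately uses an explicit stack instead of recursion, so the validator cannot itself overflow the stack on a deep or cyclic graph.

```python
def has_cycle(graph):
    """Return True if the directed graph contains a cycle.

    graph maps each node to a list of the nodes it points to.
    """
    unvisited, in_progress, done = 0, 1, 2
    state = {node: unvisited for node in graph}
    for start in graph:
        if state[start] != unvisited:
            continue
        state[start] = in_progress
        # Each stack entry is a node plus an iterator over its children,
        # so the walk resumes where it left off after visiting a child.
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                child_state = state.get(child, unvisited)
                if child_state == in_progress:
                    # The child is already on the current path: a cycle.
                    return True
                if child_state == unvisited:
                    state[child] = in_progress
                    stack.append((child, iter(graph.get(child, ()))))
                    break
            else:
                # All children handled; this node is fully explored.
                state[node] = done
                stack.pop()
    return False
```

Running a check like this before the build starts turns the bad input into an ordinary validation message instead of a hang or a crash.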

Errors can teach you much about the health of your code base if you listen to them. What have your errors taught you?

(Thanks to cdamian and Justin Marty on flickr for the images)

Sunday Link Run

Time for a new blog feature… I’ll post some interesting links that I’ve come by during the week.

  • Microsoft presents a new programming language, Axum, built with a focus on parallel programming and message passing between agents. Looks interesting.
  • Andrei Alexandrescu’s slides from BoostCon are online: Iterators must go.
  • The news of the week in the games world: 3D Realms is shutting down. Duke Nukem Forever looks to be forever a dream. Also shows you why deadlines are good.
  • DICE is hiring a front-end and back-end web developer for a new product, causing a bit of a stir. If you’re a web developer wanting to play in the games industry, go for it!
  • The EA3 press event happened during the week. EA Forum members are putting their full creativity into getting a better look at BF: Bad Company 2.
  • And while some of these actually do make sense, 13 things that don’t make sense is an interesting read.

Design Fundamentals: Abstraction

Design Fundamentals is my series on code design aimed at people who may have done some code in school, but haven’t done much code design, or who haven’t read much about design before. This is the second article of the series.

Code design is all about making sense of a system. Any software worth mentioning quickly grows larger than your brain can easily track. A good design will ensure that each piece of the program can be safely worked on in isolation — that whatever you need to do, it’ll fit within your working memory.

Abstraction is a key component in keeping the number of details you need to remember at once to a minimum. By abstracting a piece of code, you remove the details and only have to think about the abstract interface to the code, which should be smaller (or the abstraction was a bad idea).

So exactly how does abstraction work? I’m sure you’ve done some version of it before, maybe even without being aware of it. At a low level of abstraction, the code is filled with all the details. At a high level of abstraction, code uses concepts that hide details. So in abstracting code, we lift out the details to some other place — maybe a class or a function.
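A small made-up example of the difference, in Python. The score file format and all function names are invented for illustration. The first function works at a low level of abstraction, with every detail inline. The second reads as a short description of what happens, because the details have been lifted out into named helpers.

```python
def report_low_level(lines):
    """Everything inline: parsing, filtering and arithmetic in one place."""
    total = 0
    count = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        _name, _, score = line.partition(",")
        total += int(score)
        count += 1
    return f"{count} players, average {total / count:.1f}"


def parse_scores(lines):
    """Detail lifted out: how a score file is read."""
    scores = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            scores.append(int(line.partition(",")[2]))
    return scores


def average(values):
    """Detail lifted out: how an average is computed."""
    return sum(values) / len(values)


def report(lines):
    """Higher level: only the concepts remain visible."""
    scores = parse_scores(lines)
    return f"{len(scores)} players, average {average(scores):.1f}"
```

Both versions produce the same text for the same input; what changes is how much you need to hold in your head while reading report().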

When to abstract

While you can debate any kind of rules for when to abstract (just like most things in code), I’ll attempt to give you some guidelines. Adapt them as they suit you and change them if they don’t make sense in your context.

For functions, what you can keep in your head is strongly related to what you can see on your screen. As soon as a function has you scrolling up and down, you’ve got a good sign that the function is too long and that you should abstract something away. For me, the maximum length of a function that is normally reasonable is about 50 lines — the exact number will vary with personal preference and the content of the function.

For both functions and classes, the relationship with other classes or functions also matters, specifically the fan-out (the number of other classes your class uses, for instance). You want to keep the fan-out low, since otherwise you risk forgetting the details of something while working on the code (or confusing the next coder who works with it). Steve McConnell has this to say about it in Code Complete (a recommended book if you’re interested in code and design):

Low-to-medium fan-out: Low-to-medium fan-out means having a given class use a low-to-medium number of other classes. High fan-out (more than about seven) indicates that a class uses a large number of other classes and may therefore be overly complex. Researchers have found that the principle of low fan-out is beneficial whether you’re considering the number of routines called from within a routine or from within a class.

The same thing applies with the number seven as with the above line count — use your judgement, but keep in mind when the fan-out is starting to increase that you might want to look at your design again.

Common abstraction mistakes

The number one mistake people make when it comes to abstraction is to confuse the need to abstract with the interface of classes that is exposed to others. Often, when abstracting a part of a function to a new function, it doesn’t need to go into the interface of the class you’re working with at all: it can simply be placed in an anonymous namespace at the top of the file you’re working on.
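Python has no anonymous namespaces, but a module-level function with a leading underscore plays a similar role by convention, so the same idea can be sketched there. Inventory and _normalize are hypothetical names for this example. The helper was abstracted out of the methods, yet it never becomes part of the class interface.

```python
def _normalize(name):
    """File-local helper; the underscore marks it as private by convention."""
    return name.strip().lower()


class Inventory:
    """The public interface stays at two methods: add() and count()."""

    def __init__(self):
        self._counts = {}

    def add(self, name, amount=1):
        key = _normalize(name)
        self._counts[key] = self._counts.get(key, 0) + amount

    def count(self, name):
        return self._counts.get(_normalize(name), 0)
```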

Another mistake that is fairly common is to treat abstraction as a simple cut and paste activity, where a number of lines of code are simply moved to another place. Stop to consider where to make the cut: what part of the code you’re working on is a logical unit that does something reasonable? If you end up with a function named partOfMyOtherFunction() or something similar, you haven’t really achieved anything.

Keep thinking about your abstracted functions — do they share characteristics? Often you might find yourself creating helper functions that would fit better grouped together in a helper class. This class can also reside in the same file — there’s nothing which forces you to create a new header file and source file in order to abstract code.

Building abstract code

If possible, build your code in abstract layers from the start. Consider what parts you might need when implementing your functionality, and make sure implementation details are well partitioned off. Especially things like system differences and interfaces towards the operating system are good targets for an abstraction layer.

Encapsulate the system functionality in order to not have to bother with details everywhere in the code. This is especially useful when interfacing with badly designed APIs, to ensure that this doesn’t spread to all of your code. This lets the rest of your code think about the system functionality at a higher level, without the details.
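One way to picture such a layer is a single function that hides where each operating system keeps per-user configuration. This is a simplified sketch: the function name user_config_dir is made up, the platform and env parameters exist only to make it easy to try out, and real applications have more cases to consider than the three shown.

```python
import os
import sys
from pathlib import Path


def user_config_dir(app_name, platform=None, env=None):
    """Return the per-user config directory for app_name (simplified)."""
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    home = Path(env.get("HOME") or env.get("USERPROFILE") or ".")
    if platform.startswith("win"):
        # Windows keeps roaming application data under %APPDATA%.
        base = Path(env.get("APPDATA") or home / "AppData" / "Roaming")
    elif platform == "darwin":
        # macOS convention.
        base = home / "Library" / "Application Support"
    else:
        # Linux and similar: XDG_CONFIG_HOME, falling back to ~/.config.
        base = Path(env.get("XDG_CONFIG_HOME") or home / ".config")
    return base / app_name
```

The rest of the code asks for `user_config_dir("mygame")` and never needs to know which environment variables or folder names are involved.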

Stay With the Flow

A lot is written in various places about programmer flow. It’s been called a few things: finding the flow, being in the zone, coding mode. Usually it’s mentioned when talking about programmer productivity in general, as in “get the programmers private offices, breaking their flow causes major productivity drops”, or sometimes some coder reflects “I was really in the zone yesterday when I wrote this”.

Not much is written about what the flow actually is though. Often it’s viewed as something near-magical, since even if you’re left undisturbed it can be incredibly hard to find the flow — but if you do, the results can be magical indeed.

But is it really magic? Let’s start by defining what it actually is:

Flow is the mental state of operation in which the person is fully immersed in what he or she is doing by a feeling of energized focus, full involvement, and success in the process of the activity.

To most people, focusing on something is easy: sit down undisturbed and think very hard about that thing and only that thing. However, flow is something more. The feeling of energized focus and full involvement is a part of it, but the success factor is more important. You won’t feel flow unless you’re continually moving forwards.

Once we’ve understood that continual motion is a key to maintaining (and entering) flow, we can start looking at practical tips for getting into and staying with the flow. I’m not saying that these rules are going to solve all your problems, but they can get you quite a few steps on the way towards diving in with the flow.

Environments

Environments matter, but much less so than you’d think. Different people break out of flow differently and find varying difficulty in getting back into it, so find your own level here:

  • Isolate yourself – it’s impossible to find the flow when continually talking to others or hearing conversations. Even if you’re sitting near other people, put your headphones on and put some music on.
  • Get your equipment sorted — make sure you don’t run into any difficulties with the equipment, and that there’s nothing with it that annoys you. Small nagging annoyances can easily distract your full focus. Make sure your keyboard doesn’t glitch, make sure the editors and tools you use work smoothly, make sure you have good headphones that shut out the world around you, make sure there’s not going to be sunlight shining directly on your monitors in half an hour.
  • Communicate your flow — tell people about it. When you isolate yourself it can feel hostile towards others, but trust me they’ll understand. Just tell them you’re going to be putting your headphones on because you need to focus on something.

When in doubt…

Flow breaks when you get stuck with a problem. Not having flow, it’s easy to get stuck on just about anything. These two factors are what stop many people from getting into flow, and since they’re interconnected it’s hard to break out of that loop: you’re stuck with something that’d be easy if you had flow, because you don’t have it.

The good news is that this works the other way around as well. If you can manage to not get stuck and to always keep moving forwards, you’ll be in the flow before you know it.

  • When in doubt, compile – often, while in the middle of writing something, you’ll run into some problem or other and get stuck. If you find yourself snagged on something for more than a few seconds, start the build. This may seem entirely unrelated to what you’re stuck on, but the fact is you’re likely to have compile errors somewhere in the code you’re writing, so why not fix them now, while your mind is busy figuring the problem out.

    This tip has two good properties. One is that you may gain insight on the problem you’re stuck on just by fixing the compile errors in the surrounding code. The second is that you keep moving, and thus you never get the same feeling of stopping what you’re doing, which can break flow.
  • When in doubt, type – it’s easy to get stuck on a design problem and let that stop you from coding. However, there’s nearly always some typing to do: create a blank file for the class you’re thinking about. Fill in the file with all your template stuff that needs to be there (header guards for a C/C++ project, for instance).

    Now keep going: create a blank class. Type in a method signature (doesn’t matter if it’s not 100% correct) for a method you think you might need. Type in another. Create void implementations of the methods. Maybe you’ll realize that you need something more, so add that in. Change things around. As you keep doing these small, seemingly mechanical tasks, you help your mind along the way by providing a framework for what you’re designing.

    Again, this has the dual point of moving forwards and helping your brain activity along. Sometimes, typing something into your editor and then immediately going “no, that’s not right” can save an hour of thinking (or even more, if you lost the flow and your mind wandered).

  • When in doubt, find the loose end – sometimes you’re faced with a large amount of entangled code or design and you don’t know where to start. This tends to happen when starting new sections or subsystems, where the amount of work to do is large enough to cause you to not be able to grasp the entire thing up front. Maybe you don’t see the design for the system clearly, so you stop to ponder about it, losing flow.

    The best thing to do in this situation is to dive into the code. Find one strand and start untangling it. There’s bound to be some part of the code where there’s a small piece you know you’ll need. Start there. When you’re done with that, not only will the amount of remaining work be smaller, you may have found new insights into the surrounding code tangle, making it easier to keep untangling.
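To make the “when in doubt, type” tip concrete, this is roughly what the result of a few minutes of such typing might look like in Python. The SaveGame class and its methods are entirely hypothetical, and every signature is a guess that is expected to change.

```python
class SaveGame:
    """First-pass skeleton; the signatures are guesses to be revised."""

    def __init__(self, slot):
        self.slot = slot

    def exists(self):
        # Placeholder answer until the storage details are decided.
        return False

    def load(self):
        # Empty implementation: the body comes once the design settles.
        raise NotImplementedError("load: file format not decided yet")

    def store(self, state):
        # Typing this raised a question: should store() take the whole
        # game state or only what changed? Noted here, answered later.
        raise NotImplementedError("store: not written yet")
```

None of this is real functionality yet, but the file exists, it runs, and the open questions now have a place to live.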

The overall theme is quite obvious: as long as you stay isolated from the distractions of the world and keep hammering at your code, even if it’s just doing mechanical stuff, you’ve got a large chance of finding the flow.

This makes it a lot less magical, and a much easier tool to bring out. Does that mean you should always do it? No, there are certainly points where the best thing to do is to stop and make sure you’ve really thought things through properly. That’s a choice you should make actively, not one left to magical chance.

Before you know it, you’ll be the magician.