Black Box Black Box Testing
One of my friends is in college and is currently feeling the full idiocy of a system that was only beginning to be rolled out as I left. Let me explain how it works.
Essentially, the system is meant to test the students’ solutions to homework problems. This is done by providing a solid definition of what the input and output of the application are supposed to be on the standard in/out channels, and setting up a whole bunch of test cases, including a memory limit and a CPU time limit. Students submit their source code to the system, which compiles it and runs all the test cases against the application in a black box test. So far, so good.
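To make the setup concrete, here is a rough sketch of what such a judge might look like. It is not the real system: the function name `judge`, the verdict strings and the toy "double the number" assignment are all made up for illustration, the compile step and the memory limit are left out, and the time limit here is wall-clock time rather than CPU time.

```python
import subprocess
import sys


def judge(cmd, cases, time_limit=2.0):
    """Run cmd once per (stdin, expected stdout) pair and return a single verdict.

    Hypothetical sketch: no compile step, no memory limit, and the
    timeout is wall-clock time, not CPU time.
    """
    for stdin_text, expected in cases:
        try:
            proc = subprocess.run(
                cmd,
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=time_limit,
            )
        except subprocess.TimeoutExpired:
            return "Time Limit Exceeded"
        if proc.returncode != 0:
            return "Crashed"
        # Compare token by token so trailing whitespace does not matter.
        if proc.stdout.split() != expected.split():
            return "Failed"
    return "Passed"


# Toy assignment (an assumption for this sketch): read a number, print its double.
CASES = [("2\n", "4\n"), ("10\n", "20\n")]
GOOD = [sys.executable, "-c", "print(int(input()) * 2)"]
BUGGY = [sys.executable, "-c", "print(int(input()) + 2)"]

if __name__ == "__main__":
    print(judge(GOOD, CASES))
    print(judge(BUGGY, CASES))
```

Note that the buggy submission gets the first case right (2 + 2 happens to equal 2 * 2) and only fails on the second, yet all the student would ever see is the single word "Failed".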
Watching these students work, and comparing them with my colleagues and me, makes a few things very apparent: even with a fairly solid grasp of algorithms and data structures, their number one problem is code. Where professional programmers swim through code like sharks in the sea, the students appear to be more or less drowning. Theoretical learning aside, the education lacks practical programming, debugging, practical programming and some more practical programming.
It would seem that these programming exercises would be the perfect opportunity to get that kind of experience, if it weren’t for the fact that the test system is itself a black box. You put in your code, and it tells you yes or no. It’s not quite a boolean pass/fail answer, but close enough: you get one result from the set Didn’t compile, Passed, Failed, Crashed, Time Limit Exceeded. When I first heard of the system, it was justified by the claim that sometimes in professional programming, that’s all you get.
I agree. Sometimes, you get gnarly bugs that give you less information than a world pro’s poker face. I’ve spent weeks tracking bugs like that, using all kinds of tools at my disposal to try to wring more information out of the error, until finally the knot was untied. But of all the bugs like that I’ve been through, not one was eventually solved by guessing what was wrong and how to fix it.
Supposedly, the tool is meant to teach the students to debug their code… which it somehow does by disallowing all normal debugging tools. You can’t run a debugger on it, you can’t print traces, you’re not allowed to log to a file or socket, you’re not even allowed to know what input caused the error. The only tools you have at your disposal are your wit in coming up with your own test cases and code reviews.
Any attempt at normal debugging would be classified as cheating. If I were faced with a bug under those circumstances, I would do whatever I could to get more information out of it. Hey, I can crash it with different signals: that’s a few bits of information I could get back from it. All those kinds of tricks of the trade that real programmers use to, you know, solve problems… would be cheating.
This skews the results… very simple bugs turn into monster problems, since you can’t identify and fix them. What the students are learning is not how to debug their programs but how to painstakingly solve the very specific problem of pleasing the system. By artificially making easy things hard, the system has effectively found a way to avoid teaching the students essential programming skills: using simple debugging tools like tracing and breaking into a debugger. Instead, they learn programming by coincidence: poke something until you (hopefully, eventually) get a green light.
That’s not a lesson to learn.
The only way to deal with this, faced with the obstacle this system presents, is to learn a different skill: testing. More on that later.
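As a small taste of what that kind of self-testing can look like, here is a sketch that generates random inputs and compares a solution against a slow but obviously correct reference. Everything in it is assumed for illustration: the maximum-subarray problem is just a stand-in for a typical one-function assignment, and the function names are made up.

```python
import random


def max_subarray_brute(xs):
    """Slow reference: try every non-empty contiguous slice."""
    return max(
        sum(xs[i:j])
        for i in range(len(xs))
        for j in range(i + 1, len(xs) + 1)
    )


def max_subarray_fast(xs):
    """Kadane's algorithm: the solution you would actually submit."""
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def max_subarray_buggy(xs):
    """Same idea with a classic mistake: starting at 0 breaks all-negative input."""
    best = cur = 0
    for x in xs:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best


def find_counterexample(candidate, reference, trials=1000, seed=0):
    """Return a small random input where the two functions disagree, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-5, 5) for _ in range(rng.randint(1, 6))]
        if candidate(xs) != reference(xs):
            return xs
    return None


if __name__ == "__main__":
    print(find_counterexample(max_subarray_fast, max_subarray_brute))
    print(find_counterexample(max_subarray_buggy, max_subarray_brute))
```

Keeping the random inputs tiny is deliberate: when the two functions disagree, the failing input is short enough to trace by hand, which is exactly the information the grading system withholds.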
8 Comments
By Paul, Thursday, September 17, 2009 @ 18:35
hmm. I’m torn on this. I’d say 98% of the time I’m forced to use the poke system. I’m used to it and can get what I want about 50% of the time and the other time I have to really really break it and work backwards. That being said, in order to do this black box programming right, you’d have to start with writing your own debugging script and then write output to it until you get to the final product and switch it from a debug/comment system to a final product system. That’s not really that bad an idea.
I do hope they can at least use try-catch otherwise I’d have to agree with the writer. Hopeless!
By Winsrp, Thursday, September 17, 2009 @ 19:20
Well, as a programmer myself, it just feels stupid. Debugging and tracking are the most useful things when dealing with errors and bugs. Most of the people who work with me are freshmen right out of college, and they try to make things work without even testing them, and if they find an error they get frustrated. I’m trying really hard to teach them that programming is a “by-parts” job, where you have to do and test one thing at a time, so that everything works like clockwork at the end. I’m also working on their debugging skills, since most if not all of the available tricks are pretty much new to them.
By slicedlime, Thursday, September 17, 2009 @ 20:00
Paul: I agree in part that it’s a good way to test things thoroughly… thus my comment at the end. However, part of the issue on these problems is that the available test data is very limited… usually one or two example inputs. So, you’re left with your own wit coming up with new test data (and making sure you get it right), or just reviewing code.
I agree that in a real-world situation you’ll want to test everything in pieces before shipping it, but we’re not talking about a software system here… it’s more or less one function per assignment.
There are good ways to work systematically in this situation too, but really that’s more something for a course in unit testing than anything else. It’s a skill many professional programmers are missing too.
But more on that in a future post.
By Brian Guthrie, Thursday, September 17, 2009 @ 20:19
“So, you’re left with your own wit coming up with new test data (and making sure you get it right), or just reviewing code.”
Isn’t that sort of the point? How does that discourage the development of world-class programming skills? Unit testing is a complex topic worthy of study, but it doesn’t belong in its own course; every programmer should be doing it. If it’s one function per assignment, part of being a world-class programmer is understanding what it takes to test the dickens out of that function.
By slicedlime, Thursday, September 17, 2009 @ 20:27
If you’re saying you want to teach people to debug programs, then you should let them practice… eh… debugging programs. If you want to teach them unit testing, you really need to be teaching them that, not just setting up assignments in a way that forces them to learn it on their own.
There are good techniques for it, but they don’t teach it, and you sure can’t expect people to figure those techniques out by themselves.
The situation here is more similar to what happens when you’ve done your unit testing and move on to integration testing… but then you’re unable to use debugging tools. There is no good point for that.
By Jenn, Thursday, September 17, 2009 @ 20:29
This is the system that was used when I was in college, the “autograder”. It wasn’t as bad as this article makes it sound.
First of all, it’s not about “pleasing the system”; the system is pleased by correct code. You do get one or two test inputs/outputs to work with, and then there are usually around 20 black box test cases which include the tests you were given originally. It should be trivial, therefore, to pass a couple of tests, and usually if you can pass a couple you’ll pass most. Say you pass 18/20, then you know you’re missing a border condition. And even if that’s the best you can do, you get partial credit.
Also, at my college they used the autograder to check the code functionality and then a TA would actually read through your code and possibly give you more credit. They tried to give you as much room to succeed as possible without handing out their test cases. Speaking of TAs, most of them were more than willing to at least give you hints as to what test cases you might be missing. Most would even take a look at your code. It’s not like you’re totally on your own, pitted against some inscrutable machine.
My main problem with the system is that you had a very limited number of submissions (like 5). Sometimes you’d fix something that was a red herring and waste a submission. That can be stressful.
By slicedlime, Thursday, September 17, 2009 @ 20:34
You’re talking about a different system, Jenn. Here, if you pass 18/20 tests, you don’t know that… all you know is “Fail”. No information on what you passed or not, no partial credit, nothing.
The course I saw my friend doing gave them 1-2 test cases, then had 100+ cases in the system test. If you failed, you had no idea if it was the first or the last test that failed.
Your autograder system sounds like a step up from KATTIS.
By Rob, Thursday, October 22, 2009 @ 16:17
Where are all the new blog posts!?