Toward Better Bugs

Did you know there’s a difference between “failure” and “fault”?

For years, I didn’t.

When I learned the difference, it was a profound epiphany and I found it really helpful in coaching testers.

Failure — an error produced by an underlying cause.

Fault — the underlying cause.

That’s it.

The thing no one told me was, I was free to decide what the cause was. I was in control of how much I was going to investigate.

It’s reasonable to say “Earthquakes make buildings fall down.” We can repair the buildings, but if we don’t design them better, they may fall down again. We can treat the symptom, we can live with the fault, or we can cure the disease: digging down miles into the earth and fixing the ACTUAL fault, the tectonic plates, so they don’t cause earthquakes.

Here are some examples of software failures:

– “Run-time error ‘6’: Overflow”

– Blank log file

– “This page could not be displayed”

– Rectangles displayed instead of characters

– Slow performance when launching app

If these were the titles of bugs, I would approach the tester and say, “Um, these are pretty lame. I think you have a better bug here.” Or, I can simply ask “Why do you *think* that failure is happening?”, provoking a conversation and listening to the conjectures they have. Then I’d ask them for ideas for follow-up tests they could run to either refute or corroborate their conjecture.

I was working with a tester recently. He told me some test ideas he had for a simple feature — a Browse control that enabled users to find files to upload to a secured website.

“Nothing fancy,” he said. “Pretty straightforward.” He had thought of the following tests:

* uploading a small file (1K)

* uploading a large file (1024K)

* uploading a file that is over the acceptable size limit (greater than 3MB)

* uploading no file

* uploading a file with an unsupported extension

* uploading a file with a really long filename

* confirming that the file got uploaded
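The size-related ideas in this list need files of known sizes. Here is a minimal sketch of how those fixtures could be generated, assuming Python is available on the test machine. The filenames are made up for illustration, and “3MB” is assumed to mean 3 × 1024 × 1024 bytes, which may not match how the real limit is defined.

```python
import tempfile
from pathlib import Path

KB = 1024
MB = 1024 * KB

# Hypothetical fixture names mapped to sizes in bytes.
SIZE_CASES = {
    "small.txt": 1 * KB,           # small file (1K)
    "large.txt": 1024 * KB,        # large file (1024K)
    "over_limit.txt": 3 * MB + 1,  # one byte over the assumed 3MB limit
}


def make_fixture_files(directory, cases=None):
    """Write one file per case into `directory` and return name -> Path."""
    cases = SIZE_CASES if cases is None else cases
    paths = {}
    for name, size in cases.items():
        path = Path(directory) / name
        path.write_bytes(b"x" * size)  # content is irrelevant, only size matters
        paths[name] = path
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, path in make_fixture_files(tmp).items():
            print(name, path.stat().st_size)
```

The remaining ideas (no file, unsupported extension, long filename) don’t depend on size, so they are left out of this sketch.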

We went through his idea list and a few new ideas emerged. That’s the power of exploration: actually working with the feature to see what it does and reacting to the new perspectives that come from that. But when we got to the really long filename idea (we knew 255 might be meaningful, so we carefully counted the characters), we found that clicking the Submit button cleared out the path to the file, so the field looked as if it was back in its original empty state. This didn’t happen with our other tests.

Ok, good. Times like this are a signal to investigate. Our work was just beginning, not ending. We had found a failure, now it was time to dig into the earth to find the fault, to go from symptom to root cause – or at least, get as close to the root cause as was reasonable in the time we gave ourselves.

Our procedure:

1) Try it again. We could have done something stupid without realizing it.

2) Yep, same problem.

At this point we could have stopped and written a bug: “Long filenames result in blanked path.” But is that really the problem? It may be true, but is it *entirely* true? Actually, How True Is It?

Questioning is testing, and at this time we had more testing to do. There was more context to unearth before I felt comfortable reporting this problem. This is what separates bad testers from good testers. Asking a lot of questions might be expensive (time-consuming), so I always ask myself: Would spending more time or money investigating this be more harmful than helpful? If the answer is no, I keep going. If yes, I stop. Maybe the information I already have is good enough for the developer for now.

The main thing I do is anticipate what the triage committee might ask me if I report a bug that’s too vague. I don’t want this thing coming back on me when I’ve got it trapped right here and now.

Could it be a browser setting causing this problem?

Could it be a bug on the boundary of the input limit?

Could it be a bad file?

Could it be something right in front of us we don’t see, like the “Error on this page” message in 6-point type on the browser window?

All of these questions have follow-up tests behind them.

After a little bit of digging, we realized that it wasn’t the filename length after all; it was the entire PATH length that was the problem. We had to add the length of the filename to the rest of the path that makes it “fully qualified” (the part that includes C:\program files\documents and settings\username…). That first part is 48 characters right there.

To test our conjecture, we took a file with a short filename and buried it in 25 nested directories with names of 8 characters each, to make a long path name.
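Before creating the directories, it can help to work out how long the resulting path will be. Here is a rough sketch using Python’s standard pathlib. The root C:\, the directory name dir00001, and the filename a.txt are placeholders, not the values we actually used.

```python
from pathlib import PureWindowsPath


def nested_path(root, depth, dir_name, filename):
    """Build a Windows path with `depth` nested copies of dir_name above filename."""
    path = PureWindowsPath(root)  # pure path: nothing is created on disk
    for _ in range(depth):
        path = path / dir_name
    return path / filename


# Placeholder values: 25 directories with 8-character names, as in the test above.
path = nested_path("C:\\", depth=25, dir_name="dir00001", filename="a.txt")

# len() counts every character, including the drive letter and each separator.
print(len(str(path)))  # prints 233
```

With these placeholder values the path comes to 233 characters, so the starting directory and the filename length decide whether the total crosses 255.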

Sure enough, the entry field went blank. Another minute of investigation, and we found that the field went blank after clicking Submit once the path reached 256 characters. Now this was black-box testing, with no knowledge of the underlying code. If we could have seen the code, maybe we could have seen what the programmer was doing in this case to render the field blank instead of showing the user a reasonably helpful error dialog.
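Finding the exact length where the field goes blank is a boundary search. Here is a sketch of that search, assuming the behavior flips only once (every shorter path works, every longer path fails). The `passes` argument is a hypothetical stand-in for whatever performs the upload and checks the field, whether that is a person or a script.

```python
def first_failing_length(passes, low, high):
    """Return the smallest length in (low, high] for which passes() is False.

    Assumes passes(low) is True, passes(high) is False, and that there is
    a single boundary between them.
    """
    if not passes(low) or passes(high):
        raise ValueError("need a passing low and a failing high to start from")
    while high - low > 1:
        mid = (low + high) // 2
        if passes(mid):
            low = mid   # boundary is above mid
        else:
            high = mid  # boundary is at or below mid
    return high


# Simulated probe that mirrors what we observed: paths up to 255 characters
# survive Submit, longer ones blank the field. A real probe would do the upload.
def simulated_field_survives(path_length):
    return path_length <= 255


print(first_failing_length(simulated_field_survives, low=48, high=300))  # prints 256
```

Halving the range each time keeps the number of uploads small, which matters when each probe is a manual test.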

But our bug went from being “Blanked path for long filenames” to “Paths of greater than 255 characters change to null values in entry field.” That’s a better bug. Not the root cause, but better. It’s hard to know for sure, but I bet it won us more credibility and respect from the programmer. I know it helped my tester win more credibility with me, and I know in the future he’s going to be interested in winning more by asking follow-up questions and running follow-up tests that gather more context.
