Sunday 4 September 2011

Plastering over the cracks - Why fixing bugs is missing the point


A few years ago I was fortunate enough to travel through China and take a tour up the Yangtze river, passing the new hydro-electric dam that was in the process of flooding the famous "Three Gorges". As our (government operated) tour boat navigated the giant locks up the side of the dam, our guide informed us that "the cracks that have been reported on the BBC news have been fastidiously fixed". My gut reaction to this statement was a feeling of mild panic. Yet I knew that the cracks had been repaired, so what was my problem? My confidence in the integrity of the dam was massively impacted because, although I knew that the apparent fault had been fixed, I had no confidence that the underlying problem had been understood. I was travelling on a large body of water, and all that was preventing it from rushing down the valley below was a lump of concrete that a few weeks earlier had had cracks in it. I had no evidence that the government had any idea why those cracks had occurred in the first place. What was to stop the fault that had caused those cracks cropping up in another part of the dam?

Planned Failure


More recently I was reading a discussion group on the subject of which statistics testers felt were the most useful, and the most useless, to a testing process. One of the figures suggested was that of actual versus predicted bug rates, the idea being that developers predicted the likely bug rates for a development and then progress was measured on how many bugs were being detected and fixed compared to this "defect potential". I dislike this concept for many reasons, but the foremost of these is exactly the same reason that my subconscious was nagging me on the Yangtze river:-

Just fixing defects is missing the point.


A bug is more than just a flaw in code. It is a problem whereby the behaviour of the system contradicts our understanding of a stakeholder's expectation of what the system should do. The key to an effective resolution is understanding that problem. Only then can the appropriate fix be implemented. I believe that the benefit that can be obtained from the resolution of a bug depends hugely on the understanding of the problem domain held by the individuals implementing and testing the fix.

Factors such as when and how the issue is resolved, and who implements and retests the fix, can impact this understanding. Even the same bug fix applied at a different time by a different person can have an impact on the overall quality of the software through the identification of further issues in one of the feature domains. If the identification or resolution of issues is delayed, either in a staged testing approach, or through the accumulation of bugs in a bug tracking system in an iterative process, then the chances of related bugs going undetected or even being introduced are higher.

While the Iron is Hot


Many factors that can influence the successful understanding of the underlying cause of a bug are impacted hugely by the timing and context in which the bug is tackled.

  • Understanding of the Problem Domain
  • It could be that a problem actually calls into question some principle underpinning the feature model, for example an assumption about the implementation environment or the workflow. A functional fix implemented as part of a bug fixing cycle may resolve the immediate bug but leave underlying flaws in the feature model unaddressed.
  • Understanding of Solution Domain
  • When a feature is being implemented, the individuals involved in that implementation will be holding a huge amount of relevant contextual information in their heads. With fast feedback cycles and quick resolution of issues, problems can be addressed while the details of the implementation are fresh in the mind of the developer, and associated issues are more likely to be identified. It could be that the most apparent resolution to a bug would compromise a related area of the feature, a fact that could be overlooked if the bug is tackled as a standalone fix.
  • Knowledge of related features
  • It is common for a developer to work on a number of similar stories or features as part of a project, often using similar approaches on related features. If an issue has been identified with the functional solution implemented, then similar unidentified problems may well be present in related areas that the developer has worked on.
  • Understanding of Testing Domain
  • In addition to the developers, as I discussed in this post, the tester working on a feature will have a better understanding of the testing domain when actively working on that feature area than if testing the issue cold at a later date. Addressing the retesting of the problem immediately provides the opportunity to review the testing of related features and perform further assessment of those areas, an opportunity that may not be apparent if the issue is tackled as a point retest.

By operating with fast feedback and resolution cycles we take advantage of the increased levels of understanding of the problem, solution and testing domains affected, thereby maximising our chances of a successful resolution and of identifying related issues. If a software development process embraces the prediction, acceptance and delayed identification and resolution of issues, then many of the collateral benefits that can be gained from tackling issues in the context in which they are introduced are lost.

Copyright (c) Adam Knight 2011 a-sisyphean-task.blogspot.com Twitter: adampknight
MaikNog said...

Nice read. I like stories, which connect "real" things (like the dam story) with the more "un-real" (not sooo haptic) world of testing.
I was intrigued by the mentioning of Sysiphos; wrote an article (http://hanseatictester.info/?p=23) with that content; more on a meta layer though.

Cheers,
MaikNog

Adam Knight said...

Thanks for the comment. Software testing is in many ways the interface between software development and the "real world" so I think relating it to other more tangible situations that elicit an emotional response can be very effective. In the case of the dam, a more extreme case of a common situation to us demonstrates the discomfort that we have with this approach, something which may be less apparent when working with the more familiar context of software bugs.
