You Can’t Fix Quality Just By Catching Bugs

Heemeng Foo
Published in The Startup
Oct 21, 2020 · 11 min read


Bugs? Photo by Nikola Đuza on Unsplash

Introduction

If you ask any software quality professional what they do, most will tell you their job function centers on testing (I should know: I've been working as a QA/QE/Test manager for the last 12 years). It comes in various forms: manual testing, automated testing, exploratory testing, functional testing, performance testing, load testing, mobile device testing and so on. All these activities have the primary purpose of finding bugs. These bugs are logged in bug tracking systems, or otherwise communicated to the development team, to be fixed before a release or scheduled for some later date.

Yet ask them whether things are getting better, and most will tell you they are not. It feels like chasing the wind, like playing whack-a-mole: you fix one bug in this release and another pops up in the next.

The QA or Test manager often presents management with charts and reports of test coverage and how many bugs were found in the last bug bash or pre-release testing. Everything is great, right? Well, not really. Social media reputation scores, App Store ratings and user feedback point in the opposite direction. What's going on? So much effort is spent finding and fixing bugs, yet quality seems out of reach.

Up to this point we have not defined quality, and yet it is at the heart of the matter. Interestingly, when asked to define it, most quality or test professionals balk: it is a rabbit hole. Why is that? Let's dive in.

What is Quality?

This is not a Birkin or a Kate Spade. Photo by Arno Senoner on Unsplash

The Hermès Birkin and the Kate Spade NY Hayes Street Isobel

Both are women's handbags, but there is a wide gulf separating the two. If you're a straight male, it is in your best interest to know what that is. Hint: it has something to do with the price. The former easily runs $10,000 to $20,000; the latter is around $400.

Why is this?

I have a unique insight into this, as one of my uncles does high-end leather goods repair and tailoring. He once showed me how he could tell whether a handbag was a genuine Louis Vuitton or something less valuable. It always came down to the quality of the fittings (zippers, clasps) and the quality of the workmanship (how the leather is treated, the stitching, and so on). High quality handbags almost never have zippers that get stuck, and they last a really long time.

How does this relate to software quality? We’ll come back to this in a bit.

But before we proceed, ask yourself this: what separates a $400 handbag, or even a $5,000 handbag, from a $20,000 Birkin? There must be something else at play.

The Tesla and the Honda

I drive a Honda, my first car was a Honda, and my next car is very likely going to be a Honda. They are known to be super reliable and, as one of my previous supervisors used to say, to "run forever". My first Honda lasted almost 200,000 miles before repairing the engine would have cost more than buying a new car. Yet if you ask any Silicon Valley engineer, most will tell you their dream car is a Tesla.

The thing is, Teslas are known to have a ton of reliability issues (see [1]). Yet they are highly desirable.

Be patient with me, I’m going somewhere with this.

The digital camera

I used to lead a small software quality team in Singapore. On that team was an engineer who had moved over from India. He had been in Singapore for about six months and finally had the opportunity to go back to see his family. He was very excited and went shopping for gifts for his family and friends back home.

One day I walked past his desk and noticed him looking very troubled. I naturally asked him why. He said he had asked his younger sister what she wanted, and she had said a digital camera (this was before the iPhone). He was poring over the specs of different models and could not decide. I suggested he simply ask her which one she wanted.

The next day I saw him again and I asked him what his sister had said. He looked at me sheepishly and said “she asked if there was anything in pink”.

Quality is in the eye of the beholder

Do you see a pattern in these three examples? At the end of the day, what counts as valuable, as high quality, depends on the customer's perception.

How does this relate to how quality is measured in a typical software quality process?

How quality is defined in software engineering and how it is measured

Standard definitions of quality in the software engineering field go something like this (we use the definition from the PMBOK [2]): "the level at which a product or service fulfills the requirements".

A bug, therefore, is a deviation from one or more of those requirements.

In this sense, we can look at the software quality process as a bug minimization problem, i.e. with every iteration of the SDLC (software development lifecycle) we decrease the number of bugs found. Quality is thus an inverse function of the number of bugs found in the testing process.
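To make that framing concrete, here is a minimal sketch of such a metric. The formula is an illustrative assumption for the sake of the example, not an industry standard:

```python
def naive_quality_score(bugs_found: int) -> float:
    """A deliberately naive metric: quality falls as the bug count rises.

    Illustrative assumption only -- it captures the 'inverse function'
    framing, not how quality should actually be measured.
    """
    return 1.0 / (1.0 + bugs_found)

print(naive_quality_score(0))   # 1.0  -> "perfect" quality
print(naive_quality_score(24))  # 0.04 -> many bugs, low score
```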

In reality it is not so simple. One problem lies in defining the corpus of requirements. There are explicit and implicit requirements, and sometimes the implicit ones are part of company folklore and culture. How do we capture those and canonize them?

Assuming you have this corpus, you can then start talking about coverage, i.e. how much of it your tests actually inspect for compliance. Here, too, the process is flawed. I have written about this in a previous article [3].
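As a toy illustration of what requirement coverage means (the requirement IDs and test names here are hypothetical):

```python
# Hypothetical requirement corpus and a mapping of tests to the
# requirements they exercise.
requirements = {"REQ-1", "REQ-2", "REQ-3", "REQ-4", "REQ-5"}
tests_to_requirements = {
    "test_login": {"REQ-1"},
    "test_checkout": {"REQ-2", "REQ-3"},
}

# Union of everything the test suite touches, compared to the corpus.
covered = set().union(*tests_to_requirements.values())
coverage = len(covered & requirements) / len(requirements)
print(f"Requirements coverage: {coverage:.0%}")  # 60%
```

Even a number like this only measures compliance with the corpus; it says nothing about requirements the corpus never captured.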

But there’s a bigger, more fundamental problem. If you’ve read this far it may have dawned on you by now.

The fundamental disconnect

The key disconnect, if you have not figured it out by now, is that the way quality is being measured (inverse to the number of bugs found) is only tenuously related to user perception. Sure, if the product or service does not work the way it is supposed to, users will not be happy. But even if you fulfilled every requirement in the corpus, users may still not deem the product or service to be of high quality. It is like the digital camera story: you could study all the specs, e.g. megapixels, zoom levels, storage, but the user is just looking for a camera that works and is pink.

In the quality management profession, QA and test managers regularly present dashboards of metrics showing how great the test process is, how many bugs were found, and so on, but management does not necessarily care.

Is this always the case?

The “Maslow’s hierarchy of needs” for quality

In Maslow’s hierarchy of needs, we have a pyramid with the following levels (from the bottom):

  1. Physiological
  2. Safety
  3. Love/Belonging
  4. Esteem
  5. Self-actualization

For software quality, in my experience, there is a similar hierarchy (again bottom up):

  1. Basic hygiene, i.e. does your app or service crash or become unusable frequently?
  2. Value provision, i.e. do major functionalities work as designed?
  3. Performance, i.e. do you waste your users' time unnecessarily?
  4. Engineering craftsmanship, i.e. can engineering ship code with almost zero bugs multiple times a day, with zero manual testing?
  5. Pleasure, i.e. do your users derive pleasure from using your product?

On top of that, there are also product maturation levels that correspond to the maturity of a company:

  1. MVP (Minimum Viable Product) level
  2. Product-Market fit iteration level
  3. Paying customer (early adopter) level
  4. Paying customer (majority) level
  5. Mature product level

Let me explain.

In the beginning, the product is just an idea and a bunch of hypotheses about what users really want. (Users sometimes do not know what they want until it is put in front of them; the first iPhone was one such example.) The key task here is what Eric Ries, in "The Lean Startup" (see [4]), calls Validated Learning. To get there, product engineering teams work to get an MVP out the door and into the hands of users. Then, through various product-market fit experiments, they validate or disprove hypotheses about what users really want and refine the product. Each time, the product is tweaked; at times there is a need for a major pivot, a major shift in product direction.

At this stage, some level of basic hygiene and value provision is required, but just enough to validate the hypotheses. There will be a lot of throwaway code, so some amount of manual testing makes sense. Test automation is mainly there to save testing time and resources and to shorten the cycle of releasing a new version for hypothesis validation. These are usually end-to-end tests, as they validate the features the company wants users to use. It is not uncommon to find test automation unable to keep up with the large number of changes, with manual testing filling the gap. Here the test team's primary responsibility is catching bugs.
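To make "end-to-end" concrete, here is a rough sketch of such a check. The URL, endpoints and flow are hypothetical, and it assumes pytest plus the requests library:

```python
import requests

BASE_URL = "https://staging.example.com"  # hypothetical staging environment


def test_signup_then_login_end_to_end():
    """Smoke-test the signup -> login flow the way a real user would."""
    session = requests.Session()

    # Create an account through the public API (hypothetical endpoint).
    resp = session.post(f"{BASE_URL}/api/signup",
                        json={"email": "qa@example.com", "password": "s3cret!"})
    assert resp.status_code == 201

    # Log in with the same credentials and expect a session token back.
    resp = session.post(f"{BASE_URL}/api/login",
                        json={"email": "qa@example.com", "password": "s3cret!"})
    assert resp.status_code == 200
    assert "token" in resp.json()
```

Tests like this are valuable precisely because they exercise the product the way a user does, which is also why they are slow and brittle when the product is changing daily.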

The next major stage comes when the company has paying customers, which usually coincides with the setting up of a customer support team. This is when bad software starts to cost the company both money and customer goodwill. At this stage you are looking at the first three levels of software quality: basic hygiene, value provision and performance. The technical debt accumulated in the MVP and product-market fit phases starts to become material, and management usually realizes that, in order to scale to rapid adoption at the majority level (the next phase), significant work needs to be done to refactor the code for testability, adopt the Testing Pyramid (see [5]) and utilize smarter tools for testing (some AI-powered, see [6]). Again, the primary focus here is catching bugs.
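The Testing Pyramid pushes most of the checking down into fast unit tests, with fewer integration and end-to-end tests above them. A minimal sketch of a pyramid-base unit test; the discount rule here is a made-up example:

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical business rule: apply a percentage discount, never below zero."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return max(price * (1 - percent / 100), 0.0)


class ApplyDiscountTest(unittest.TestCase):
    def test_normal_discount(self):
        self.assertAlmostEqual(apply_discount(100.0, 25), 75.0)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)


if __name__ == "__main__":
    unittest.main()
```

Hundreds of tests like this run in milliseconds, which is what allows the slower end-to-end suite to stay small.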

If the team has made the transition to a state where a large portion of the tests are automated and CI/CD pipelines are well set up, then the development team is ready for the next stage of growth. If not, the team will struggle to balance the demands of growth in features and usage against paying down the technical debt. In this stage, in order to reach sustainable growth, Engineering Craftsmanship is needed: strong code review practices, sound architecture and system design, DevOps/SRE processes, observability and so on. This is where software quality becomes more than just catching bugs; it is about good code quality and a sound overall engineering process. This is not to say craftsmanship is unimportant in the earlier stages, just that, given the stage of growth, it is not the priority at that time.

A little more about Engineering Craftsmanship. Remember the Hermès Birkin and the Kate Spade from earlier, and how various craftspeople work on a handbag, from treating the leather to sewing on the fittings. Software has a similar craft: how code is designed, organized, written, reviewed, tested, deployed and monitored are all aspects of it. At the very foundation of this craft are code reviews and unit tests. In a previous article (see [3]), I write about why I believe these two form the very bedrock of software quality.

At this stage of growth, quality is no longer just about catching bugs; it is about sustainable software development. You want to keep taking steps forward, not one step forward and two steps back. This requires a significant culture change in the engineering team: where previously it was about delivering the feature at all costs, now it is about delivering quality software in a timely manner without burning down part of the house.

At this stage of growth, quality is also about great customer support: how quickly you can turn around customer issues, how short your MTTR (Mean Time To Resolution) is, how available your service is (99%, 99.9% and so on). Unfortunately, as systems grow more complex in order to better serve customers, Dark Debt (see [9]) also starts to show itself and has a bigger impact on operations. In a nutshell, Dark Debt is the set of issues caused by unforeseen and unanticipated interactions between components in a highly complex system. To address it, monitoring of services is absolutely critical.
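Those availability nines become less abstract when translated into a downtime budget. A quick back-of-the-envelope calculation:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

for availability in (0.99, 0.999, 0.9999):
    downtime_hours = HOURS_PER_YEAR * (1 - availability)
    print(f"{availability:.2%} available -> "
          f"{downtime_hours:.1f} hours of downtime per year")

# 99.00% available -> 87.6 hours of downtime per year
# 99.90% available -> 8.8 hours of downtime per year
# 99.99% available -> 0.9 hours of downtime per year
```

Each additional nine cuts the yearly downtime budget by a factor of ten, which is why every extra nine costs so much engineering effort.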

How, then, does the test team transition, and into what? In another article (see [7]), I write about some of the ways a test team can stay relevant:

  1. Go deeper into their platform, e.g. if they were writing tests for iOS, write some feature code
  2. Pick up other skills, e.g. penetration testing, observability, chaos engineering, DevOps
  3. For manual testers, develop a deep understanding of the user and work with product teams to help refine requirements; or pick up the compliance skills (e.g. HIPAA, PCI) needed for your industry

In the last stage, we are looking at a mature product. This is usually characterized by a large cost of bad software, both monetary (possible lawsuits) and reputational. Product innovation and iteration are generally slower here. If the engineering team has made the transition described in the last section and has operationalized the processes well, the next step is to focus on efficiency of operation and on maintenance and enhancements.

The last point I want to make is that the pinnacle of software quality is a product that is simply a pleasure to use. I once spoke to a test engineer who had worked on the first iPhone, and I told him that one of the things I love about the iPhone is the simple things Apple got right. For example, when you switched from speaker to headphones, the volume control was seamless. He explained that they worked on testing that feature for weeks (if memory serves me correctly), and that in itself was a feat of engineering excellence. I would guess this simple feature alone required hours from UX designers, engineers, product managers and test engineers to get just right. In the book "Creative Selection" by Ken Kocienda (see [8]), Ken talks about his time at Apple working on the first software keyboard for the iPhone. If you are an engineer, product manager or test professional, I highly encourage you to read it: it shows the effort and attention to detail that go into building a product that is a pleasure to use.

Conclusion

I hope the previous sections of this article have convinced you that achieving software quality is not just about fixing bugs. There is much more to it, and it is a team sport.

References

[1] “Tesla Model 3 loses CR Recommendation over reliability issues”, Nov 2019, Patrick Olsen, https://www.consumerreports.org/car-reliability-owner-satisfaction/tesla-model-3-loses-cr-recommendation-over-reliability-issues/

[2] Quality Management, PMI, https://www.pmi.org/learning/library/quality-management-9107

[3] “Unit Tests and Code Reviews: the bedrock of software quality”, Sept 2020, Heemeng Foo, https://medium.com/dev-genius/unit-tests-and-code-reviews-the-bedrock-of-software-quality-9a23cd24558b

[4] The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses, 2011, Eric Ries, Currency

[5] “Just say no to end-to-end tests”, Apr 2015, Mike Wacker, Google Testing Blog, https://testing.googleblog.com/2015/04/just-say-no-to-more-end-to-end-tests.html

[6] “The Case for AI in Software Testing”, Sept 2020, Heemeng Foo, https://medium.com/dev-genius/the-case-for-ai-in-software-testing-5aba64e62d0f

[7] “Who should do software testing? Dev or Test?”, Jun 2020, Heemeng Foo, https://medium.com/dev-genius/who-should-do-software-testing-dev-or-test-41c7ea39ee83

[8] Creative Selection: Inside Apple’s Design Process During the Golden Age of Steve Jobs, 2018, Ken Kocienda, St Martin’s Press

[9] “Dark Debt”, Nov 2017, John Allspaw, https://medium.com/@allspaw/dark-debt-a508adb848dc

