Function Testing Test each function / feature / variable in isolation. Most test groups start with fairly simple function testing but then switch to a different style, often involving the interaction of several functions, once the program passes the mainstream function Within this approach, a good test focuses on a single function and tests it with middle-of-theroad values. We don’t expect the program to fail a test like this, but it will if the algorithm is fundamentally wrong, the build is broken, or a change to some other part of the program has fowled this code. These tests are highly credible and easy to evaluate but not particularly powerful. Some test groups spend most of their effort on function tests. For them, testing is complete when every item has been thoroughly tested on its own. In my experience, the tougher function tests look like domain tests and have their strengths. Domain
Good Domain Testing The essence of this type of testing is sampling. We reduce a massive set of possible tests to a small group by dividing (partitioning) the set into subsets (subdomains) and picking one or two representatives from each subset. In domain testing, we focus on variables, initially one variable at time. To test a given variable, the set includes all the values (including invalid values) that you can imagine being assigned to the variable. Partition the set into subdomains and test at least one representative from each subdomain. Typically, you test with a "best representative", that is, with a value that is at least as likely to expose an error as any other member of the class. If the variable can be mapped to the number line, the best representatives are typically boundary values. Most discussions of domain testing are about input variables whose values can be mapped to the number line. The best representatives of partitions in these cases are typically boundary cases. A good set of domain tests for a numeric variable hits every boundary value, including the minimum, the maximum, a value barely below the minimum, and a value barely above the maximum. Whittaker (2003) provides an extensive discussion of the many different types of variables we can analyze in software, including input variables, output variables, results
Good Test Cases Copyright © Cem Kaner 2003. All rights reserved. Page 8 of intermediate calculations, values stored in a file system, and data sent to devices or other programs. Kaner, Falk & Nguyen (1993) provided a detailed analysis of testing with a variable (printer type, in configuration testing) that can’t be mapped to a number line. These tests are higher power than tests that don’t use “best representatives” or that skip some of the subdomains (e.g. people often skip cases that are expected to lead to error messages). The first time these tests are run, or after significant relevant changes, these tests carry a lot of information value because boundary / extreme-value errors are common. Bugs found with these tests are sometimes dismissed, especially when you test extreme values of several variables at the same time. (These tests are called corner cases.) They are not necessarily credible, they don’t necessarily represent what customers will do, and thus they are not necessarily very motivating to stakeholders. Specification-Based Testing Check the program against every claim made in a reference document, such as a design specification, a requirements list, a user interface description, a published model, or a user manual. These tests are highly significant (motivating) in companies that take their specifications seriously. For example, if the specification is part of a contract, conformance to the spec is very important. Similarly products must conform to their advertisements, and life-critical products must conform to any safety-related specification. Specification-driven tests are often weak, not particularly powerful representatives of the class of tests that could test a given specification item. Some groups that do specification-based testing focus narrowly on what is written in the document. To them, a good set of tests includes an unambiguous and relevant test for each claim made in the spec. Other groups look further, for problems in the specification. They find that the most informative tests in a well-specified product are often the ones that explore ambiguities in the spec or examine aspects of the product that were not well-specified. Risk-Based Testing Imagine a way the program could fail and then design one or more tests to check whether the program will actually fail that in way. A “complete” set of risk-based tests would be based on an exhaustive risk list, a list of every way the program could fail. A good risk-based test is a powerful representative of the class of tests that address a given risk. To the extent that the tests tie back to significant failures in the field or well known failures in a competitor’s product, a risk-based failure will be highly credible and highly motivating. However, many risk-based tests are dismissed as academic (unlikely to occur in real use). Being able to tie the “risk” (potential failure) you test for to a real failure in the field is very valuable, and makes tests more credible.
Risk-based tests tend to carry high information value because you are testing for a problem that you have some reason to believe might actually exist in the product. We learn a lot whether the program passes the test or fails it. StressGood Test Cases Copyright © Cem Kaner 2003. All rights reserved. Page 9 Testing There are a few different definition of stress tests. Under one common definition, you hit the program with a peak burst of activity and see it fail. IEEE Standard 610.12-1990 defines it as "Testing conducted to evaluate a system or component at or beyond the limits of its specified requirements with the goal of causing the system to fail." A third approach involves driving the program to failure in order to watch how the program fails. For example, if the test involves excessive input, you don’t just test near the specified limits. You keep increasing the size or rate of input until either the program finally fails or you become convinced that further increases won’t yield a failure. The fact that the program eventually fails might not be particularly surprising or motivating. The interesting thinking happens when you see the failure and ask what vulnerabilities have been exposed and which of them might be triggered under less extreme circumstances. Jorgensen (2003) provides a fascinating example of this style of work. I work from this third definition. These tests have high power. Some people dismiss stress test results as not representative of customer use, and therefore not credible and not motivating. Another problem with stress testing is that a failure may not be useful unless the test provides good troubleshooting information, or the lead tester is extremely familiar with the application. A good stress test pushes the limit you want to push, and includes enough diagnostic support to make it reasonably easy for you to investigate a failure once you see it. Some testers, such as Alberto Savoia (2000), use stress-like tests to expose failures that are hard to see if the system is not running several tasks concurrently. These failures often show up well within the theoretical limits of the system and so they are more credible and more motivating. They are not necessarily easy to troubleshoot. Regression Testing Design, develop and save tests with the intent of regularly reusing them, Repeat the tests after making changes to the program. This is a good point (consideration of regression testing) to note that this is not an orthogonal list of test types. You can put domain tests or specification-based tests or any other kinds of tests into your set of regression tests. So what’s the difference between these and the others? I’ll answer this by example: Suppose a tester creates a suite of domain tests and saves them for reuse. Is this domain testing or regression testing?
User Testing User testing is done by users. Not by testers pretending to be users. Not by secretaries or executives pretending to be testers pretending to be users. By users. People who will make use of the finished product. User tests might be designed by the users or by testers or by other people (sometimes even by lawyers, who included them as acceptance tests in a contract for custom software). The set of user tests might include boundary tests, stress tests, or any other type of test.Good Test Cases Copyright © Cem Kaner 2003. All rights reserved. Page 10 Some user tests are designed in such detail that the user merely executes them and reports whether the program passed or failed them. This is a good way to design tests if your goal is to provide a carefully scripted demonstration of the system, without much opportunity for wrong things to show up as wrong. If your goal is to discover what problems a user will encounter in real use of the system, your task is much more difficult. Beta tests are often described as cheap, effective user tests but in practice they can be quite expensive to administer and they may not yield much information. For some suggestions on beta tests, see Kaner, Falk & Nguyen (1993). A good user test must allow enough room for cognitive activity by the user while providing enough structure for the user to report the results effectively (in a way that helps readers understand and troubleshoot the problem). Failures found in user testing are typically credible and motivating. Few users run particularly powerful tests. However, some users run complex scenarios that put the program through its Scenario Testing A scenario is a story that describes a hypothetical situation. In testing, you check how the program copes with this hypothetical situation. The ideal scenario test is credible, motivating, easy to evaluate, and complex. In practice, many scenarios will be weak in at least one of these attributes, but people will still call them scenarios. The key message of this pattern is that you should keep these four attributes in mind when you design a scenario test and try hard to achieve them.
Scenario Testing
Good Test Cases Copyright © Cem Kaner 2003. All rights reserved. Page A scenario is a story that describes a hypothetical situation. In testing, you check how the program copes with this hypothetical situation. The ideal scenario test is credible, motivating, easy to evaluate, and complex. In practice, many scenarios will be weak in at least one of these attributes, but people will still call them scenarios. The key message of this pattern is that you should keep these four attributes in mind when you design a scenario test and try hard to achieve them.
An important variation of the scenario test involves a harsher test. The story will often involve a sequence, or data values, that would rarely be used by typical users. They might arise, however, out of user error or in the course of an unusual but plausible situation, or in the behavior of a hostile user. Hans Buwalda (2000a, 2000b) calls these "killer soaps" to distinguish them from normal scenarios, which he calls "soap operas." Such scenarios are common in security testing or other forms of stress testing. In the Rational Unified Process, scenarios come from use cases. (Jacobson, Booch, & Rumbaugh, 1999). Scenarios specify actors, roles, business processes, the goal(s) of the actor(s), and events that can occur in the course of attempting to achieve the goal. A scenario is an instantiation of a use case. A simple scenario traces through a single use case, specifying the data values and thus the path taken through the case. A more complex use case involves concatenation of several use cases, to track through a given task, end to end. (See also Bittner & Spence, 2003; Cockburn, 2000; Collard, 1999; Constantine & Lockwood, 1999; Wiegers, 1999.) For a cautionary note, see Berger (2001). However they are derived, good scenario tests have high power the first time they’re run. Groups vary in how often they run a given scenario test. groups create a pool of scenario tests as regression tests. Others (like me) run a scenario once or a small number of times and then design another scenario rather than sticking with the ones they’ve used before. Testers often develop scenarios to develop insight into the product. This is especially true early in testing and again late in testing (when the product has stabilized and the tester is trying to understand advanced uses of the product.) State- State-Model-Based Testing
In state-model-based testing, you model the visible behavior of the program as a state machine and drive the program through the state transitions, checking for conformance to predictions from the model. This approach to testing is discussed extensively at www.model-basedtesting. org. In general, comparisons of software behavior to the model are done using automated tests and so the failures that are found are found easily (easy to evaluate). In general, state-model-based tests are credible, motivating and easy to troubleshoot. However, state-based testing often involves simplifications, looking at transitions between operational modes rather than states, because there are too many states (El-Far 1995). Some abstractions to operational modes are obvious and credible, but others can seem overbroad or otherwise odd to some stakeholders, thereby reducing the value of the tests. Additionally, if the model is oversimplified, failures exposed by the model can be difficult to troubleshoot (Houghtaling,
High-Volume Automated Testing Good Test Cases Copyright © Cem Kaner 2003. All rights reserved. Page 12 High-volume automated testing involves massive numbers of tests, comparing the results against one or more partial oracles. The simplest partial oracle is running versus crashing. If the program crashes, there must be a bug. See Nyman (1998, 2002) for details and experience reports. State-model-based testing can be high volume if the stopping rule is based on the results of the tests rather than on a coverage criterion. For the general notion of stochastic statebased testing, see Whittaker (1997). For discussion of state-model-based testing ended by a coverage stopping rule, see Al-Ghafees & Whittaker (2002). Jorgensen (2002) provides another example of high-volume testing. He starts with a file that is valid for the application under test. Then he corrupts it in many ways, in many places, feeding the corrupted files to the application. The application rejects most of the bad files and crashes on some. Sometimes, some applications lose control when handling these files. Buffer overruns or other failures allow the tester to take over the application or the machine running the application. Any program that will read any type of data stream can be subject to this type of attack if the tester can modify the data stream before it reaches the program. Kaner (2000) describes several other examples of high-volume automated testing approaches. One classic approach repeatedly feeds random data to the application under test and to another application that serves as a reference for comparison, an oracle. Another approach runs an arbitrarily long random sequence of regression tests, tests that the program has shown it can pass one by one. Memory leaks, stack corruption, wild pointers or other garbage that cumulates over time finally causes failures in these long sequences. Yet another approach attacks the program with long sequences of activity and uses probes (tests built into the program that log warning or failure messages in response to unexpected conditions) to expose problems. High-volume testing is a diverse grouping. The essence of it is that the structure of this type of testing is designed by a person, but the individual test cases are developed, executed, and interpreted by the computer, which flags suspected failures for human review. The almost complete automation is what makes it possible to run so many tests. The individual tests are often weak. They make up for low power with massive numbers. Because the tests are not handcrafted, some tests that expose failures may not be particularly credible or motivating. A skilled tester often works with a failure to imagine a broader or more significant range of circumstances under which the failure might arise, and then craft a test to prove it. Some high-volume test approaches yield failures that are very hard to troubleshoot. It is easy to see that the failure occurred in a given test, but one of the necessary conditions that led to the failure might have been set up thousands of tests before the one that
Exploratory Testing Exploratory testing is “any testing to the extent that the tester actively controls the design of the tests as those tests are performed and uses information gained while testing to design new and better tests” (Bach 2003a). Bach points out that tests span a continuum between purely scripted (the tester does precisely what the script specifies and nothing else) to purely exploratory (none of the tester’s activities are pre-specified and the tester is not required to generate any test documentation beyond bug reports). Any given testing effort falls somewhere on this continuum. Even predominantly prescripted testing can be exploratory when performed by a skilled tester. “In the prototypic case (what Bach calls “freestyle exploratory testing”), exploratory testers continually learn about the software they’re testing, the market for the product, the various ways in which the product could fail, the weaknesses of the product (including where problems have been found in the application historically and which developers“In the prototypic case (what Bach calls “freestyle exploratory testing”), exploratory testers continually learn about the software they’re testing, the market for the product, the various ways in which the product could fail, the weaknesses of the product (including where problems have been found in the application historically and which developers tend to make which kinds of errors), and the best ways to test the software. At the same time that they’re doing all this learning, exploratory testers also test the software, report the problems they find, advocate for the problems they found to be fixed, and develop new tests based on the information they’ve obtained so far in their learning.” (Tinkham & Kaner, 2003) An exploratory tester might use any type of test--domain, specification-based, stress, risk-based, any of them. The underlying issue is not what style of testing is best but what is most likely to reveal the information the tester is looking for at the moment. Exploratory testing is not purely spontaneous. The tester might do extensive research, such as studying competitive products, failure histories of this and analogous products, interviewing programmers and users, reading specifications, and working with the product.
|