Swing Extreme Testing

By Lindsay Peters and Tim Lavers

About this book

Thorough testing is the basis of good software. Whether we use an agile development methodology such as Extreme Programming, or a more traditional approach, we must test our software at the unit level and application level. The tests must run automatically and cover all aspects of the software.
In this book, the authors draw on more than 20 years of experience to show how automated testing can be applied to a real commercial product.

This book will teach you how to automatically test user interfaces (Swing GUIs), the help system, internationalization, log files, spreadsheets, email, and web services, how to perform tests involving multiple JVMs, and a host of other things. These tests are applied at the module level (unit tests) and at the application level (function tests).

The authors have developed the test processes in the broader context of an Extreme Programming (XP) methodology. However, the testing techniques are certainly not specific to XP, and may be applied within any given development methodology.

Publication date:
June 2008


Chapter 1. What Needs Testing?

The aim of testing is to find errors in our software.

Any deviation from the required behavior of an application is an error. The 'required behavior' is defined by the so-called user stories in Extreme Programming (XP), or by a requirements specification in more traditional methodologies. Additionally, there will be implicit requirements on usability, reliability, and scalability. These may be derived from company or industry standards, customer expectations, user documentation, and various other sources.

Clearly, we need tests to prove that our software satisfies these formal specifications. However, it would be completely unrealistic to think that we can thoroughly test our application by testing it against each of these high-level requirements or company standards, and not testing the components comprising the application. This would be like attempting to production-test a new car simply by driving it without having first tested each component in isolation, such as the brake system. Flaws in a component may manifest themselves in scenarios that we could never have imagined beforehand, and so would never have designed an application-level test case for.

Therefore, in terms of testing infrastructure, we need to have at least two views of our application:

  • The Unit or Module view, where "unit" or "module" refers to the smallest compilation component. In Java, this is a class.

  • The Application view, where "application" refers to the complete set of classes providing the application. The application will be made up of our developed components, the standard Java class libraries, plus libraries of third-party components, executing on one or more JVMs.

Let's now look at some of the broader aspects of testing at the unit and application levels. The nuts and bolts of how to implement and organize our tests at these levels will be dealt with in later chapters. First of all, we will take a look at an example.


An Example

A good example of the need for proper unit testing came up recently with LabWizard.

For a LabWizard clinical Knowledge Base, the Pathologist uses a tool called the Validator to review reports generated by the Knowledge Base, make changes if necessary, and then release them to the referring doctor.

The main screen here is an instance of a class called CaseViewer. The patient demographics and test history are shown in the table. The interpretive report is shown in the text area below the table.

After reviewing the interpretation, and making any necessary changes to it, the clinical reviewer activates the Accept button, and the report is sent back to the Laboratory Information System where it is automatically released to the referring doctor.

Note that the Accept button has the focus by default. This is in fact a serious usability bug. There were several instances of a pathologist unwittingly pressing the space bar or the Enter key, and thereby releasing an inappropriate report. Rather, we would want the activation of the Accept button to be a very deliberate action, that is, by using either the mouse or the mnemonic.

The fix for this bug was to ensure that the focus is on the Skip button whenever a new report is presented for review. The test for this new behavior took only a matter of minutes to write, and runs in just a few seconds. Here it is:

/**
 * STR 1546. Check that the accept button does NOT have
 * the focus to begin with, nor after a case is approved.
 */
public boolean acceptButtonDoesNotHaveFocusTest() {
    //Queue some cases and wait till the first one is showing.
    cvh.setNumberOfCasesToBeReviewed( 5 );
    //Ensure that no cases have been approved yet.
    waitForNumberOfApprovedCases( 0 );
    //Pressing the space bar should skip the case, not approve it.
    //Give plenty of time for a case to be sent to the LIS
    //if it were going to be sent.
    TestSetup.pause( 1 );
    //Check that still no cases have been approved.
    waitForNumberOfApprovedCases( 0 );
    //Now approve a case using the mnemonic,
    //and check that the focus shifts off the Accept button, back
    //to the Skip button.
    mn( SimpleMessages.ACCEPT );
    waitForNumberOfApprovedCases( 1 );
    //Pressing the space bar should again skip the case.
    //Check that no more cases have been approved.
    TestSetup.pause( 1 );
    waitForNumberOfApprovedCases( 1 );
    return true;
}

The structure and specifics of this test will only be clear by the end of Chapter 10. But even at this point, it should be apparent how simple and easy it can be to write user interface tests. Note also the convention for referencing the Software Trouble Report (STR) describing the bug in the comment for this test method.

Another point from this example is the necessity for such tests. Some developers want to believe that testing user interfaces is not necessary:

Of course GUI applications can (and should) be unit tested, but it isn't the GUI code itself that you want to test. It's the business and application logic behind the GUI. It's too easy to just say "don't unit test GUIs" because it's difficult. Instead, a simple separation of logic from GUI code means that the stuff which should be tested becomes easy to test; and the mechanical stuff (Does the UI freeze when I click this button? Is the layout correct?) can be left for visual inspection. (You do test your app by running it from time to time, don't you? And not just by waiting for a green bar.) If the UI itself doesn't behave, you'll quickly know about it. But if the application logic is wrong, you might not find out until the application has been released, unless you've got a decent set of unit tests. (From Don't unit test GUIs by Matt Stephens, the Agile Iconoclast. See http://www.regdeveloper.co.uk/2007/10/22/gui_unit_testing/).

With this particular bug, it really was the GUI that we wanted to test. We could do a manual test to ensure the default focus was not on the Accept button, but it would definitely take longer than the automated one that we've got. Why have people do anything that a computer can automate? Furthermore, focus behavior can change unexpectedly as components are added to or removed from a container. So a manual test in a previous release of our software would give us little confidence for subsequent releases.

The unit test is an appropriate place for tests, such as this, of the specific behavior of a component. Although we could test these things at the application level, such tests would be harder to write, more difficult to understand and maintain, and slower to run than the equivalent unit tests.


What Classes Do We Test?

Our experience is that in software Murphy's Law holds supreme: any class without a test is guaranteed to contain bugs.

So to the question, "What classes should be tested?", there is only one satisfactory answer:


Extreme Testing Guideline: Every public class requires an explicit unit test.

By "explicit unit test", we mean a test that creates an instance of the class and calls methods on that class directly. Whilst most software engineers will assent to this guideline in theory, it is surprising how difficult it can be to implement this as a work practice. Let's briefly consider why.
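Before considering the objections, here is a minimal sketch of what we mean by an explicit unit test, in the boolean-returning test style used throughout this book. The Money class and its methods are hypothetical, invented purely for illustration.

```java
// Hypothetical production class, invented for illustration.
class Money {
    private final long cents;

    Money( long cents ) {
        this.cents = cents;
    }

    long cents() {
        return cents;
    }

    Money plus( Money other ) {
        return new Money( cents + other.cents );
    }
}

// An explicit unit test: it creates instances of the class under
// test and calls methods on those instances directly.
public class MoneyTest {
    public boolean constructorTest() {
        return new Money( 250 ).cents() == 250;
    }

    public boolean plusTest() {
        return new Money( 100 ).plus( new Money( 50 ) ).cents() == 150;
    }
}
```

The point is the direct relationship: one test class per production class, exercising it through its own public interface rather than through some higher-level component that happens to use it.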

The most common objection is that there's not enough time to write the unit tests, either for existing or new classes. That is, there is a commercial imperative to write new code as quickly as possible, rather than "waste" time writing tests. In our experience of more than 20 years with many different types of software projects, this has never been a valid argument.

The cost of fixing a bug rises exponentially with the length of time it takes to find it. In particular, bugs found once the application has been released can typically be ten times more expensive to fix than bugs found while developing the class. For example, the LabWizard "Accept button focus bug" took a matter of minutes to test for and fix, but it took more than an hour of our time to resolve the problems it caused, not to mention the concern of the Pathologist, who had inadvertently approved an inappropriate report! ("The Economic Impacts of Inadequate Infrastructure for Software Testing", National Institute of Standards and Technology, May 2002, http://www.nist.gov/director/prog-ofc/report02-3.pdf is a worthwhile read.) Bugs in deployed software effectively plunder development resources for use as support resources.

Furthermore, bugs that are found late in the development life cycle, or even worse, after releasing the software, have a very negative effect on team morale. We all have had the unfortunate experience of releasing a version of an application, then eagerly anticipating the development of exciting new features in the next release, only to be stopped dead in our tracks by customer-reported bugs that require us to revisit the version we just released and had thought we were finished with. The exact opposite is true for bugs found during unit tests—each one we find gives us a sense of achievement.

Another common argument is that certain classes are too trivial to require a test. Our response is that the simpler a class, the simpler its test, so what's the problem? That argument also misses the point that unit tests are not just testing the current version of the code; they also test future versions. Simple classes are likely to become more complex as a project evolves, and changes to a class can all too easily introduce errors. This is particularly true if the original authors of a class do not make all the changes themselves. Putting unit tests in from the very beginning makes this evolution a safer process and pays for the cost of testing up-front. For an excellent rebuttal of many of the arguments against testing early and often, see "Myths and realities of iterative testing" by Laura Rose, IBM developerWorks.

The converse may also be argued, that certain classes are too complex to be tested. However, complex classes are the ones that are least likely to be understood, most likely to contain errors, and most difficult to maintain.

Unit testing addresses these issues by:

  • providing a specification of the class' behavior, hence enhancing our understanding

  • increasing our chance of detecting errors, and

  • enhancing maintainability by providing a behavioral benchmark against which to measure the effect of further changes to the class, or changes to the way it is used.

It may be that the class in question is indeed overly complex. For example, it does not represent a single well-defined abstraction. In this situation it may well need to be redesigned to reduce its complexity. One of the benefits of unit testing above the mere fact of detecting errors, is the increased discipline that it brings to design and implementation.

Of course, in Object Oriented software, a single class can contain a lot of sub-objects that are themselves quite complex. For example, a LabWizard CaseViewer contains classes to interface it to the LabWizard server, and a Swing Timer polling for more cases, as well as all the GUI components and associated actions. Despite this complexity, we have a unit test class for CaseViewer that tests every GUI component and every exposed method, just as we do for our most basic utility classes. This unit test has been invaluable in preventing bugs from creeping into one of our most important user interface components. Based on our experience, we'd say that a class that is seen as too complex to test, is simply too complex.

We will talk about testing legacy code later in this chapter.

Some programmers will argue that a class does not have much meaning outside its package, and so can't be tested independently. Our response is that if the class is tightly coupled within a package, it should be made private or package visible, not public, in which case it need not be explicitly tested, as we will discuss below.

Finally, some programmers believe that their code is bug-free, so why should they bother with tests? Of course, this claim is wrong in most cases. Even if some class is indeed bug-free at some point in time, there is no guarantee that it will remain bug-free in the next release after changes have been made to it, or to the classes it depends on, or to the classes that depend on it.

In fact, there is an even more compelling reason why all classes must be unit tested. If we adopt the XP test-first approach to unit development, which we strongly endorse, the test class must be developed first, that is, before the production class.


Test First—Always!


Extreme Testing Guideline: We should implement our unit test classes before we implement our production classes, to the extent that this is possible.

This guideline should be read as an ideal rather than an attainable goal; some compromise and common sense are needed in its interpretation. We cannot literally write a unit test for a class before the class itself, as the unit test will not compile. The best we can do in practice is to write a small amount of the production class, then test what we have written. We might have stub implementations of the production class methods, so that our code compiles. In this sense, our tests are ahead of our production code. Our aim is to always keep the test code as far ahead of the production code as is practical. An example of the test-first approach is given in the next chapter.

There are several reasons for this guideline.

Firstly, the test class provides an unambiguous specification of the corresponding production class. Even after following the necessary software engineering practices of requirements and design analysis, we are always surprised when we start writing our test class at just how many specification issues are in fact still undecided. All these issues need to be resolved by the time our test class is complete, and so the test class becomes the definitive and concrete expression of the production class' specification.

Secondly, writing our test class first means that we know when to stop implementing our production class. We start with an absolutely trivial implementation, and stop once all our tests pass. This discipline has the benefit of keeping 'code bloat' down. As developers, we find it too easy to add 'convenience' methods to a class that might be used later on in the lifecycle of a project. It's tempting to think that these methods are so simple that even if they're never used, there's no harm done. But such methods actually do cost something. They cost a few minutes to write initially, and they make a class harder to understand. If they are not maintained properly, the behavior of such a method might change over time and, years later, when eventually used, they might cost time by not actually working as expected. Forcing developers to test every method they write minimizes the number of unused methods, simply because nobody really wants to write tests that they don't have to.

If all our tests pass, but there is some aspect of our class that we think is not yet quite right, then our tests are inadequate. Therefore, before going any further with the production class, we first enhance the test class to check for the desired behavior. Once the production class fails the new test, we return to it and implement the desired behavior; our implementation is complete when the tests pass once again.

Finally, if we attempt to write a test class after the production class has been developed, the chances are that there will be aspects of the production class that will need to be changed to make it more easily testable. It may even need to be substantially redesigned. Writing the test class before the production class is therefore a much more efficient use of our time, as this additional effort in rework or redesign is largely eliminated.

The test-first approach also applies in the maintenance phase of a class. Once a bug is scheduled to be fixed, the test class must first be modified to reproduce the bug, and so fail. The production class can then be modified until the test passes again, confirming that the bug has indeed been fixed.

This is precisely what we did with the test method acceptButtonDoesNotHaveFocusTest() shown earlier. When first implemented, this test failed half-way through at the check:

TestSetup.pause( 1 );
waitForNumberOfApprovedCases( 0 );

showing that the bug had been reproduced. If we had modified the test code after the bug had been fixed in the production CaseViewer class, we could never be sure that the test would have reproduced the error, and hence we could never be sure whether the fix we applied to the production class was adequate.


Extreme Testing Guideline: We should modify our tests to reproduce a bug before we attempt to fix it.


What Classes Don't We Test?

In general, we don't recommend explicit testing of private or package visible classes. There are several reasons for this.

Firstly, any private or package-visible classes must eventually be used by public classes. The unit tests for the public classes can and must be made thorough enough to test all the hidden classes. There is no compelling need to explicitly test hidden classes.

Secondly, the separation of production code from test code is highly desirable from many perspectives, such as readability of the production code, ease of configuration control, and delivery of the production code. Test code for a class can often be an order of magnitude larger than the production code itself, and often contains large sets of configured test data. We therefore like to keep our test classes in explicit test packages so that they do not clutter up the production packages. This clutter can, of course, be removed from our delivered software, but it does make the production classes more difficult to navigate and understand. The separation of the production package from its test package means that the private and package-visible classes cannot be directly accessed by the test packages (though there are workarounds, which we'll look at).

Thirdly, private and package visible classes will normally be tightly coupled to the owning or calling classes in the package. This means that they will be difficult, if not impossible, to test in isolation.

Finally, within a public class, the decision to create private classes or private methods can at times be fairly arbitrary, to aid readability for example. We don't want to restrict the ability to do this sort of refactoring with a requirement that all such restructuring necessitates more testing.

A simple example of this is in our CaseViewer class, which has a Swing Timer for polling the server for cases. We decided to make the action listener for this Timer a private class, for the sake of readability:

private class TimerActionListener implements ActionListener {
    public void actionPerformed( ActionEvent ae ) {
        //reload the case and interpretation
    }
}

Other classes that may not need an explicit test are renderers. For example, the table cell renderer used in the CaseViewer.

Private and package visible classes like this do, however, need to be implicitly tested, and this will be done by white-box testing the public classes. That is, we design our tests taking into account the internal structure of the public classes, not just their specification.

We can use code coverage tools to check whether we have really exercised all our hidden code by the explicit unit tests of our public classes. We will look at an example of this later.

If a private or package visible class is too complex to be implicitly tested easily, it might be pragmatic to make it public so that it can be adequately tested. Clearly, this should only be done as a last resort.


Extreme Testing Guideline: It's OK to make a class public if it can't be tested in any other way.

A lot of developers are horrified at the thought of increasing the exposure of a class for any reason. The principal objection is that by exposing a class in our public API, we may be forever committed to maintaining it in its visible state. Whether or not this is a problem in practice will depend on the project being developed. If we are publishing a class library, the objection might be valid. But for most other projects, there really is no reason not to do this. In the situations where it would be really wrong to expose a class, we can write tests that use reflection to call methods and constructors of hidden classes, but this approach has its own problems.

As well as hidden classes, we do not consider it necessary to explicitly test auto-generated classes. Simple Enums fall into this category, having methods values() and valueOf() that are generated by the Java compiler. Enums that define other methods should be tested.
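As a sketch of this distinction (the enum and its method here are hypothetical, invented for illustration): the compiler-generated methods need no explicit test, but any method we write ourselves does.

```java
// Hypothetical enum for illustration. The values() and valueOf()
// methods are generated by the Java compiler and need no explicit
// test, but isTerminal() is our own code, so it does.
public enum ReviewState {
    PENDING, APPROVED, REJECTED;

    public boolean isTerminal() {
        // A case leaves the review queue once approved or rejected.
        return this != PENDING;
    }
}
```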


What Methods Need Testing?

Granted that we need to test each public class, what needs to be tested in a class? The specifications of the behavior of a class are the public and protected method specifications, plus any controls used for event-driven input or output.


Extreme Testing Guideline: The unit test for a class must test all public and protected methods and constructors and all user interface components provided by the class.

The reasons are precisely the same for requiring that all public classes be tested.

Included in this guideline are the accessible class constructors. We need to test whether these create an object that is in a consistent and expected state.
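A minimal sketch of a constructor test follows; the CaseQueue class and its invariants are hypothetical, invented for illustration. The test checks both that a valid construction produces a consistent object and that an invalid argument is rejected outright.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical class whose constructor must leave the object in a
// consistent, expected state.
class CaseQueue {
    private final List<String> caseIds;

    CaseQueue( List<String> caseIds ) {
        if (caseIds == null) {
            throw new IllegalArgumentException( "caseIds must not be null" );
        }
        this.caseIds = Collections.unmodifiableList( caseIds );
    }

    int size() {
        return caseIds.size();
    }
}

public class CaseQueueTest {
    public boolean constructorTest() {
        // A new queue holds exactly the cases it was given.
        if (new CaseQueue( List.of( "a", "b" ) ).size() != 2) {
            return false;
        }
        // A null argument is rejected rather than producing a
        // half-built object.
        try {
            new CaseQueue( null );
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```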

A lot of developers are of the opinion that there is no need to explicitly test public "getter" and "setter" methods of a class:

JUnit convention is that there is a test class created for every application class that does not contain GUI code. Usually all paths through each method in a class are tested in a unit test. However, you do not need to test trivial getter and setter methods. (see http://open.ncsu.edu/se/tutorials/junit/.)

Our view is that the existence of paired getters and setters, if all they do is set or expose a variable, is usually an indication of poor design. They expose the internal state of the object as effectively as a publicly visible variable. (For more on this, see JavaWorld article "Why getter and setter methods are evil" by Allen Holub, September 2003.)

So the argument that "getters" and "setters" do not need tests is 'not even wrong', as they say. If there are good reasons for having such a method, then a test is warranted. The implementation could evolve from simply returning a value to returning a more complex calculation, for example. Without a test, bugs could easily be introduced.
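The evolution from trivial accessor to computed value can be sketched as follows; the Invoice class is hypothetical, invented for illustration. A test written when getTotalCents() simply returned a stored field continues to guard the method after it starts deriving its result.

```java
// Hypothetical class for illustration. getTotalCents() began life
// as a trivial accessor of a stored total, but now derives its value
// from the net amount and the tax rate. The original "trivial" test
// still protects callers against mistakes in this evolution.
public class Invoice {
    private final long netCents;
    private final double taxRate;

    public Invoice( long netCents, double taxRate ) {
        this.netCents = netCents;
        this.taxRate = taxRate;
    }

    public long getTotalCents() {
        // Round to the nearest cent after applying the tax rate.
        return Math.round( netCents * (1.0 + taxRate) );
    }
}
```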

As mentioned earlier, a great and perhaps unexpected benefit of the extreme approach to testing is that only the methods that are absolutely necessary ever get written. By forcing ourselves to write unit tests for all methods, we keep ourselves focused on the task at hand—producing lean, but high-quality code. At the end of this chapter, we'll see just how effective this has been at keeping the LabWizard code under control over the last ten years.

So, as a bare minimum, we will need to have a unit test method for each accessible method and constructor in a class. For some classes, this minimal amount of testing is enough, because the class' behavior is immediately implied by the public methods. Other classes have only a few public methods but a lot of code that is called from within a framework such as Swing, or from an internal thread. For example, a typical user interface class, such as the CaseViewer, might have a very long constructor that does a lot of layout and sets up some event handlers, plus one or two simple public methods. For such a class, each unit test method will correspond to either a behavior that needs testing or to an accessible method or constructor.

For example, the CaseViewer class has 20 public methods, but its unit test class has over 60 test methods, a selection of which is shown here:

We will look at this in more detail in Chapter 7. Another example of functionality that needs to be tested, but does not relate directly to a public method, is the serializability of classes declared to implement the marker interface, java.io.Serializable. This issue is discussed in Chapter 14.


What Methods Don't We Test?

For precisely the same reasons as given for the private classes, we do not recommend explicit testing of the private or package visible methods of a class.

The following is an example of a very simple private method in CaseViewer that determines whether the user has changed the interpretive report of the case currently showing.

private boolean interpretationHasBeenChangedInThisSession() {
    return kase.differsFromInterp( interpretationText() );
}
Even though it is a one-liner, this was usefully implemented as a method, to aid readability and to avoid duplicating that line in several places throughout CaseViewer.

Although not explicitly tested, these methods must be implicitly tested. As with classes, there may be times when we have to expose a method (make it public or protected) just so that it can be adequately tested. We may also have to use reflection to change the accessibility of the method on-the-fly, as we describe in the next section. Again, this should be done only as a last resort.

Another situation that may arise is where there are public methods that are "paired", in the sense of performing inverse operations. Some example methods may be:

/** Serializes the object as an XML string. */
public String toXML() {…}

/** Constructs an object from an XML string. */
public Object fromXML( String str ) {…}

It would make sense for some of the test cases for toXML() to use fromXML(), and in fact could equally well serve as test cases for fromXML(). Rather than just repeat these test cases in both methods, it may be sufficient to implement them just once, say in the tests for toXML(). That is, the method fromXML() would still need to be tested, but some of its test cases may be found in another test method.
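A round-trip check makes this concrete; the Point class below is hypothetical, invented for illustration. A single test case exercises both toXML() and fromXML() by serializing an object and checking that deserialization recovers it.

```java
// Hypothetical value class with a pair of inverse methods,
// invented to illustrate round-trip testing.
public class Point {
    final int x;
    final int y;

    Point( int x, int y ) {
        this.x = x;
        this.y = y;
    }

    /** Serializes the object as an XML string. */
    public String toXML() {
        return "<point x='" + x + "' y='" + y + "'/>";
    }

    /** Constructs an object from an XML string. */
    public static Point fromXML( String str ) {
        // Attribute values sit between the quote characters.
        String[] parts = str.split( "'" );
        return new Point( Integer.parseInt( parts[1] ),
                Integer.parseInt( parts[3] ) );
    }

    /** One round-trip test case serves both methods at once. */
    public static boolean roundTripTest() {
        Point original = new Point( 3, -7 );
        Point copy = Point.fromXML( original.toXML() );
        return copy.x == original.x && copy.y == original.y;
    }
}
```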


Invoking Hidden Methods and Constructors

Java's access modifiers can be overridden using reflection. The key to this is the setAccessible() method of java.lang.reflect.AccessibleObject.

Consider a package with a package-private class called HiddenClass that has a package-private constructor and a package-private method that we want to test:

package jet.reflection_example;

class HiddenClass {
    private long createdTime;

    HiddenClass() {
        createdTime = System.currentTimeMillis();
    }

    long createdTime() {
        return createdTime;
    }
}

It is possible to explicitly test this class and have the test in a package separate from the class itself, if we are prepared to do a few tricks with reflection. Here is the test:

package jet.reflection_example.test;

import java.lang.reflect.*;

public class HiddenClassTest {
    public boolean runTest() throws Exception {
        Class klass = Class.forName( "jet.reflection_example.HiddenClass" );
        Object instance = createInstance( klass );
        Object value = invoke( klass, instance, "createdTime" );
        System.out.println( "m.invoke() = " + value );
        return true;
    }

In the first line of runTest(), we find the Class object for HiddenClass using the Class.forName() method. Having found the class, we can create an instance of it using our createInstance() method, which seeks a no-arguments constructor, changes the accessibility of it, and invokes it:

Object createInstance( Class klass ) throws Exception {
    Constructor noArgsConstructor = null;
    Constructor[] constructors = klass.getDeclaredConstructors();
    for (Constructor c : constructors) {
        if (c.getParameterTypes().length == 0) {
            noArgsConstructor = c;
        }
    }
    assert noArgsConstructor != null :
            "Could not find no-args constructor in: " + klass;
    noArgsConstructor.setAccessible( true );
    return noArgsConstructor.newInstance();
}

The call to setAccessible( true ) makes the constructor usable, even if it is private or package-private. To invoke the method we want to test, we search for it by name, change its visibility, and invoke it:

Object invoke( Class klass, Object instance,
        String methodName ) throws Exception {
    Method method = null;
    Method[] declaredMethods = klass.getDeclaredMethods();
    for (Method m : declaredMethods) {
        if (m.getName().equals( methodName )) {
            method = m;
        }
    }
    assert method != null :
            "Could not find method with name '" + methodName + "'";
    method.setAccessible( true );
    return method.invoke( instance );
}

This example shows how we can circumvent the Java language access controllers, and run hidden methods in our tests. The pitfalls of working this way are that the code is harder to understand and harder to maintain. Changes to method names or signatures will break the tests at runtime, rather than at compile time. Whether or not we take this approach depends on the project we are working on—is it an absolute requirement that as few classes and methods are exposed as possible, or can we take a more pragmatic approach and expose classes and methods for explicit testing in those rare situations where implicit testing is not adequate?


Unit Test Coverage

Our preferred testing harness, GrandTestAuto, enforces the testing policy we have developed so far in this chapter. This policy requires that all accessible methods and constructors of all accessible classes be explicitly unit-tested. By using this tool, we know that the exposed programming interface to our code is tested.

Another aspect of test coverage is checking that our code is thoroughly exercised by our tests. This can be measured using tools such as Cobertura (see http://cobertura.sourceforge.net) and EMMA (see http://emma.sourceforge.net). Here, we will give an example using EMMA.

The EMMA tool works by installing a custom class loader that instruments the classes we're interested in. The tests are run, and as the JVM exits, a series of HTML reports is written that indicates the level of coverage achieved. For example, running just the unit tests for the CaseViewer class confirmed that they had exercised all 94 methods of the class (that is, including private methods) and 98% of the 485 source lines. The following figure shows the EMMA coverage report for the CaseViewer unit tests:

A closer look at the report showed which methods had not been covered:

Drilling down into the EMMA report showed us which lines of production code were not being exercised by the tests. The unexercised lines in both the caseRefreshed() and cleanup() methods were, in fact, gaps in our testing.


Who Should Implement the Unit Tests?

It is clear that if we adopt the test-first methodology, the developers of a new class must be the ones initially responsible for the development of the corresponding unit tests. There's simply no choice in this matter. The test class is too intimately connected with the specification and design of the production class to be done by anyone else.

Similarly, once the class and its unit test have been developed, and if changes need to be made for a subsequent release, the test class will be modified by the same developers who modify the production class.


What About Legacy Code?

We've had the experience of having to write unit tests for untested legacy code, long after the original developers have moved on, and we know what a frustrating task this is. An approach we would recommend is to develop the tests in an incremental fashion. That is, develop corresponding unit tests whenever a part of the legacy code needs to be modified. For example, if a method x() of an untested legacy class X needs to be modified, create a test class and methods to thoroughly test x(). But just use "return true" test stubs for the other methods of X that you don't need to modify. We shall see a working example of this later.
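To make the incremental approach concrete, here is a minimal sketch with hypothetical class names, using the boolean-returning test methods and "return true" stubs described above:

```java
// Hypothetical legacy class X: only x() needs to change in this iteration.
class X {
    public int x(int n) { return n * 2; }   // the method being modified
    public String y() { return "y"; }       // legacy methods we are not touching...
    public String z() { return "z"; }       // ...yet
}

// The new test class: x() is tested thoroughly; the other methods get
// "return true" stubs until an iteration in which they, too, must change.
public class XTest {
    public boolean xTest() {
        X x = new X();
        return x.x(0) == 0 && x.x(3) == 6 && x.x(-2) == -4;
    }
    public boolean yTest() { return true; } // stub only
    public boolean zTest() { return true; } // stub only

    public static void main(String[] args) {
        XTest test = new XTest();
        System.out.println(test.xTest() && test.yTest() && test.zTest());
    }
}
```

The stubs are an honest record of what has not yet been tested: each one is a marker to be replaced with a real test when its production method next changes.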

In this way, each iteration (in XP parlance, an iteration is a software release cycle) will result in more of the legacy code being adequately tested.


Where Does Integration Testing Fit In?

Using the XP methodology, each iteration provides a fully integrated (but not necessarily fully functional) version of the application. Furthermore, using an Object-Oriented design methodology, the higher level classes will have the role of integrating the lower level classes, normally by delegation or by containment relationships. Hence, the combination of both methodologies ensures that unit testing of the higher level classes will provide our integration testing.

We saw this earlier with the CaseViewer class in LabWizard. In the unit test for this class, we are testing objects that are almost complete client applications. By virtue of the fact that these tests run at all, we are confident of the integrity of our class structure. The Validator is a particularly simple application, but unit tests verify the integration of our very complex applications and even of our server application.


Documentation of Unit Tests

Most unit tests can be adequately documented by the class and method comments in the unit test class itself. The test class naming convention (see Chapter 19) will ensure that there will be no ambiguity about which production class a test class refers to.

However, it may be that a class will require more than one test class. This most recently happened in LabWizard testing when significant new functionality was added to a class called RuleTreeImpl, which had not changed for several years. To effectively test the new method, which removes all rules associated with a certain kind of conclusion, we needed to set up rule trees in a way that was not easy to do within the existing unit tests. So we added a new test class, ReportedConclusionRemovalTest, that did just this. In general, a descriptive name for the additional test class, plus comments at the point it is called, should suffice to identify its role.


Testing at the Application Level

The testing approach that we recommend at the application level is, perhaps surprisingly, very similar to the approach for unit-level testing presented so far. However, application-level testing has its own challenges in terms of specification and implementation, and we will deal with some of these issues in the later chapters.

Myers (G. Myers, The Art of Software Testing, Wiley, 1979, Chapter 6) gives an excellent overview of the categories of higher-level testing. They can be summarized as follows:

  • Functional Testing, against the requirements specification.

  • System Testing, against the user documentation, or company standards if not explicitly specified in the requirements specification. The sub-categories of System Testing include:

    • Performance, Stress, and Volume

    • Interface

    • Usability

    • Reliability and Maintainability

    • Security

    • Installation

    • Free play

  • Acceptance Testing, against a contractual specification.

What requirements should be tested? As for unit testing, the only satisfactory answer is:


Extreme Testing Guideline: All application-level requirements need to be explicitly tested.

Each requirement should explicitly reference one or more test cases, normally defined in a test specification. Each test case should be a script that defines the test data to be used, steps providing input to the application (for example, via the user interface or from other software), and the expected outputs as unambiguous pass/fail criteria.

Most importantly, each test case should be implemented as a test class that can be executed without any manual intervention. To do this, we will need high-level "test helper" classes, which provide input to the application via keyboard, mouse, and data interfaces, and conversely, can read outputs from the application such as screen displays. These helper classes will be described in detail in the later chapters. In Chapter 17 we look in detail at some application-level tests for LabWizard.
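To give a flavour of what such a helper might look like, here is a minimal sketch (the class, method, and component names are ours, for illustration only) of a helper that reads an application's output by locating a named component in the Swing hierarchy:

```java
import java.awt.Component;
import java.awt.Container;
import javax.swing.JLabel;
import javax.swing.JPanel;

// A minimal sketch of one kind of test helper: reading an application's
// "output" by finding a named component in its Swing component tree.
public class ComponentFinder {
    public static Component findByName(Container root, String name) {
        for (Component child : root.getComponents()) {
            if (name.equals(child.getName())) return child;
            if (child instanceof Container) {
                Component found = findByName((Container) child, name);
                if (found != null) return found;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Lightweight components can be created without a display.
        System.setProperty("java.awt.headless", "true");
        JPanel screen = new JPanel();
        JLabel status = new JLabel("Ready");
        status.setName("status.label");
        screen.add(status);
        // A function test can now assert on what the user would actually see.
        JLabel found = (JLabel) findByName(screen, "status.label");
        System.out.println(found.getText());
    }
}
```

Helpers for providing input via keyboard and mouse are built in a similar spirit, and are the subject of the later chapters.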

The LabWizard Validator application now has 60 application-level tests, and it is one of the smaller applications within LabWizard.

In the specification and testing of LabWizard, almost all the automated application-level tests come from the functional specification of the software. For this reason, at Pacific Knowledge Systems, we use the term "function test" for "automated application-level test". We will use the term "function test" in the rest of this book.


Who Should Implement the Function Tests?

The test-first methodology again implies that developers must implement function tests. At this level, requirements are even more ambiguous than at the unit level, mainly because of what a requirement may leave unsaid. A comprehensive set of test cases is needed to rigorously define the requirements of a high-level feature before it is developed.

In an ideal scenario, an independent test team, or the customer, will provide a valuable source of additional test cases that the development team simply would never think of. After all, the test team's primary objective is to break the application. As developers, we often find it difficult to make sincere attempts to find fault with what we have produced, though this is a skill that we must acquire to be fully effective, test-driven programmers.


Automated Test Execution

In the "bad old days", a unit test, if it existed at all, would be run at the time the unit was developed and included in the build. The test execution would be initiated by the developer, possibly by calling a method in the class that would call some test methods, or draw the component in a test frame that allowed for visual inspection and manual input. The test may never be run again, especially if that component did not change.

Function tests would be run just prior to the planned release of the next version, the last stage of the Waterfall. They would be executed manually against some test specification, normally by junior engineers in the team or by dedicated test engineers. The results would be manually recorded.

This type of test execution is the antithesis of the Extreme Testing approach and is likely to result in a low-quality software release. With each new piece of functionality introduced, new bugs will be created, or old bugs will resurface.

How then should tests be executed?


Extreme Testing Guideline: Unit and function tests must be developed within a framework that provides a fully automated execution environment.

Examples of test execution environments are JUnit (http://junit.sourceforge.net) and GrandTestAuto. The latter is our preferred tool, and is the subject of Chapter 19.

There are several compelling reasons for insisting on fully automated tests:

  • Early error detection: The running of the tests simply becomes another stage in the daily automated build procedure, guaranteeing that any error is picked up as soon as possible.

  • Regression: It often happens that an error introduced into a modified class A only manifests itself in an interaction with some other class B. Running all the tests each day, rather than just running the test for the modified class A, for example, maximizes the chance of picking up these more subtle types of errors. That is, automation allows an effective regression test strategy.

  • Consistency: Test results will be far more consistent than if executed manually, no matter how diligent and careful a tester is.

  • Coverage and depth: With an automated methodology, the coverage and depth of the testing that is possible is far greater than anything that can be achieved manually. For example, a huge variety of test cases, often with only subtle differences between them, can be readily implemented so that some feature can be fully tested. This variety would be impractical if the tests were to be run manually.

  • Convenient reporting: With an automated test methodology, test results can be conveniently reported in a summary fashion at any level of detail. In particular, the test report will quickly identify the set of failed tests.

  • A prerequisite for incremental development: The effort involved in running all tests is a limiting factor on the amount of time one can devote to a development iteration. For example, if it takes two weeks to run all the tests, it's impractical to have development iterations on a three or four week schedule—the testing overhead would be far too great. Conversely, if the execution of tests can be done as part of the daily build cycle, development iterations can be very short.

There are, of course, two types of application-level testing that can't be fully automated, almost by definition.

With free-play testing we need to execute unplanned test cases, so clearly, these test cases can't be automated, at least not the first time they are run. If some free-play testing does find an error then it makes sense to include this scenario as an automated test case for subsequent execution.

Similarly, whilst a large part of usability testing may be automated, for example, checking the tab order of components in a dialog, or checking that each item in a menu has a mnemonic, there will be elements of usability testing that can be done only by a human tester.
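A mnemonic check of the kind just mentioned can be sketched in a few lines. The menu contents here are hypothetical, and a real check would walk an entire menu bar rather than one menu:

```java
import javax.swing.JMenu;
import javax.swing.JMenuItem;

// Sketch of an automatable usability check: every item in a menu
// should have a keyboard mnemonic.
public class MnemonicCheck {
    public static boolean everyItemHasMnemonic(JMenu menu) {
        for (int i = 0; i < menu.getItemCount(); i++) {
            JMenuItem item = menu.getItem(i); // null for separators
            if (item != null && item.getMnemonic() == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Menus are lightweight components, so no display is needed here.
        System.setProperty("java.awt.headless", "true");
        JMenu file = new JMenu("File");
        JMenuItem open = new JMenuItem("Open");
        open.setMnemonic('O');
        file.add(open);
        file.add(new JMenuItem("Close")); // no mnemonic: the check should fail
        System.out.println(everyItemHasMnemonic(file));
    }
}
```

Checks such as this cost very little to write, and thereafter enforce the usability convention on every build.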


A Hierarchy of Tests

We only release software once all the tests for it have passed. In this sense, it does not matter in what order the tests are run. In reality though, we want to find errors as quickly as possible, both in our continuous build process, and in our final release testing.

As an example, here is how we organize the tests for the LabWizard product suite. This is not to be interpreted as the best way of doing things for all projects. It is simply an example of what has worked for us.

We organize the testing into three stages:

First are the unit tests. These cover every single class, including those complex ones that draw together lots of other classes to create significant functional units, even stand-alone products. As mentioned earlier, the very nature of object-oriented systems means that complete unit-testing achieves a great deal in the way of integration testing. The times taken by the unit tests vary from a few seconds to tens of minutes, depending on the complexity of the classes that they test. Unit tests for user interface classes tend to be slower than those for non-UI classes of similar complexity. This is due to the relative slowness of drawing components on-screen, and also because of pauses between keystrokes in these tests. Overall, the unit tests take a couple of hours to run on a very fast eight-core machine with 16 gigabytes of memory.

Next are the function tests. Each functional requirement in our software specification leads to a test script in our testing specification. The function tests are Java classes that implement these test scripts. Each test takes about a minute to run, and there are about four hundred of them. The overall time to test these is about three and a half hours.

Third in line are the load tests. These stress and scalability tests take a long time to run—about two hours for the fifty-odd tests we have.

Sometimes it's not a clear-cut decision at which level to put a test. For example, our main server component, RippledownServerImpl, previously experienced problems that could only be replicated by a test that took more than an hour. In theory, this test should be at the unit level, since it is detecting a specific bug in a single class, and was originally written as such. However, as we need to be able to run our unit tests in a reasonable amount of time, we have refactored it as a load test.


What Language Should Our Tests Be In?

All tests should be in the primary development language of the project, namely Java, for the kinds of projects we are concerned with in this book. It is quite common to write tests in Python or some other scripting language, but we have found this to be unsatisfactory for a number of reasons.

For a start, our test code needs to be at least as good as our production code. It is easy for tests in a scripting language to be dismissed as mere 'test scripts'. If Python (say) is good enough for our tests, why is it not good enough for our production code?

Next, if we're writing a Java project, we can be sure that all team members are, or are becoming, proficient with Java. To require all developers to be as good at a second language as they are at Java is an unnecessary requirement.

Lastly, and perhaps most importantly, our test code needs to be maintained alongside our production code. For example, we need to have the confidence to be able to rename a method, or change its signature in our production code without breaking any test code. That is, we need to apply to our test code the same productivity and refactoring tools that are available in our primary language.


Is it Really Possible?

This talk about testing everything is all very well, but just how practical is it? Can we really write automated tests for our user interface components and other areas traditionally regarded as being too hard to test? Do we really have time to write all these unit and function tests? Given a pre-existing, largely untested codebase, isn't it so hard to move towards a fully-tested system, that it's not even worth trying?

A large part of this book is devoted to showing that user interfaces really can be tested, and that this is not so hard to do. We have done it with the LabWizard product suite and the Ikon Do It application that comes with the source code to this book. These have lots of automated user interface tests, proving that it can be done.

In answer to the question "Do we have time to write all these (expletive deleted) tests?", our experience is that we don't have time not to. It is great being able to sit with a customer, showing them new functionality, knowing that the new features have been thoroughly tested and will just work. Similarly, it's great when your customer has the expectation that each upgrade of your mission critical software at their site will be a painless process—in stark contrast to their experience of other software products! Having experienced this, one could never go back to a less "extreme" test methodology.

When faced with a pre-existing codebase that is poorly tested, we can feel overwhelmed by the amount of work to be done. In this situation, we need to remind ourselves that every single test we write is improving the situation, whereas any bit of untested code we add is making it worse. To be sure, getting to the 'Nirvana' of everything being tested can be a long hard journey. It certainly was for us with LabWizard:

Over the period of this graph, about 60 releases of the software were made. Each release added many new features and bug fixes to the product suite. Yet because of the continual refactoring that is possible with a well-tested codebase, the total number of production classes has actually dropped, whilst the number of test classes has increased five-fold.

So yes, it is possible to follow the Extreme Testing approach, and it is so worthwhile that it just has to be done.



Summary

In this chapter, we have seen that all public classes need unit tests, and that these tests should be written ahead of the classes they test, as far as possible. Each functional requirement in our software specification should correspond to a test class. All tests should be written in the main development language of our project and should be run as part of our continuous build and integration process.

About the Authors

  • Lindsay Peters

    Lindsay Peters is the Chief Technical Officer for Pacific Knowledge Systems. He has 25 years of experience in software management, formal analysis, algorithm development, software design, and implementation for large commercial and defense systems. Ten years ago, Lindsay and his team were early adopters of Java, coupled with more rigorous design processes such as Design by Contract. He then helped transition the development team to the Extreme Programming model. Out of this exciting and successful experience grew the "Extreme Testing" approach. In the early 80's, Lindsay managed a software team that was one of the first to incorporate the newly discovered simulated annealing algorithm into a commercial application. This team solved a previously intractable real-world problem, which was the optimum assignment of radio frequencies to collocated mobile radios. Apart from software development and artificial intelligence systems, Lindsay has an interest in mathematical convexity, and has helped to progress the "Happy Ending" problem. He is also involved in politics, and in the last Australian Federal election he stood as the Greens candidate for the seat of Bennelong.

  • Tim Lavers

    Tim Lavers is a Senior Software Engineer at Pacific Knowledge Systems, which produces LabWizard—the gold standard for rules-based knowledge acquisition software. In developing and maintaining LabWizard for almost 10 years, Tim has worked with many Java technologies, including network programming, Swing, reflection, logging, JavaHelp, web services, RMI, WebStart, preferences, internationalization, concurrent programming, XML, and databases. He has worked with tools as well, such as Ant and CruiseControl. His job also includes a healthy mix of user training, technical support, and support to marketing. In his previous job, he wrote servlets and built an image processing library. Along with his professional programming, he writes and maintains the distributed testing tool, GrandTestAuto. He has published a JavaWorld article on RMI as well as a number of mathematical papers. Tim's hobbies include running and playing the piano.
