Mastering Software Testing with JUnit 5

By Boni García

About this book

When building an application it is of utmost importance to have clean code, a productive environment and efficient systems in place. Having automated unit testing in place helps developers to achieve these goals. The JUnit testing framework is a popular choice among Java developers and has recently released a major version update with JUnit 5.

This book shows you how to make use of the power of JUnit 5 to write better software.

The book begins with an introduction to software quality and software testing. After that, you will see an in-depth analysis of all the features of Jupiter, the new programming and extension model provided by JUnit 5. You will learn how to integrate JUnit 5 with other frameworks such as Mockito, Spring, Selenium, Cucumber, and Docker.

After the technical features of JUnit 5, the final part of this book will train you for the daily work of a software tester. You will learn best practices for writing meaningful tests. Finally, you will learn how software testing fits into the overall software development process, and sits alongside continuous integration, defect tracking, and test reporting.

Publication date:
October 2017


Retrospective On Software Quality And Java Testing

In order to make an apple pie from scratch, you must first invent the universe.
- Carl Sagan

The well-known testing framework JUnit has come a long way since its inception in 1995. On September 10, 2017, the project reached an important milestone: the release of JUnit 5.0.0. Before going deep into the details of JUnit 5, it is worth reviewing the status quo of software testing, in order to understand where we have come from and where we are going. To that aim, this chapter provides a high-level review of the background of software quality, software testing, and testing for Java. Concretely, the chapter is composed of three sections:

  • Software quality: The first section reviews the status quo in quality engineering: quality assurance, ISO/IEC-25000, Verification & Validation (V&V), and software defects (bugs).
  • Software testing: This is the most commonly performed activity to guarantee software quality and reduce the number of software defects. This section provides a theoretical background of software testing levels (unit, integration, system, and acceptance), methods (black-box, white-box, and non-functional), and types (automated and manual).
  • Testing frameworks for the Java Virtual Machine (JVM): This section provides a summary of the main features of the legacy versions of the JUnit framework (that is, versions 3 and 4). Finally, it gives a brief description of alternative testing frameworks and enhancers to JUnit.

Software quality

Software is the collection of computer programs, related data, and associated documentation developed for a particular customer or for a general market. It is an essential part of the modern world, and it has become pervasive in telecommunications, utilities, commerce, culture, entertainment, and so on. The question "What is software quality?" can generate different answers, depending on the role of the practitioner involved in a software system. There are two main groups of people involved in a software product or service:

  • Consumers: People who use software. In this group, we can differentiate between customers (that is, people responsible for the acquisition of software products or services) and users (that is, people who use the software products or services for various purposes). Nevertheless, it is quite common for the same person to be both customer and user.
  • Producers: People involved with the development, management, maintenance, marketing, and service of software products.

The quality expectations of consumers are that a software system performs useful functions as specified. For software producers, the fundamental quality question is fulfilling their contractual obligations by producing software products that conform to the Service Level Agreement (SLA). The definition of software quality by the well-known software engineer Roger Pressman comprises both points of view:

An effective software process applied in a manner that creates a useful product that provides measurable value for those who produce it and those who use it.

Quality engineering

Quality engineering (also known as quality management) is a process that evaluates, assesses, and improves the quality of software. There are three major groups of activities in the quality engineering process:

  1. Quality planning: This stage establishes the overall quality goal by managing customers' expectations under the project's cost and budgetary constraints. The quality plan also includes the strategy, that is, the selection of activities to perform and the appropriate quality measurements to provide feedback and assessment.
  2. Quality Assurance (QA): This guarantees that software products and processes in the project life cycle meet their specified requirements by planning and performing a set of activities to provide adequate confidence that quality is being built into the software. The main QA activity is Verification & Validation, but there are others, such as software quality metrics, the use of quality standards, configuration management, documentation management, or an expert's opinion.
  3. Post-QA: This stage includes activities for quality quantification and improvement: measurement, analysis, feedback, and follow-up activities. The aim of these activities is to provide a quantitative assessment of product quality and to identify improvement opportunities.

These phases are represented in the following chart:

Software Quality Engineering Process

Requirements and specification

Requirements are a key topic in the quality engineering domain. A requirement is a statement identifying a capability, physical characteristic, or quality factor that bounds a product or process need for which a solution will be pursued. Requirements development (also known as requirements engineering) is the process of producing and analyzing customer, product, and product-component requirements. The set of procedures that support the development of requirements, including planning, traceability, impact analysis, change management, and so on, is known as requirements management. There are two kinds of software requirements:

  • Functional requirements are actions that the product must do to be useful to its users. They arise from the work that stakeholders need to do. Almost any action, such as inspecting, publishing, or most other active verbs, can be a functional requirement.
  • Non-functional requirements are properties, or qualities, that the product must have. For example, they can describe properties such as performance, usability, or security. They are often called quality attributes.

Another important topic strongly linked with the requirements is the specification, which is a document that specifies in a complete, precise, verifiable manner, the requirements, design, behavior, or other characteristics of a system, and often the procedures for determining whether these provisions have been satisfied.

Quality Assurance

Quality Assurance (QA) is primarily concerned with defining or selecting standards that should be applied to the software development process or software product. Daniel Galin, the author of the book Software Quality Assurance (2004), defined QA as:

Systematic, planned set of actions necessary to provide adequate confidence that the software development and maintenance process of a software system product conforms to established specification as well as with the managerial requirements of keeping the schedule and operating within the budgetary confines.

The QA process selects the V&V activities, tools, and methods to support the selected quality standards. V&V is a set of activities carried out with the main objective of withholding products from shipment if they do not qualify. In contrast, QA is meant to minimize the costs of quality by introducing a variety of activities throughout the development and maintenance process in order to prevent the causes of errors, detect them, and correct them in the early stages of development. As a result, QA substantially reduces the rates of non-qualifying products. All in all, V&V activities are only a part of the total range of QA activities.


Various quality standards have been proposed to accommodate these different quality views and expectations. The standard ISO/IEC-9126 was one of the most influential in the software engineering community. Nevertheless, researchers and practitioners detected several problems and weaknesses in this standard. For that reason, the ISO/IEC-9126 international standard has been superseded by the ISO/IEC-25000 series of international standards on Software product Quality Requirements and Evaluation (SQuaRE). This section provides a high-level overview of this standard.

The ISO/IEC-25000 quality reference model distinguishes different views on software quality:

  • Internal quality: This concerns the properties of the system that can be measured without executing it.
  • External quality: This concerns the properties of the system that can be observed during its execution.
  • Quality in use: This concerns the properties experienced by its consumer during operation and maintenance of the system.

Ideally, the development (process quality) influences the internal quality; then, the internal quality determines the external quality. Finally, external quality determines quality in use. This chain is depicted in the following picture:

ISO/IEC-25000 Product Quality Reference Model

The quality model of ISO/IEC-25000 divides the product quality model (that is, the internal and external attributes) into eight top-level quality features: functional suitability, performance efficiency, compatibility, usability, reliability, security, maintainability, and portability. The following definitions have been extracted directly from the standard:

  • Functional suitability: This represents the degree to which a product or system provides functions that meet stated and implied needs when used under specified conditions.
  • Performance efficiency: This represents the performance relative to the amount of resources used under stated conditions.
  • Compatibility: This is the degree to which a product, system or component can exchange information with other products, systems or components, and/or perform its required functions, while sharing the same hardware or software environment.
  • Usability: This is the degree to which a product or system can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use.
  • Reliability: This is the degree to which a system, product, or component performs specified functions under specified conditions for a specified period of time.
  • Security: This is the degree to which a product or system protects information and data so that persons or other products or systems have the degree of data access appropriate to their types and levels of authorization.
  • Maintainability: This represents the degree of effectiveness and efficiency with which a product or system can be modified to improve it, correct it, or adapt it to changes in environment and in requirements.
  • Portability: This is the degree of effectiveness and efficiency with which a system, product, or component can be transferred from one hardware, software, or other operational or usage environment to another.

On the other hand, the attributes of quality in use can be categorized into the following five characteristics:

  • Effectiveness: This is the accuracy and completeness with which users achieve specified goals.
  • Efficiency: These are the resources expended in relation to the accuracy and completeness with which users achieve goals.
  • Satisfaction: This is the degree to which user needs are satisfied when a product or system is used in a specified context of use.
  • Freedom from risk: This is the degree to which a product or system mitigates the potential risk to economic status, human life, health, or the environment.
  • Context coverage: This is the degree to which a product or system can be used with effectiveness, efficiency, freedom from risk, and satisfaction in both specified contexts of use and in contexts beyond those initially explicitly identified.

Verification and Validation

Verification and Validation (also known as Software Quality Control) is concerned with evaluating whether the software being developed meets its specifications and delivers the functionality expected by the consumers. These checking processes start as soon as requirements become available, and continue through all stages of the development process. Verification is different from validation, although they are often confused.

The distinguished professor of computer science Barry Boehm expressed the difference between them back in 1979:

  • Verification: Are we building the product right? The aim of verification is to check that the software meets its stated functional and non-functional requirements (that is, the specification).
  • Validation: Are we building the right product? The aim of validation is to ensure that the software meets consumers' expectations. It is a more general process than verification, because specifications do not always reflect the real wishes or needs of consumers.

V&V activities include a wide array of QA activities. Although software testing plays an extremely important role in V&V, other activities are also necessary. Within the V&V process, two big groups of techniques of system checking and analysis may be used:

  • Software testing: This is the most commonly performed activity within QA. Given a piece of code, software testing (or simply testing) consists of observing a sample of executions (test cases), and giving a verdict on them. Hence, testing is an execution-based QA activity, so a prerequisite is the existence of the implemented software units, components, or system to be tested. For this reason, it is sometimes called dynamic analysis.
  • Static analysis: This is a form of V&V that does not require execution of the software. Static analysis works on a source representation of the software: either a model of the specification or design, or the source code of the program. Perhaps the most commonly used techniques are inspections and reviews, where a specification, design, or program is checked by a group of people. Additional static analysis techniques may be used, such as automated software analysis (the source code of a program is checked for patterns that are known to be potentially erroneous).

It should be noted that there is a strong divergence of opinion about what types of testing constitute validation or verification. Some authors believe that all testing is verification and that validation is conducted when requirements are reviewed and approved. Other authors view unit and integration testing as verification and higher-order testing (for example, system or user testing) as validation. To solve this divergence, V&V can be treated as a single topic rather than as two separate topics.

Software defects

Key to the correctness aspect of V&V is the concept of software defects. The term defect (also known as bug) refers to a generic software problem. The IEEE Standard 610.12 proposes the following taxonomy related to software defects:

  • Error: A human action that produces an incorrect result. Errors can be classified into two categories:
    1. Syntax error (program statement that violates one or more rules of the language in which it is written).
    2. Logic error (incorrect data fields, out-of-range terms, or invalid combinations).
  • Fault: The manifestation of an error in the software system is known as a fault. For example, an incorrect step, process, or data definition.
  • Failure: The inability of the software system to perform its required functions is known as (system) failure.
The term bug was popularized in 1947 by the software pioneer Grace Hopper, when a moth trapped in a relay of an electromechanical computer caused a system malfunction. In that decade, the term debug was also introduced, as the process of detecting and correcting defects in a system.
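The distinction between error, fault, and failure can be illustrated with a minimal sketch. The class and method names below are hypothetical and chosen only for this example. The programmer's error (assuming that every input has two elements) leaves a fault in the code, and that fault only becomes a failure for some inputs:

```java
// Hypothetical example; the names are illustrative assumptions.
class AverageCalculator {

    // Fault: the divisor is hard-coded to 2 instead of values.length.
    // The fault is the trace that the programmer's error left in the code.
    static double faultyAverage(int[] values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum / 2.0;
    }

    // Corrected version: divides by the actual number of elements.
    static double correctAverage(int[] values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return (double) sum / values.length;
    }
}
```

With the input {4, 6} the faulty method happens to return the right result, so the fault stays dormant. With {3, 6, 9} it returns 9.0 instead of 6.0, and the fault shows up as a failure.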

In addition to this level of granularity for defects, it is also interesting to contemplate incidences as symptoms associated with a failure perceived by the software consumer. All in all, errors, faults, failures, and incidences are different aspects of software defects. A causal relation exists between these four aspects of defects. Errors may cause faults to be injected into the software, and faults may cause failures when the software is executed. Finally, incidences happen when failures are experienced by the final user or customer. Different QA activities can be carried out to try to minimize the number of defects within a software system. As defined by Jeff Tian in his book Software Quality Engineering (2005), the alternatives can be grouped into the following three generic categories:

  • Defect prevention through error removal: For example, the use of certain processes and product standards can help to minimize the injection of certain kinds of faults into the software.
  • Defect reduction through fault detection and removal: The traditional testing and static analysis activities are examples of this category. We cover the specific types of these mechanisms in the body of this chapter.
  • Defect containment through failure prevention: These activities are typically out of the scope of the software system. The objective of containment is to minimize the damage caused by software system failures (for example, walls to contain radioactive material in case of reactor failures).
Software defect chain and associated QA activities

Static analysis

Static analysis of a piece of software is performed without executing the code. Static analysis has several advantages over testing:

  1. During testing, errors can hide other errors. This situation does not happen with static analysis, because it is not concerned with interactions between errors.
  2. Incomplete versions of a system can be statically analyzed without additional cost. In testing, if a program is incomplete, test harnesses have to be developed.
  3. Static analysis can consider broader quality attributes of a software system, such as compliance with standards, portability, and maintainability.

There are different methods that can be identified as static analysis:

  • Inspections (first proposed by Michael Fagan in 1976) are examinations of software artifacts by human inspectors, aimed at discovering and fixing faults in software systems. All kinds of software assets can be inspected, for example, the specification, design models, and so on. The primary reason for the existence of inspections is that they can start without waiting for executable programs to be available (as testing requires).
  • Review is the process in which a group of people examine the software and its associated documentation, looking for potential problems, omissions, and non-conformance with standards. Nowadays, reviews are frequently carried out for new code before it is merged into a shared source code repository. Typically, the review is done by someone on the same team other than the code author (peer review). This process is quite expensive in terms of time and effort but, when correctly performed, it helps to ensure high internal code quality and to reduce potential risks.
A walkthrough is a special form of review. According to IEEE Standard for Software Reviews, a walkthrough is a form of software peer review in which a designer or programmer leads members of the development team and other interested parties through a software product, and the participants ask questions and make comments about possible errors, violation of development standards, and other problems.
  • Automated software analysis assesses the source code using patterns that are known to be potentially dangerous. This technique is usually delivered as commercial or open source tools and services, commonly known as lint or linter. These tools can locate many common programming faults, analyze the source code before it is tested, and identify potential problems in order to re-code them before they manifest themselves as failures. The intention of this linting process is to draw a code reader’s attention to faults in the program, such as:
    1. Data faults: This may include variables declared but never used, variables assigned twice but never used between assignments, and so on.
    2. Control faults: This may include unreachable code or unconditional branches into loops.
    3. Input/output faults: This may include variables output twice with no intervening assignment.
    4. Interface faults: This may include parameter-type mismatches, parameter number mismatches, non-usage of the results of functions, uncalled functions and procedures, and so on.
    5. Storage management faults: This may include unassigned pointers, pointer arithmetic, and so on.
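As a rough illustration of the kinds of patterns listed above, the following sketch contains a data fault and an interface fault. The code compiles and runs, which is why an automated analysis tool is useful for pointing at it. The class and method names are hypothetical, and whether a particular linter reports each pattern depends on the tool and its configuration:

```java
// Hypothetical example; names are illustrative assumptions.
class LintExamples {

    // Data fault: 'total' is assigned twice with no use between assignments.
    static int dataFault(int a, int b) {
        int total = a;      // this value is never read
        total = a + b;
        return total;
    }

    // Interface fault: the result of String.trim() is not used.
    // Strings are immutable, so 'input' is returned unchanged.
    static String interfaceFault(String input) {
        input.trim();
        return input;
    }

    // Corrected version: the result of trim() is returned.
    static String trimmed(String input) {
        return input.trim();
    }
}
```

The first method still returns the right value, so only static analysis is likely to reveal the redundant assignment. The second one returns surrounding whitespace that the author presumably meant to remove.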

Halfway between static analysis and dynamic testing we find a special form of software evaluation, called formal verification. This kind of assessment provides mechanisms to check that a system operates according to its formal specification. To that aim, software is treated as a mathematical entity whose correctness can be proved using logical operations, combining different types of static and dynamic evaluation. Nowadays, formal methods are not widely adopted, mainly due to scalability problems. Projects using these techniques are mostly relatively small, such as critical kernel systems. As systems grow, the effort required to develop a formal specification and verification grows excessively.


Software testing

Software testing consists of the dynamic evaluation of the behavior of a program on a finite set of test cases, suitably selected from the usually infinite execution domain, against the expected behavior. The key concepts of this definition are described as follows:

  • Dynamic: The System Under Test (SUT) is executed with specific input values to find failures in its behavior. Thus, executing the actual SUT checks not only that the design and code are correct, but also the environment, such as the libraries, the operating system, network support, and so on.
  • Finite: Exhaustive testing is not possible or practical for most real programs. They usually have a large number of allowable inputs to each operation, plus even more invalid or unexpected inputs, and the possible sequences of operations are usually infinite as well. Testers must choose a number of tests that can be run in the available time.
  • Selected: Since there is a huge or infinite set of possible tests and we can afford to run only a small fraction of them, the key challenge of testing is how to select the tests that are most likely to expose failures in the system.
  • Expected: After each test execution, it must be decided whether the observed behavior of the system was a failure or not.
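To get a feeling for why testing has to be finite and selective, consider a back-of-the-envelope calculation for a hypothetical method that takes two 32-bit int parameters. The sketch below is only an illustration, and the throughput of one billion test executions per second is an optimistic assumption:

```java
import java.math.BigInteger;

// Hypothetical estimate; the throughput figure is an assumption.
class ExhaustiveTestingCost {

    // Number of distinct input pairs for a method taking two 32-bit ints.
    static BigInteger inputCombinations() {
        return BigInteger.valueOf(2).pow(64);
    }

    // Whole years needed to run every combination at the given throughput.
    static BigInteger yearsToRunAll(long testsPerSecond) {
        BigInteger secondsPerYear = BigInteger.valueOf(365L * 24 * 60 * 60);
        return inputCombinations()
                .divide(BigInteger.valueOf(testsPerSecond))
                .divide(secondsPerYear);
    }
}
```

Under these assumptions, yearsToRunAll(1_000_000_000L) yields 584, that is, more than five centuries for a single two-parameter method.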

Software testing is a broad term encompassing a wide spectrum of different concepts. There is no universal classification for all the different testing forms available in the literature. For the sake of clarity, in this book we classify the different forms of tests using three axes, namely testing levels (unit, integration, system, and acceptance), testing methods (black-box, white-box, and non-functional testing), and testing types (manual and automated).

The next sections provide more details about all of these concepts, which are summarized in the following diagram:

Taxonomy of software testing in three categories: levels, methods, and types

For example, as we will discover, a JUnit test that exercises a method in a class according to its functional behavior can be seen as an automated unit black-box test. When a final consumer uses a software product to validate whether it works as expected, we can see this, according to the preceding taxonomy, as a manual black-box acceptance test. It should be noted that not every possible combination of these three axes is meaningful. For instance, non-functional tests (for example, performance tests) are typically carried out automatically and at system level (it would be very unusual to perform them manually or at unit level).

Testing levels

Depending on the size of the SUT and the scenario in which it is exercised, testing can be carried out at different levels. In this book, we classify the different testing levels in four phases:

  • Unit testing: Here, individual program units are tested. Unit testing should focus on the functionality of objects or methods.
  • Integration testing: Here, units are combined to create composite components. Integration testing should focus on testing component interfaces.
  • System testing: Here, all of the components are integrated and the system is tested as a whole.
  • Acceptance testing: Here, consumers decide whether or not the system is ready to be deployed in the consumer environment. It can be seen as a high-level functional testing performed at system level by final users or customers.
There is no universal classification in the many different forms of testing. Regarding testing levels, in this book, we use the aforementioned classification of four levels. Nevertheless, other levels or approaches are present in the literature (for example, system integration testing or regression testing). In the last part of this section, we can find a review of different testing approaches.

The first three levels (unit, integration, and system) are typically carried out during the development phases of the software life cycle. These tests are typically performed by different roles of software engineers (that is, programmers, testers, QA team, and so on). The objective of these tests is the verification of the system. On the other hand, the fourth level (acceptance) is a type of user testing, in which potential or real users are usually involved (validation). The following picture provides a graphical description of these concepts:

Testing levels and their relationship with V&V

Unit testing

Unit testing is a method by which individual pieces of source code are tested to verify that the design for that unit has been correctly implemented. A unit test case executes the following four phases in sequence:

  • Setup: The test case initializes the test fixture, that is, the before picture required for the SUT to exhibit the expected behavior.
  • Exercise: The test case interacts with the SUT, getting some outcome from it as a result. The SUT usually queries another component, named the Depended-On Component (DOC).
  • Verify: The test case determines whether the expected outcome has been obtained using assertions (also known as predicates).
  • Teardown: The test case tears down the test fixture to put the SUT back into the initial state.

These phases and their relationship with the SUT and DOC are illustrated as follows:

Unit test generic structure
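The four phases can also be sketched in plain Java, without any testing framework. The ShoppingCart class and all method names below are hypothetical, introduced only for this illustration. In JUnit 5, the same phases would typically be mapped onto methods annotated with @BeforeEach, @Test, and @AfterEach:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical SUT used only for illustration.
class ShoppingCart {
    private final List<Integer> pricesInCents = new ArrayList<>();

    void add(int priceInCents) {
        pricesInCents.add(priceInCents);
    }

    int total() {
        int sum = 0;
        for (int price : pricesInCents) {
            sum += price;
        }
        return sum;
    }

    void clear() {
        pricesInCents.clear();
    }
}

// Hand-written test case that makes the four phases explicit.
class ShoppingCartTest {
    private ShoppingCart cart;

    // Setup: build the test fixture (a cart that already holds one item).
    void setUp() {
        cart = new ShoppingCart();
        cart.add(250);
    }

    boolean testTotal() {
        // Exercise: interact with the SUT.
        cart.add(150);
        int total = cart.total();
        // Verify: compare the actual outcome with the expected one.
        return total == 400;
    }

    // Teardown: release the fixture so that the next test starts clean.
    void tearDown() {
        cart.clear();
        cart = null;
    }

    // Runs the phases in sequence; teardown runs even if the test throws.
    static boolean run() {
        ShoppingCartTest test = new ShoppingCartTest();
        test.setUp();
        try {
            return test.testTotal();
        } finally {
            test.tearDown();
        }
    }
}
```

Running teardown in a finally block mirrors what a testing framework does: the fixture is cleaned up whether or not the verification succeeds.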

Unit testing is done with the unit under test in isolation, that is, without interacting with its DOCs. To that aim, test doubles are employed to replace any components on which the SUT depends. There are several kinds of test doubles:

  • A dummy object simply satisfies the real object API but it is never actually used. The typical use case for dummy objects is when they are passed as parameters to meet the method signature, but then the dummy object is not actually used.
  • A fake object replaces the real object with a simpler implementation, for example, an in-memory database.
  • A stub object replaces the real object providing hard-coded values as responses.
  • A mock object also replaces the real object, but this time with programmed expectations as responses.
  • A spy object is a partial mock object, meaning that some of its methods are programmed with expectations, but the others use the real object's implementation.
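The difference between a stub and a mock can be sketched with hand-written doubles. All the names below (ExchangeRateService, PriceConverter, and so on) are hypothetical. In practice, a mocking library such as Mockito would usually generate these doubles, but writing them by hand makes the idea visible:

```java
// Hypothetical DOC: a service the SUT depends on.
interface ExchangeRateService {
    double rate(String from, String to);
}

// Hypothetical SUT: delegates the rate lookup to its DOC.
class PriceConverter {
    private final ExchangeRateService rates;

    PriceConverter(ExchangeRateService rates) {
        this.rates = rates;
    }

    double convert(double amount, String from, String to) {
        return amount * rates.rate(from, to);
    }
}

// Stub: replaces the real service with a hard-coded response.
class StubRateService implements ExchangeRateService {
    @Override
    public double rate(String from, String to) {
        return 2.0;
    }
}

// Mock: programmed with an expectation about how it must be called.
class MockRateService implements ExchangeRateService {
    private final String expectedFrom;
    private final String expectedTo;
    private int matchingCalls = 0;
    private int otherCalls = 0;

    MockRateService(String expectedFrom, String expectedTo) {
        this.expectedFrom = expectedFrom;
        this.expectedTo = expectedTo;
    }

    @Override
    public double rate(String from, String to) {
        if (expectedFrom.equals(from) && expectedTo.equals(to)) {
            matchingCalls++;
        } else {
            otherCalls++;
        }
        return 2.0;
    }

    // The expectation holds if the expected call happened exactly once
    // and no other call was made.
    boolean verify() {
        return matchingCalls == 1 && otherCalls == 0;
    }
}
```

A test using the stub checks only the value returned by the SUT (state verification), whereas a test using the mock also checks how the SUT interacted with its DOC (behavior verification).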

Integration testing

Integration testing should expose defects in the interfaces, and the interaction between integrated components or modules. There are different strategies for performing integration testing. These strategies describe the order in which units are to be integrated, presuming that the units have been separately tested. Examples of common integration strategies are the following:

  • Top-down integration: This strategy starts with the main unit (module), that is, the root of the procedural tree. Any lower-level module that is called by the main unit should be substituted by a test double. Once testers are convinced that the main unit logic is correct, the test doubles are gradually replaced with the actual code. This process is repeated for the rest of the lower-level units in the procedural tree. The main advantage of this approach is that defects are more easily found.
  • Bottom-up integration: This strategy starts the testing process with the most elementary units. Larger subsystems are assembled from the tested components. The main advantage of this approach is that test doubles are not needed.
  • Ad hoc integration: The components are integrated in the natural order in which they are finished. It allows early testing of the system. Test doubles are usually required.
  • Backbone integration: A skeleton of components is built and others are gradually integrated. The main disadvantage of this approach is the creation of the backbone, which can be labor-intensive.
Another strategy commonly referred to in the literature is big-bang integration. In this strategy, testers wait until all or most of the units are developed and integrated. As a result, all the failures are found at the same time, making it very difficult and time-consuming to correct the underlying faults. If possible, this strategy should be avoided.

System testing

System testing during development involves integrating components to create a version of the system and then testing the integrated system. It verifies that the components are compatible, interact correctly, and transfer the right data at the right time, typically across its user interfaces. It obviously overlaps with integration testing, but the difference here is that system testing should involve all the system components together with the final user (typically impersonated).

There is a special type of system testing called end-to-end testing. In this approach, the final user is typically impersonated, that is, simulated using automation techniques.

Testing methods

Testing methods (or strategies) define the way for designing test cases. They can be responsibility-based (black-box), implementation-based (white-box), or non-functional. Black-box techniques design test cases on the basis of the specified functionality of the item to be tested. White-box ones rely on source code analysis to develop test cases. Hybrid (grey-box) techniques design test cases using both responsibility-based and implementation-based approaches.

Black-box testing

Black-box testing (also known as functional or behavioral testing) is based on requirements with no knowledge of the internal program structure or data. Black-box testing relies on the specification of the system or the component that is being tested to derive test cases. The system is a black-box whose behavior can only be determined by studying its inputs and the related outputs. There are a lot of specific black-box testing techniques; some of the most well-known ones are described as follows:

  • Systematic testing: This refers to a complete testing approach in which the SUT is shown to conform exhaustively to a specification, up to the testing assumptions. It generates test cases only in the limiting sense that each domain point is a singleton sub-domain. Within this category, some of the most commonly used techniques are equivalence partitioning and boundary value analysis, and also logic-based techniques, such as cause-effect graphing, decision tables, or pairwise testing.
  • Random testing: This is the antithesis of systematic testing: inputs are sampled over the entire input domain. Fuzz testing is a form of black-box random testing, which randomly mutates well-formed inputs and tests the program on the resulting data. It delivers randomly sequenced and/or structurally bad data to a system to see if failures occur.
  • Graphical User Interface (GUI) testing: This is the process of ensuring that software with a graphical interface interacting with the user meets its specification. GUI testing is event-driven (for example, mouse movements or menu selections) and provides a frontend to the underlying application code through messages or method calls. GUI testing at unit level is typically used at the button level. GUI testing at system level exercises the event-driven nature of the SUT.
  • Model-based testing (MBT): This is a testing strategy in which test cases are derived in part from a model that describes some (if not all) aspects of the SUT. MBT is a form of black-box testing because tests are generated from a model, which is derived from the requirements documentation. It can be done at different levels (unit, integration, or system).
  • Smoke testing: This is the process of ensuring the critical functionality of the SUT. A smoke test case is the first to be run by testers before accepting a build for further testing. Failure of a smoke test case means that the software build is refused. The name smoke testing derives from electrical system testing, whereby the first test was to switch on and see if it smoked.
  • Sanity testing: This is the process of ensuring the basic functionality of the SUT. Similarly to smoke testing, sanity tests are performed at the beginning of the test process, but their objective is different. Sanity tests are supposed to ensure that the basic features of the SUT continue working as expected (that is, the rationality of the SUT), before conducting more exhaustive tests.
Smoke and sanity testing are terms that are often confused in the software testing community. It is commonly accepted that both kinds of tests are performed to avoid wasting effort on rigorous testing when these tests fail. The main difference between them is their target (critical versus basic functionality).
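
To make the equivalence partitioning and boundary value analysis techniques mentioned in the list above more concrete, consider the following minimal sketch. The method isEligible() and its valid range of 18 to 65 are assumptions made up for this illustration (they are not part of the book's examples). The idea is to pick one representative per equivalence class, plus the values on and next to each boundary:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundaryValueSketch {

    // Hypothetical unit under test: accepts ages in the closed range [18, 65]
    static boolean isEligible(int age) {
        return age >= 18 && age <= 65;
    }

    // Input values mapped to the result required by the (assumed) specification
    static Map<Integer, Boolean> expectations() {
        Map<Integer, Boolean> cases = new LinkedHashMap<>();
        cases.put(5, false);   // equivalence class: below the valid range
        cases.put(40, true);   // equivalence class: inside the valid range
        cases.put(90, false);  // equivalence class: above the valid range
        cases.put(17, false);  // just below the lower boundary
        cases.put(18, true);   // lower boundary
        cases.put(65, true);   // upper boundary
        cases.put(66, false);  // just above the upper boundary
        return cases;
    }

    // Counts the inputs for which the actual result differs from the expected one
    static int countMismatches() {
        int mismatches = 0;
        for (Map.Entry<Integer, Boolean> testCase : expectations().entrySet()) {
            if (isEligible(testCase.getKey()) != testCase.getValue()) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("Mismatches: " + countMismatches());
    }
}
```

Seven inputs are enough here because every value inside a partition is assumed to behave like its representative, while off-by-one mistakes tend to show up at the boundaries.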

White-box testing

White-box testing (also known as structural testing) is based on knowledge of the internal logic of an application's code. It determines whether the program-code structure and logic are faulty. White-box test cases are accurate only if the tester knows what the program is supposed to do.

Black-box testing uses only the specification to identify test cases, while white-box testing uses the program source code (implementation) as the basis of test case identification. Both approaches, used in conjunction, are necessary in order to select a good set of test cases for the SUT. Some of the most significant white-box techniques are as follows:

  • Code coverage defines the degree to which the source code has been tested, for example, in terms of the percentage of LOCs. There are several criteria for code coverage:
    1. Statement coverage: Granularity at the level of lines of code.
    2. Decision (branch) coverage: Granularity at the level of control structures (for example, if-else).
    3. Condition coverage: Granularity at the level of Boolean expressions (true-false).
    4. Path coverage: Granularity at the level of every possible route through the code.
    5. Function coverage: Granularity at the level of program functions.
    6. Entry/exit coverage: Granularity at the level of calls and returns.
  • Fault injection is the process of injecting faults into software to determine how well (or badly) the SUT behaves. Defects can be said to propagate, and in that case, their effects are visible in program states beyond the state in which the error existed (a fault became a failure).
  • Mutation testing validates tests and their data by running them against many copies of the SUT containing different, single, and deliberately inserted changes. Mutation testing helps to identify omissions in the code.
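
The mutation testing idea from the list above can be sketched as follows. This is a hand-made illustration under stated assumptions: the unit under test is a hypothetical addition operation, and the mutant is written manually (a real mutation testing tool would generate mutants automatically). A test suite "kills" a mutant when at least one of its checks fails against it:

```java
import java.util.function.IntBinaryOperator;

public class MutationSketch {

    // Original implementation of the hypothetical unit under test
    static final IntBinaryOperator ORIGINAL = (a, b) -> a + b;

    // Mutant: a single deliberate change ('+' replaced by '*')
    static final IntBinaryOperator MUTANT = (a, b) -> a * b;

    // Weak test suite: for the only input (2, 2), '+' and '*' both give 4
    static boolean weakSuitePasses(IntBinaryOperator sut) {
        return sut.applyAsInt(2, 2) == 4;
    }

    // Stronger test suite: adds an input for which '+' and '*' differ
    static boolean strongSuitePasses(IntBinaryOperator sut) {
        return sut.applyAsInt(2, 2) == 4 && sut.applyAsInt(2, 3) == 5;
    }

    public static void main(String[] args) {
        System.out.println("Weak suite kills the mutant: " + !weakSuitePasses(MUTANT));
        System.out.println("Strong suite kills the mutant: " + !strongSuitePasses(MUTANT));
    }
}
```

Both suites pass against the original code, but only the stronger one detects the mutant. A surviving mutant therefore points to a gap in the tests or in their data.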

Non-functional testing

The non-functional aspects of a system can require considerable effort to test. This group includes different means of testing. For example, performance testing is conducted to evaluate the compliance of a SUT with specified performance requirements. These requirements usually include constraints about time behavior and resource usage. Performance testing may measure response time with a single user exercising the system or with multiple users exercising the system. Load testing focuses on increasing the load on the system to some stated or implied maximum load, to verify that the system can handle the defined system boundaries. Volume testing is often considered synonymous with load testing, yet volume testing focuses on data. Stress testing exercises the system beyond its normal operational capacity to the extent that the system fails, identifying the actual boundaries at which the system breaks. The aim of stress testing is to observe how the system fails and where the bottlenecks are.

Security testing tries to ensure the following concepts: confidentiality (protection against the disclosure of information), integrity (ensuring the correctness of the information), authentication (ensuring the identity of the user), authorization (determining that a user is allowed to receive a service or perform an operation), availability (ensuring that the system performs its functionality when required), and non-repudiation (ensuring that an action that happened cannot be denied). Authorized attempts to evaluate the security of a system infrastructure are often known as penetration testing.

Usability testing focuses on finding user interface problems, which may make the software difficult to use or may cause users to misinterpret the output. Accessibility testing verifies that the product complies with accessibility requirements (accessibility being the ability to access the system functionality).

Testing types

There are two main ways of carrying out software testing:

  • Manual testing: This is the process of assessing the SUT carried out by a human, typically a software engineer or the final consumer. This type of testing includes the so-called exploratory testing, in which human testers evaluate the system by investigating it freely, using their personal perception.
  • Automated testing: This is the process of assessing the SUT in which the testing process (test execution, reporting, and so on) is carried out with special software and infrastructure for testing. Elfriede Dustin, in her book Implementing Automated Software Testing: How to Save Time and Lower Costs While Raising Quality (2009), defined Automated Software Testing (AST) as the:
Application and implementation of software technology throughout the entire software testing life cycle with the goal to improve efficiencies and effectiveness.

The main benefits of AST are: anticipated cost savings, shortened test duration, heightened thoroughness of the tests performed, improvement of test accuracy, improvement of result reporting as well as statistical processing, and subsequent reporting.

Automated tests are typically executed on build servers in the context of Continuous Integration (CI) processes. More details about this are provided in Chapter 7, Testing Management.

AST is most effective when implemented within a framework. Testing frameworks may be defined as a set of abstract concepts, processes, procedures and environments in which automated tests will be designed, created, and implemented. This framework definition includes the physical structures used for test creation and implementation, as well as the logical interactions among those components.

Strictly speaking, that definition of framework is not very far from what we understand by library. In order to make the difference clearer, consider the following quote from the well-known software engineering guru Martin Fowler:

A library is essentially a set of functions that you can call, these days usually organized into classes. Each call does some work and returns control to the client. A framework embodies some abstract design, with more behavior built in. In order to use it you need to insert your behavior into various places in the framework either by subclassing or by plugging in your own classes. The framework's code then calls your code at these points.
Visual explanation of the difference between library and framework
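
The distinction drawn in the quote can be illustrated with a small plain-Java sketch. All the names used here (sum, Step, MiniFramework) are hypothetical and exist only for this illustration. With a library, our code makes the call and gets control back. With a framework, we plug our behavior in and the framework decides when to call it:

```java
import java.util.ArrayList;
import java.util.List;

public class LibraryVsFrameworkSketch {

    // Library style: the client calls the function, which returns control to the client
    static int sum(int a, int b) {
        return a + b;
    }

    // Framework style: the client inserts its behavior by implementing this hook...
    interface Step {
        String run();
    }

    // ...and the framework's code calls the client's code at the point it chooses
    static class MiniFramework {
        private final List<Step> steps = new ArrayList<>();

        void register(Step step) {
            steps.add(step);
        }

        List<String> execute() {
            List<String> log = new ArrayList<>();
            for (Step step : steps) {
                log.add("framework calls -> " + step.run());
            }
            return log;
        }
    }

    public static void main(String[] args) {
        // Our code calls the library
        System.out.println(sum(2, 3));

        // The framework calls our code
        MiniFramework framework = new MiniFramework();
        framework.register(() -> "client step A");
        framework.register(() -> "client step B");
        framework.execute().forEach(System.out::println);
    }
}
```

A testing framework follows the second style: we write test methods, and the framework is responsible for discovering and invoking them.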

Frameworks are becoming more and more important in modern software development. They provide a capability highly desired in software-intensive systems: reusability. This way, large applications will end up consisting of layers of frameworks that cooperate with each other.

Other testing approaches

As introduced at the beginning of this section, there is no universal definition for the different forms of testing. In this section we review some of the most common varieties of testing available in the literature that have not been covered so far. For instance, when the testing process is performed to determine whether the system meets its specifications, it is known as conformance testing. When a new feature or functionality is introduced to a system (we can call it a build), the way of testing this new feature is known as progression testing. In addition to that, to check that the newly introduced changes do not affect the correctness of the rest of the system, the existing test cases are exercised. This approach is commonly known as regression testing.

When the system interacts with any external or third-party system, another kind of testing can be done, known as system integration testing. This kind of testing verifies that the system is properly integrated with any external systems.

User or customer testing is a stage in the testing process in which users or customers provide input and advice for system testing. Acceptance testing is a type of user testing, but there can also be different types of user testing:

  • Alpha testing: This takes place at developers' sites, working together with the software's consumers, before it is released to external users or customers.
  • Beta testing: This takes place at customers' sites and involves testing by a group of customers who use the system at their own locations and provide feedback, before the system is released to other customers.
  • Operational testing: This is performed by the end users in their normal operating environment.

Finally, release testing refers to the process of testing a particular release of a system performed by a separate team outside the development team. The primary goal of the release testing process is to convince the supplier of the system that it is good enough for use.


Testing frameworks for the JVM

JUnit is a testing framework for creating automated tests. The development of JUnit was started by Kent Beck and Erich Gamma in late 1995. Since then, the popularity of the framework has been growing. Nowadays, it is broadly considered the de facto standard for testing Java applications.

JUnit was designed to be a unit-testing framework. Nevertheless, it can be used to implement not just unit tests, but also other kinds of tests. As we will discover in the body of this book, depending on how the test logic exercises the piece of software under test, a test case implemented with JUnit can be considered a unit, integration, system, or even acceptance test. All in all, we can think of JUnit as a multi-purpose testing framework for Java.

JUnit 3

Since the early versions of JUnit 3, the framework can work with Java 2 and higher. JUnit 3 is open source software, released under Common Public License (CPL) Version 1.0 and hosted on SourceForge. The latest version of JUnit 3 was JUnit 3.8.2, released on May 14, 2007. The main requirements introduced by JUnit in the world of testing frameworks were the following:

  1. It should be easy to define which tests will run.
  2. The framework should be able to run tests independently of all other tests.
  3. The framework should detect and report errors test by test.

Standard tests in JUnit 3

In JUnit 3, in order to create test cases, we need to extend the class junit.framework.TestCase. This base class includes the framework code that JUnit needs to automatically run the tests. Then, we simply make sure that the method name follows the testXXX() pattern. This naming convention makes it clear to the framework that the method is a unit test and that it can be run automatically.

The test life cycle is controlled in the setUp() and tearDown() methods. The TestCase calls setUp() before running each of its tests and then calls tearDown() when each test is complete. One reason to put more than one test method into the same test case is to share the same test fixture.
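
To give an intuition of how a naming convention such as testXXX() can drive test execution, here is a simplified plain-Java sketch based on reflection. It is not JUnit's actual implementation: the class CalculatorChecks and the method runAll() are hypothetical names used only for this illustration. The sketch looks for public methods whose names start with test and wraps each call between setUp() and tearDown():

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class NamingConventionSketch {

    // Hypothetical test class written in plain Java (no JUnit types involved)
    public static class CalculatorChecks {
        public final List<String> calls = new ArrayList<>();

        public void setUp() {
            calls.add("setUp");
        }

        public void tearDown() {
            calls.add("tearDown");
        }

        public void testAddition() {
            calls.add("testAddition");
            if (2 + 3 != 5) {
                throw new AssertionError("2 + 3 should be 5");
            }
        }

        // Not a test: the name does not start with "test"
        public void helper() {
            calls.add("helper");
        }
    }

    // Runs every public no-argument method whose name starts with "test",
    // calling setUp() before it and tearDown() after it on a fresh instance
    static List<String> runAll() {
        List<String> trace = new ArrayList<>();
        try {
            for (Method method : CalculatorChecks.class.getMethods()) {
                if (method.getName().startsWith("test") && method.getParameterCount() == 0) {
                    CalculatorChecks instance = new CalculatorChecks();
                    instance.setUp();
                    try {
                        method.invoke(instance);
                    } finally {
                        instance.tearDown();
                    }
                    trace.addAll(instance.calls);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not run the test methods", e);
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(runAll());
    }
}
```

The method helper() never appears in the trace, because only the name prefix marks a method as a test in this model.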

Finally, in order to implement the verification stage in the test case, JUnit 3 defines several assert methods in a utility class named junit.framework.Assert. The following table summarizes the main assertions provided by this class:

| Method | Description |
|---|---|
| assertTrue | Asserts that a condition is true. If it isn’t, the method throws an AssertionFailedError with the given message (if any). |
| assertFalse | Asserts that a condition is false. If it isn’t, the method throws an AssertionFailedError with the given message (if any). |
| assertEquals | Asserts that two objects are equal. If they are not, the method throws an AssertionFailedError with the given message (if any). |
| assertNotNull | Asserts that an object is not null. If it is, the method throws an AssertionFailedError with the given message (if any). |
| assertNull | Asserts that an object is null. If it isn’t, the method throws an AssertionFailedError with the given message (if any). |
| assertSame | Asserts that two objects refer to the same object. If they do not, the method throws an AssertionFailedError with the given message (if any). |
| assertNotSame | Asserts that two objects do not refer to the same object. If they do, the method throws an AssertionFailedError with the given message (if any). |
| fail | Fails a test (throwing AssertionFailedError) with the given message (if any). |

The following class shows a simple test implemented with JUnit 3.8.2. As we can see, this test case contains two tests. Before each test, the method setUp() will be invoked by the framework, and after the execution of each test, the method tearDown() will also be invoked. This example has been coded so that the first test, named testSuccess(), finishes correctly, and the second test, named testFailure(), ends with a failure (the assertion throws an exception):

package io.github.bonigarcia;

import junit.framework.TestCase;

public class TestSimple extends TestCase {

    // Phase 1: Setup (for each test)
    protected void setUp() throws Exception {
        // Test fixture initialization goes here
    }

    // Test 1: This test is going to succeed
    public void testSuccess() {
        // Phase 2: Simulation of exercise
        int expected = 60;
        int real = 60;
        System.out.println("** Test 1 **");

        // Phase 3: Verify
        assertEquals(expected + " should be equals to "
                + real, expected, real);
    }

    // Test 2: This test is going to fail
    public void testFailure() {
        // Phase 2: Simulation of exercise
        int expected = 60;
        int real = 20;
        System.out.println("** Test 2 **");

        // Phase 3: Verify
        assertEquals(expected + " should be equals to "
                + real, expected, real);
    }

    // Phase 4: Teardown (for each test)
    protected void tearDown() throws Exception {
        // Test fixture cleanup goes here
    }

}

All the code examples explained in this book are available on the GitHub repository

Test execution in JUnit 3

In JUnit 3, test cases are run by means of Java applications called test runners. JUnit 3.8.2 provides three different test runners out of the box: two graphical (Swing and AWT based) and one textual that can be used from the command line. The JUnit framework provides separate class loaders for each test, in order to avoid side effects among tests.

It is a common practice that build tools (such as Ant or Maven) and Integrated Development Environments (IDEs, such as Eclipse and IntelliJ) implement their own JUnit test runner.

The following image shows what the previous test looks like when we use the JUnit Swing runner, and also when we use Eclipse to run the same test case.

Execution of a JUnit 3 test case using the graphical Swing test runner and also with the Eclipse test runner

When a test does not succeed in JUnit, it can be for two reasons: a failure or an error. On the one hand, a failure is caused by an assertion (Assert class) that is not met. On the other hand, an error is a condition not expected by the test, such as a conventional exception in the software under test.
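
The difference between a failure and an error can be modeled with the following plain-Java sketch. It is only an approximation under stated assumptions: the enum Outcome and the method classify() are hypothetical, and java.lang.AssertionError is used here as a stand-in for the AssertionFailedError thrown by the JUnit 3 assertions:

```java
public class FailureVsErrorSketch {

    enum Outcome { PASSED, FAILURE, ERROR }

    // Runs a test body and classifies the way in which it ends
    static Outcome classify(Runnable test) {
        try {
            test.run();
            return Outcome.PASSED;
        } catch (AssertionError e) {
            // Failure: an assertion was not met
            return Outcome.FAILURE;
        } catch (RuntimeException e) {
            // Error: an exception that the test did not anticipate
            return Outcome.ERROR;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(() -> { }));
        System.out.println(classify(() -> {
            throw new AssertionError("60 should be equal to 20");
        }));
        System.out.println(classify(() -> {
            throw new IllegalStateException("unexpected problem in the SUT");
        }));
    }
}
```

Keeping both outcomes apart is useful when reading a test report: a failure says that the SUT returned something different from what was asserted, whereas an error says that the test could not even reach its verdict.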

Another important contribution of JUnit 3 is the concept of the test suite, which is a convenient way to group tests that are related. Test suites are implemented by means of the JUnit class junit.framework.TestSuite. This class, in the same way as TestCase, implements the framework interface junit.framework.Test.

A diagram containing the main classes and methods of JUnit 3 is depicted as follows:

Core JUnit 3 classes

The following snippet shows an example of the use of test suites in JUnit 3. In short, we can create a group of tests simply by instantiating a TestSuite object, and then adding single test cases using the method addTestSuite():

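package io.github.bonigarcia;

import junit.framework.Test;
import junit.framework.TestSuite;

public class TestAll {

    public static Test suite() {
        TestSuite suite = new TestSuite("All tests");
        // Add the tests of the test case shown earlier in this section
        suite.addTestSuite(TestSimple.class);
        return suite;
    }

}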

This test suite can be later executed using a test runner. For example, we could use the command-line test runner (junit.textui.TestRunner) and the command line, as follows:

Test suite executed using the textual test runner and the command line

JUnit 4

JUnit 4 is still an open source framework, though the license changed with respect to JUnit 3, from CPL to Eclipse Public License (EPL) Version 1.0. The source code of JUnit 4 is hosted on GitHub.

On February 18, 2006, JUnit 4.0 was released. It follows the same high-level guidelines as JUnit 3, that is, tests are easy to define, the framework runs tests independently, and the framework detects and reports errors test by test.

One of the main differences of JUnit 4 with respect to JUnit 3 is the way tests are defined. In JUnit 4, Java annotations are used to mark methods as tests. For this reason, JUnit 4 can only be used with Java 5 or later. As the documentation of JUnit 4.0 stated back in 2006:

The architecture of JUnit 4.0 is a substantial departure from that of earlier releases. Instead of tagging test classes by subclassing junit.framework.TestCase and tagging test methods by starting their name with 'test', you now tag test methods with the @Test annotation.

Standard tests in JUnit 4

In JUnit 4, the @Test annotation (contained in package org.junit) represents a test. Any public method can be annotated with @Test to make it a test method.

In order to set up the test fixture, JUnit 4 provides the @Before annotation. This annotation can be used on any public method, which is then executed before each test method. Similarly, any public method annotated with @After gets executed after each test method execution. JUnit 4 provides two more annotations to enhance the test life cycle: @BeforeClass and @AfterClass. They are executed only once per test class, before and after all tests, respectively. The following picture depicts the life cycle of a JUnit 4 test case:

JUnit 4 test life cycle
@Before and @After can be applied to any public void methods. @AfterClass and @BeforeClass can be applied to only public static void methods.

The following table summarizes the main differences between JUnit 3 and JUnit 4 seen so far:

| Feature | JUnit 3 | JUnit 4 |
|---|---|---|
| Test definition | testXXX pattern | @Test annotation |
| Run before the first test | Not supported | @BeforeClass annotation |
| Run after all the tests | Not supported | @AfterClass annotation |
| Run before each test | Override setUp() method | @Before annotation |
| Run after each test | Override tearDown() method | @After annotation |
| Ignore tests | Not supported | @Ignore annotation |

The org.junit.Assert class provides static methods to carry out assertions (predicates). The following are the most useful assertion methods:

  • assertTrue: If the condition becomes false, the assertion fails and AssertionError is thrown.
  • assertFalse: If the condition becomes true, the assertion fails and AssertionError is thrown.
  • assertNull: This checks whether the argument is null; otherwise, it throws AssertionError.
  • assertNotNull: This checks whether the argument is not null; otherwise, it throws AssertionError.
  • assertEquals: This compares two objects or primitive types. Moreover, if the actual value doesn't match the expected value, AssertionError is thrown.
  • assertSame: This supports only objects and checks the object reference using the == operator.
  • assertNotSame: This is the opposite of assertSame.

The following snippet provides a simple example of a JUnit 4 test case. As we can see, it is equivalent to the test case seen in the previous section, this time using the JUnit 4 programming model, that is, using the @Test annotation to identify tests and other annotations (@BeforeClass, @Before, @After, @AfterClass) to implement the test life cycle (setup and teardown of the test fixture):

package io.github.bonigarcia;

import static org.junit.Assert.assertEquals;

import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;

public class TestSimple {

    // Phase 1.1: Setup (for all tests)
    @BeforeClass
    public static void setupAll() {
        System.out.println("<Setup Class>");
    }

    // Phase 1.2: Setup (for each test)
    @Before
    public void setupTest() {
        System.out.println("<Setup Test>");
    }

    // Test 1: This test is going to succeed
    @Test
    public void testSuccess() {
        // Phase 2: Simulation of exercise
        int expected = 60;
        int real = 60;
        System.out.println("** Test 1 **");

        // Phase 3: Verify
        assertEquals(expected + " should be equals to "
                + real, expected, real);
    }

    // Test 2: This test is going to fail
    @Test
    public void testFailure() {
        // Phase 2: Simulation of exercise
        int expected = 60;
        int real = 20;
        System.out.println("** Test 2 **");

        // Phase 3: Verify
        assertEquals(expected + " should be equals to "
                + real, expected, real);
    }

    // Phase 4.1: Teardown (for each test)
    @After
    public void teardownTest() {
        System.out.println("</Ending Test>");
    }

    // Phase 4.2: Teardown (for all tests)
    @AfterClass
    public static void teardownClass() {
        System.out.println("</Ending Class>");
    }

}


Test execution in JUnit 4

The concept of the test runner is also present in JUnit 4, but it was slightly improved with respect to JUnit 3. In JUnit 4, a test runner is a Java class used to manage a test’s life cycle: instantiation, calling setup and teardown methods, running the test, handling exceptions, sending notifications, and so on. The default JUnit 4 test runner is called BlockJUnit4ClassRunner, and it implements the JUnit 4 standard test case class model.

The test runner to be used in a JUnit 4 test case can be changed simply by using the annotation @RunWith. JUnit 4 provides a collection of built-in test runners that change the nature of the test class. In this section, we are going to review the most important ones.

  • To run a group of tests (that is, a test suite), JUnit 4 provides the Suite runner. In addition to the runner, the annotation Suite.SuiteClasses defines the individual test classes belonging to the suite. For example:

package io.github.bonigarcia;

import org.junit.runner.RunWith;
import org.junit.runners.Suite;

// Select the Suite runner and list the test classes that belong to the suite
@RunWith(Suite.class)
@Suite.SuiteClasses({ TestMinimal1.class, TestMinimal2.class })
public class MySuite {
}

  • Parameterized tests are used to specify different input data to be used in the same test logic. To implement this kind of test, JUnit 4 provides the Parameterized runner. To define the data parameters in this type of test, we need to annotate a static method of the class with the annotation @Parameters. This method should return a Collection of arrays, in which each array provides the input parameters for one execution of the test. Then, there are two options to inject the input data into the test:
    1. Using the class constructor.
    2. Annotating class attributes with the annotation @Parameter.

The following snippets show an example of the latter:

package io.github.bonigarcia;

import static org.junit.Assert.assertTrue;

import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameter;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class TestParameterized {

    // Each attribute receives the value at the given index of every data row
    @Parameter(0)
    public int input1;

    @Parameter(1)
    public int input2;

    @Parameter(2)
    public int sum;

    @Parameters(name = "{index}: input1={0} input2={1} sum={2}?")
    public static Collection<Object[]> data() {
        // The last row does not hold (3 + 3 is not 9), so that execution fails
        return Arrays.asList(
                new Object[][] { { 1, 1, 2 }, { 2, 2, 4 }, { 3, 3, 9 } });
    }

    @Test
    public void testSum() {
        assertTrue(input1 + "+" + input2 + " is not " + sum,
                input1 + input2 == sum);
    }

}

The execution of this test on Eclipse would be as follows:

Execution of a Parameterized test in Eclipse
  • JUnit theories are an alternative to JUnit's parameterized tests. A JUnit theory is expected to be true for all datasets. Thus, in JUnit theories, we have a method providing data points (that is, the input values to be used for the test). Then, we need to specify a method annotated with @Theory which takes parameters. The theories in a class get executed with every possible combination of data points:

package io.github.bonigarcia;

import static org.junit.Assert.assertTrue;

import org.junit.experimental.theories.DataPoints;
import org.junit.experimental.theories.Theories;
import org.junit.experimental.theories.Theory;
import org.junit.runner.RunWith;

@RunWith(Theories.class)
public class MyTheoryTest {

    // Input values to be combined as arguments of the theory
    @DataPoints
    public static int[] positiveIntegers() {
        return new int[] { 1, 10, 100 };
    }

    @Theory
    public void testSum(int a, int b) {
        System.out.println("Checking " + a + "+" + b);
        assertTrue(a + b > a);
        assertTrue(a + b > b);
    }

}


Take a look at the execution of this example, again in Eclipse:

Execution of a JUnit 4 theory in Eclipse
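
To see what "every possible combination of data points" means for the theory above, the following plain-Java sketch enumerates the pairs by hand. It does not use the JUnit API: the class and method names are hypothetical, and it assumes that each of the two parameters is assigned every data point independently:

```java
import java.util.ArrayList;
import java.util.List;

public class TheoryCombinationsSketch {

    // Same data points as in the theory example above
    static final int[] POSITIVE_INTEGERS = { 1, 10, 100 };

    // Builds every ordered pair (a, b) of data points
    static List<int[]> combinations() {
        List<int[]> pairs = new ArrayList<>();
        for (int a : POSITIVE_INTEGERS) {
            for (int b : POSITIVE_INTEGERS) {
                pairs.add(new int[] { a, b });
            }
        }
        return pairs;
    }

    // Checks the property stated by the theory for every pair
    static boolean holdsForAll() {
        for (int[] pair : combinations()) {
            int a = pair[0];
            int b = pair[1];
            if (!(a + b > a && a + b > b)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("Combinations checked: " + combinations().size());
        System.out.println("Theory holds for all of them: " + holdsForAll());
    }
}
```

With three data points and two parameters there are nine combinations. This is the main difference with respect to a parameterized test, in which each row of data is written explicitly.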

Advanced features of JUnit 4

One of the most significant innovations introduced in JUnit 4 was the use of rules. Rules allow flexible addition or redefinition of the behavior of each test method in a test class. A rule is included in a test case by annotating a class attribute with the annotation @Rule. The type of this attribute should implement the JUnit interface org.junit.rules.TestRule. The following rules are provided out of the box in JUnit 4:

  • ErrorCollector: This rule allows execution of a test to continue after the first problem is found
  • ExpectedException: This rule allows to verify that a test throws a specific exception
  • ExternalResource: This rule provides a base class for Rules that set up an external resource before a test (a file, socket, server, database connection, and so on) and guarantee to tear it down afterward
  • TestName: This rule makes the current test name available inside test methods
  • TemporaryFolder: This rule allows creation of files and folders that should be deleted when the test method finishes
  • Timeout: This rule applies the same timeout to all test methods in a class
  • TestWatcher: It is a base class for rules that will keep a log of each passing and failing test
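
The general idea behind rules (wrapping the execution of each test with additional behavior) can be modeled with the following plain-Java sketch. It deliberately avoids the JUnit API: SimpleRule, RESOURCE_RULE, and runWithRule() are hypothetical names, and the real TestRule interface has a different signature. The sketch mimics the intent of ExternalResource, that is, setting something up before the test and guaranteeing its teardown afterward:

```java
import java.util.ArrayList;
import java.util.List;

public class RuleSketch {

    // Simplified stand-in for a rule: it takes the test body and returns a wrapped version of it
    interface SimpleRule {
        Runnable apply(Runnable testBody, List<String> log);
    }

    // Mimics the idea behind ExternalResource: set up before the test, always tear down after it
    static final SimpleRule RESOURCE_RULE = (testBody, log) -> () -> {
        log.add("before: acquire resource");
        try {
            testBody.run();
        } finally {
            log.add("after: release resource");
        }
    };

    // Applies the rule to a trivial test body and returns the recorded order of events
    static List<String> runWithRule(SimpleRule rule) {
        List<String> log = new ArrayList<>();
        Runnable testBody = () -> log.add("test body");
        rule.apply(testBody, log).run();
        return log;
    }

    public static void main(String[] args) {
        runWithRule(RESOURCE_RULE).forEach(System.out::println);
    }
}
```

Since the wrapping logic lives outside the test class, the same rule can be reused in many test cases, which is harder to achieve with setup and teardown methods alone.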

Other advanced JUnit 4 features make it possible to:

  • Execute tests in a given order, using the annotation @FixMethodOrder.
  • Create assumptions using the class Assume. This class offers many static methods, such as assumeTrue(condition), assumeFalse(condition), assumeNotNull(objects), and assumeThat(actual, matcher). When an assumption in a test fails, the default JUnit runner treats that test as ignored instead of failed.
  • Set a timeout value (in milliseconds) in the @Test annotation, so that a test that runs longer than the specified value fails.
  • Categorize tests using the test runner Categories and identify the types of test by annotating the test methods with the annotation @Category.
Meaningful examples of each of the features mentioned earlier can be found in the GitHub repository.

JUnit ecosystem

JUnit is one of the most popular test frameworks for the JVM, and it is considered one of the most influential frameworks in software engineering. We can find several libraries and frameworks that provide additional functionality on top of JUnit. Some examples of these ecosystem enhancers are:

  • Mockito: This is a mock framework, which can be used in conjunction with JUnit.
  • AssertJ: This is a fluent assertions library for Java.
  • Hamcrest: This is a library with matchers that can be combined to create flexible and readable assertions.
  • Cucumber: This is a testing framework for running automated acceptance tests written in a Behavior-Driven Development (BDD) style.
  • FitNesse: This is a testing framework designed to support acceptance testing by facilitating detailed readable descriptions of system functions.

While JUnit is the largest testing framework for the JVM, it is not the only one. There are several other testing frameworks available for the JVM. Some examples are:

Thanks to JUnit, testing has become a central part of programming. Consequently, the underlying testing model implemented in JUnit has been ported to a set of testing frameworks outside the boundary of the JVM, in the so-called xUnit family. In this model, we find the concepts of test case, runner, fixture, suite, test execution, report, and assertion. To name a few, consider the following frameworks. All of them fall into the xUnit family:



Software quality is a key concept in software engineering, since it determines the degree to which a software system meets its requirements and user expectations. Verification and Validation (V&V) is the name given to the set of activities aimed at assessing a software system. The goal of V&V is to ensure the quality of a piece of software while reducing the number of defects. The two core activities in V&V are software testing (evaluation of a running piece of software) and static analysis (assessment of software artefacts without their execution).

Automated software testing has experienced significant advances in the last few decades. In this arena, the JUnit framework has a remarkable position. JUnit was designed to be a unit-testing framework for the JVM. Nowadays, it is a fact that JUnit is the most popular test framework in the Java community, providing a comprehensive programming model to create and execute test cases. In the next section, we will discover the features and capabilities provided by the new version of the framework, JUnit 5.

About the Author
  • Boni García

    Boni García has held a PhD degree in Information and Communications Technology from the Technical University of Madrid (UPM) in Spain since 2011. Currently he works as a Researcher at King Juan Carlos University (URJC) and as an Assistant Professor at Digital Art and Technology University (U-tad) in Spain. He is a member of the Kurento project, where he is in charge of the testing framework for WebRTC applications. He participates in the coordination of the ElasTest project, an elastic platform aimed at easing end-to-end testing. Boni is an active member of the free open source software (FOSS) community, with a strong emphasis on software testing and web engineering. Among others, he owns the open source projects WebDriverManager and selenium-jupiter (JUnit 5 extension for Selenium).
