Software Engineering Chapter 8: Testing the Programs

Posted on Thu 22 June 2023 in Software Engineering

mind map

Part1: Software Faults and Failures

We say the software has failed when it does not do what the requirements describe.

Why do software faults and failures exist?

  • Software systems deal with large numbers of states, complex formulas, activities, and algorithms.

  • We implement the system in our own understanding, but sometimes even the customer is uncertain of exactly what is needed.

  • The size of a project and the number of people involved can add complexity.

What causes failures?

  • The specification may be wrong or have a missing requirement.
  • The specification may contain a requirement that is impossible to implement.
  • The system design may contain a fault.
  • The program design may contain a fault.
  • The program code may be wrong.

We test a program to demonstrate the existence of a fault.

We say a test is successful when:

  • a fault is discovered
  • a failure occurs as a result of the testing procedures.

Fault identification: the process of determining what fault caused the failure

Fault correction (removal): the process of making changes to the system so that the faults are removed

Types of faults

  • Algorithmic Fault

    This kind of fault is sometimes easy to spot just by reading through the program (called desk checking) or by submitting different classes of input data.

    Some typical algorithmic faults: forgetting to initialize variables, forgetting to set loop invariants, and comparing variables of inappropriate data types.

  • Syntax Fault

  • Computation and Precision Fault

  • Documentation Fault

  • Stress or Overload Fault

    Data structures are filled past their specified capacity.

  • Capacity or Boundary Fault

    The system's performance becomes unacceptable when system activity reaches its specified limit.

  • Timing or Coordination Fault (real-time system)

  • Throughput or Performance Fault

  • Recovery Fault

  • Hardware and System Software Fault

  • Standards and Procedure Fault
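
One of the algorithmic faults listed above, forgetting to initialize a variable, can be sketched in Python; the averaging function is a hypothetical example, not from the source.

```python
# Hypothetical illustration of a typical algorithmic fault.

def average_faulty(values):
    # Fault: 'total' is used before it is initialized, so the first
    # iteration raises UnboundLocalError instead of accumulating.
    for v in values:
        total = total + v
    return total / len(values)

def average_fixed(values):
    total = 0  # fix: initialize the accumulator before the loop
    for v in values:
        total = total + v
    return total / len(values)
```

Desk checking (reading the code) is often enough to spot this class of fault before any test is run.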

Orthogonal Defect Classification

The classification scheme must be product- and organization-independent, and be applied to all stages of development.

Fault of omission: some key aspect of the code is missing

Fault of commission: some aspect of the code is incorrect

The classification is orthogonal when any item being classified belongs to exactly one category.

The fault types in IBM orthogonal defect classification: function, interface, checking, assignment, timing/serialization, build/package/merge, documentation, algorithm

HP selects three descriptors for each fault:

  • origin of fault (where)
  • type of fault (what)
  • mode of fault (why) (such as missing, wrong, changed)

The fault classifications help improve the entire development process by telling us which types of faults are found in which development activities.

Part2: Testing Issues

In egoless programming, programs are viewed as components of a larger system. When a fault is discovered or a failure occurs, the egoless development team is concerned with correcting the fault, not blaming a particular developer.

Using an independent test team to test the system has lots of benefits:

  • avoids conflict between individual developers and the discovery of faults in their code
  • keeps the testing objective
  • testing can proceed concurrently with coding

Test Organization (Types of Test)

  • module/component/unit testing

    test each program component on its own

  • integration testing

    test if the interfaces among the components are defined and handled properly

    verify the system components work together as described in the system and program design specifications

  • function testing

    test if the system has the desired functionality

    evaluate whether the functions described by the requirements specification are actually performed by the integrated system.

  • performance testing

    compare the system with the software and hardware requirements.

The requirements were documented in two ways: in the customer's terminology, which is checked by function testing, and as software and hardware requirements, which are checked by performance testing.

So, after passing performance testing, we can say that the system is a validated system.

  • acceptance testing

    the system is checked against the customer's requirements description

  • installation testing

    make sure the system still functions in the environment in which the system will be used.

Except for unit testing and integration testing, the remaining steps are collectively called system testing.

testing steps

Views of Test Objects

The choice of test views depends on many factors:

  • the number of possible logical paths
  • the nature of the input data
  • the amount of computation involved
  • the complexity of the algorithms

Closed/Black Box

We view the test object as a black box whose contents are unknown. The testing method is to feed input to it, note what output is produced, and check if the output matches the expected one.
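
This closed-box procedure can be sketched as table-driven testing, where only the external specification (input and expected output) is used and the component's internals are never inspected; the sorting component here is a hypothetical test object.

```python
# Black-box (closed-box) testing sketch: feed input, observe output,
# compare against the expected output from the specification.

def component_under_test(xs):      # contents treated as unknown
    return sorted(xs, reverse=True)

# Test cases derived from the external specification only.
spec_cases = [
    ([3, 1, 2], [3, 2, 1]),   # typical input
    ([],        []),          # empty input
    ([5],       [5]),         # single element
]

def run_black_box(component, cases):
    # Collect every case whose actual output differs from the expected one.
    return [(inp, component(inp), out)
            for inp, out in cases if component(inp) != out]
```

An empty result list means every output matched its expectation; nothing about the component's structure was assumed.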

Advantage: free of constraints

Disadvantage: impossible to run a complete test

Clear/White Box

We can use the structure of the test object to test in different ways. (devise test cases that execute all the statements or all the control paths within the component)

However, it may be impractical to take this kind of approach because of the enormous number of branches and loops.

Part3: Unit Testing

  • Reviews and inspections are the most effective for discovering design and code problems.
  • Prototyping is best at identifying problems with requirements.

The subsections that follow describe the steps of unit testing:

Examine the Code (code review)

Code review is the process of asking an objective group of experts to review both the code and its documentation.

The code review team does not include people from the customer's organization.

Code inspections identified mostly coding or low-level design faults, but testing discovered mostly requirements and architectural faults.

Code Walkthroughs

The programmer presents the code and the accompanying documentation to the review team and leads the discussion.

The goal of a code walkthrough is to find faults, not necessarily to fix them.

Code Inspections

The review team checks the code and documentation against a prepared list of concerns.

Inspection team members are chosen based on the inspection's goals.

Steps of inspecting code:

  1. Group meeting for an overview of the code and a description of the inspection goals.
  2. Each inspector studies the code and its related documents, noting the faults found.
  3. In the second group meeting, team members report what they have found, recording additional faults discovered in the process of discussing individuals' findings.

Proving Code Correct

In most cases, it is impossible to prove a program correct.

Development teams are more likely to be concerned with testing their software rather than with proving their programs correct.

Testing Program Components

To test a component, we choose input data and conditions, allow the component to manipulate the data, and observe the output.

If the system remembers conditions from previous test cases, we need sequences of test cases rather than single, independent ones.

Test point (test case): a particular choice of input data to be used in testing a program.

Test: a finite collection of test cases.

We first determine the test objectives, then select test cases and define a test designed to meet each objective.

We can separate the input into Equivalence classes:

  • Every possible input belongs to one of the classes. (the classes cover the entire set of input data)
  • No input data belongs to more than one class
  • Any element of the class represents all elements of that class
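
A sketch of equivalence-class testing for a hypothetical `sign()` component: the assumed partition into negative, zero, and positive covers all integer inputs, no input falls in two classes, and one representative stands in for each whole class.

```python
# Equivalence-class sketch (all names and the partition are assumptions).

def classify(n):
    # Every integer belongs to exactly one of these three classes.
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"

def sign(n):  # component under test
    return (n > 0) - (n < 0)

# One representative per class, with the expected output for that class.
representatives = {"negative": -7, "zero": 0, "positive": 12}
expected = {"negative": -1, "zero": 0, "positive": 1}

results = {cls: sign(rep) == expected[cls]
           for cls, rep in representatives.items()}
```

Three test cases thus cover the entire (infinite) input domain, which is the point of the partition.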

We can combine open- and closed-box testing to generate test data:

  • view the program as a closed-box: use the program's external specifications to generate initial test cases
  • view the program as open-box: use the program's internal structure to add other cases

Test Thoroughness

Several test strategies:

  • statement testing

    Every statement is executed at least once.

  • branch testing

    Every decision point is chosen at least once.

  • path testing (strongest test strategy)

    Every distinct path is executed at least once.

  • definition-use path testing

    For every variable, every path from the definition to every use point is executed.

  • all-uses testing

    The test set includes at least one path from every definition to every use that can be reached by that definition.

  • all-predicate-uses/some-computational-uses testing (APU+C)

  • all-computational-uses/some-predicate-uses testing (ACU+P)

relative strength of test strategies

The figure above shows the test strategies from strongest to weakest.

Generally, the stronger the strategy, the more test cases are used.

Example

flow chart

Statement testing: 1-2-3-4-5-6-7

Branch testing: 1-2-4-5-6-1 (NO, NO), 1-2-3-4-5-6-7 (YES, YES)

Path testing: 1-2-4-5-6-1 (NO, NO), 1-2-4-5-6-7 (NO, YES), 1-2-3-4-5-6-1 (YES, NO), 1-2-3-4-5-6-7 (YES, YES)
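
The flow chart can be sketched as a loop with two decision points; the node numbers in the returned trace correspond to the paths listed above, while the function and variable names are assumptions.

```python
# Sketch of the flow chart: the first decision (at node 2) selects
# whether node 3 executes; the second (at node 6) either exits through
# node 7 or loops back to node 1.

def walk(decisions):
    """decisions: list of (d1, d2) booleans, one pair per loop iteration."""
    trace = []
    for d1, d2 in decisions:
        trace += [1, 2]
        if d1:                  # first decision point
            trace.append(3)
        trace += [4, 5, 6]
        if d2:                  # second decision point
            trace.append(7)
            return trace        # exit the flow
    return trace                # would loop back to node 1

# One test exercising both outcomes of each decision (branch coverage):
branch_test = [(False, False), (True, True)]
# Path coverage would need all four (d1, d2) combinations listed above.
```

Running `walk(branch_test)` traverses 1-2-4-5-6-1 and then 1-2-3-4-5-6-7, matching the branch-testing cases.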

Part4: Integration Testing

  • Each component is merged only once for testing.
  • Stubs and drivers (defined later) are separate, new programs.

Bottom-Up Integration

Bottom-up integration is a process of merging components to test the larger system.

Each component at the lowest level is tested individually first, then the next components that call the previously tested ones are tested.

A component driver is a routine that calls a particular component and passes a test case to it.
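
A minimal sketch of a component driver, assuming a hypothetical lowest-level `compute_tax` component:

```python
# The driver is a throwaway routine that exists only to feed test cases
# to one low-level component and check its results in isolation.

def compute_tax(amount, rate):
    # Hypothetical lowest-level component under test.
    return round(amount * rate, 2)

def driver():
    # Each case: ((input arguments), expected output).
    cases = [((100.0, 0.07), 7.0), ((0.0, 0.07), 0.0)]
    return all(compute_tax(*args) == want for args, want in cases)
```

Once the component passes, the driver is discarded and the next level up calls the component for real.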

Integration testing example

In bottom-up testing, the top-level components are usually the most important but the last to be tested. So, some faults in the top levels, which reflect faults in design, are discovered too late.

But for object-oriented programs, bottom-up testing is often the most sensible approach.

Top-Down Integration

In top-down integration, the top level, usually a controlling component, is tested by itself, and then all components called by the tested component are combined and tested as a larger unit.

A stub is a special-purpose program that simulates the activity of the missing component. It passes back output data that lets the testing process continue.

Sometimes, writing stubs can be difficult, not only because of their complexity but also because a large number of them are required.
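
A minimal sketch of a stub, assuming a hypothetical `order_total` controller whose price-lookup component is not yet written:

```python
# Top-down testing sketch: the top-level component is real, while the
# missing lower-level component is simulated by a stub with canned output.

def lookup_price_stub(item):
    # Stands in for an unwritten pricing component: fixed answers only.
    return {"apple": 1.0, "bread": 2.5}.get(item, 0.0)

def order_total(items, lookup=lookup_price_stub):
    # Top-level component under test; the real lookup is injected later.
    return sum(lookup(i) for i in items)
```

When the real lookup component exists, it replaces the stub without changing the controller under test.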

In the modified top-down testing, we test each level's components individually before the merger takes place. (both stubs and drivers are needed)

top-down integration example

Big-Bang Integration

All components are tested in isolation, and then they are mixed together as the final system to see if it works the first time.

This method may work on small systems, but not on large ones. Actually it is not recommended for any system.

Reasons not to use big-bang integration:

  • it requires both stubs and drivers
  • all components are merged at once, so it is difficult to find the cause of any failure
  • interface faults cannot be distinguished from other faults

big-bang integration example

Sandwich Integration

A combination of bottom-up and top-down integration.

The system is viewed as three layers:

  • target layer in the middle
  • top levels above the target layer: top-down integration
  • lower layer below the target layer: bottom-up integration

In modified sandwich testing, target level components are allowed to be tested individually before integration.

sandwich integration example

Part5: Testing Object-Oriented Systems

Many of the techniques for testing systems can also be used in object-oriented systems.

The testing on object-oriented systems should address many different levels, including functions, classes, clusters, and the whole system.

The traditional testing approaches are designed for functions and do not take into account the object states that will be used in testing classes. So, we should develop tests to track the object's state and the changes to it.

When we add a new subclass or modify an existing subclass, the methods inherited from the superclasses need to be retested.

When a subclass overrides the method, the subclass needs to be retested with probably a different set of test data.
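
Retesting under inheritance can be sketched by running one test sequence against both the superclass and an overriding subclass; the `Stack` classes here are hypothetical examples.

```python
# The same state-tracking test is rerun against a subclass; the override
# changes behavior, so the inherited test needs different expected values.

class Stack:
    def __init__(self):
        self._items = []           # object state tracked by the test
    def push(self, x):
        self._items.append(x)
    def pop(self):
        return self._items.pop()

class BoundedStack(Stack):
    def push(self, x):             # override: silently cap size at 2
        if len(self._items) < 2:
            super().push(x)

def push_pop_test(cls):
    s = cls()
    for x in (1, 2, 3):
        s.push(x)
    return s.pop()                 # observed state after the sequence
```

The identical sequence yields 3 for `Stack` but 2 for `BoundedStack`, showing why overridden methods need retesting with revised test data.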

In object-oriented systems, unit testing is less difficult, but the integration testing must be much more extensive because of encapsulation.

The figure below shows the easier and harder parts of testing object-oriented systems.

easier and harder parts of testing object-oriented systems

Part6: Test Planning

The test process can proceed in parallel with many of the other development activities.

Steps of testing process:

  • establishing test objectives

    The test objective tells us what kinds of test cases to generate.

  • designing test cases

    key to successful testing

  • writing test cases

  • testing test cases

    Review the test cases to verify that they are correct and feasible, provide the desired degree of coverage, and demonstrate the desired functionality.

  • executing tests

  • evaluating test results

A test plan is used to organize testing activities and address unit testing, integration testing, and system testing.

The test plan describes the way we will show our customers that the software works correctly.

The plan explains the entire testing activity from the perspectives of who, why, how, and when.

Actually the system test plan is a series of test plans, and each plan in this series is for one kind of test.

What contents are included in the test plan?

  • the test objectives that address each type of testing
  • how the tests will be run
  • what criteria will be used to determine when the testing is complete
  • the methods to be used in each test
  • description of any automated support
  • how test data will be generated
  • how any output data or state information will be captured

As a result, the test plan gives a complete picture of how and why testing will be performed.

Part7: Automated Testing Tools

Code Analysis Tools

Static Analysis Tools

This kind of tool can analyze the source code of programs before they are run.

  • code analyzer

    highlight syntax errors

  • structure checker

    use a graph to depict the logic flow, and check for structural flaws

  • data analyzer

    note improper linkage among components, conflicting data definitions, and illegal data usage

  • sequence checker

    check if the events are coded in the proper sequence

Measurements and structural characteristics are included in the output from many static analysis tools.

Dynamic Analysis Tools

Automated tools capture the state of events during the execution by preserving a snapshot of conditions.

These tools are called program monitors. A program monitor can watch and report the program's behavior and provide statistics about the statement or path coverage of the test cases.

Test Execution Tools

Automated test execution tools are essential for handling the very large number of test cases that must be run to test a system thoroughly.

Some functionalities of test execution tools:

  • Capture and replay (capture and playback)

    Capture the key input and responses as tests are being run and report the difference between the actual outcome and the expected one to the team.

  • Generate stubs and drivers automatically

    The generated drivers usually have more functionality than manually created ones.

  • Automated testing environments

    Some tools provide a complete automated testing environment that integrates a suite of testing tools.

Testing will always involve the manual effort to trace a problem back to its root cause.

Test Case Generators

Test generators base their test cases on the structure of the source code, the data flow, the functional testing, the state of each variable in the input domain, or just randomness.

Structural test case generators are based on the structure of the source code.

Part8: When to Stop Testing

probability of finding faults

As we can see in the figure above, the more faults we have already found, the higher the probability that more faults remain to be found.

The following are some indicators we can use to evaluate the progress of testing:

Fault Seeding (Error Seeding)

This technique can be used to estimate the number of faults in a program.

This technique is based on a known number of faults that have been inserted in a program on purpose.

$$ \frac{\text{detected seeded faults}}{\text{total seeded faults}}=\frac{\text{detected nonseeded faults}}{\text{total nonseeded faults}} $$

The formula above assumes that the seeded faults are of the same kind and complexity as the actual faults in the program.

To avoid this assumption, we can improve the method as follows.

We use two independent test groups to test the same program.

Let \(x\) and \(y\) be the numbers of faults detected by Group 1 and Group 2, respectively, let \(n\) be the total number of faults in the program, and let \(q\) be the number of faults found by both groups.

The effectiveness of Group 1 and Group 2 is written as \(E_1\) and \(E_2\), respectively.

Then we get:

$$ E_1=x/n\\ E_2=y/n $$

Assuming that each group's detection rate on the other group's faults equals its overall effectiveness:

$$ E_1=x/n=q/y\\ E_2=y/n=q/x $$

Finally, we can estimate \(n\):

$$ n=\frac{q}{E_1\cdot E_2} $$
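
Both seeding estimates can be transcribed directly from the formulas in this section; the example numbers at the end are hypothetical.

```python
# Fault-seeding estimates, transcribed from the formulas above.

def estimate_total_faults(seeded_total, seeded_found, real_found):
    # Single-group seeding: assumes seeded faults resemble real ones, so
    # total nonseeded = detected nonseeded * total seeded / detected seeded.
    return real_found * seeded_total / seeded_found

def estimate_two_groups(x, y, q):
    # Two independent groups: x and y faults found by Groups 1 and 2,
    # q found by both. E1 = q/y, E2 = q/x, so n = q/(E1*E2) = x*y/q.
    return x * y / q
```

For example, if Group 1 finds 25 faults, Group 2 finds 30, and 20 are common, the estimate is n = 25 × 30 / 20 = 37.5 faults.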

Confidence in the Software

The software confidence represents the likelihood that the software is fault-free.

Suppose we have seeded a program with \(S\) faults, and we claim that the code has only \(N\) actual faults. And let's say that at the time we found all \(S\) seeded faults, we found \(n\) actual faults.

The confidence level of the program can be calculated as

$$ C= \begin{cases} 1&n\gt N\\ \frac{S}{S+N+1}&n\le N \end{cases} $$
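
A small helper for this confidence level, using the standard Mills fault-seeding form, C = S/(S + N + 1) when n ≤ N and 1 otherwise:

```python
# Confidence that the claim "at most N actual faults" holds, given S
# seeded faults and n actual faults found by the time all S seeded
# faults were detected (Mills' fault seeding).

def confidence(S, N, n):
    if n > N:
        return 1.0
    return S / (S + N + 1)

# Classic special case: claim the program is fault-free (N = 0) after
# seeding 10 faults and finding all of them with no actual faults.
```

With S = 10 and N = n = 0, the confidence is 10/11, roughly 91%.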

Other Stopping Criteria

The test strategy itself can be used to set stopping criteria for testing.

When we are doing statement, path, or branch testing, we can track how many of them have been executed so that we can determine our test progress.

Identifying Fault-Prone Code

Many techniques based on the past history of faults in similar applications can be used to help identify the fault-prone code.

We can use classification trees to identify fault-prone components. The decision tree shows which measurements are the best predictors of a particular attribute.

classification tree example

As the figure above shows, if a program has more than 300 lines of code, has no design review, and has had its code changed five times, it is probably a fault-prone component.
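
The figure's rule can be written as a simple predicate; the thresholds (300 lines of code, no design review, five changes) come from the example above, while the parameter names are assumptions.

```python
# Fault-proneness predicate transcribed from the classification-tree example.

def fault_prone(lines_of_code, had_design_review, times_changed):
    # A component is flagged when all three risk conditions hold.
    return (lines_of_code > 300
            and not had_design_review
            and times_changed >= 5)
```

In practice such thresholds are derived from the organization's own fault history, not fixed constants.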