Towards Efficient Data-flow Test Data Generation

Overview

Data-flow testing (DFT) aims to detect potential data interaction anomalies by focusing on the points at which variables receive values and the points at which these values are used. Such test objectives are referred as def-use pairs. However, the complexity of DFT still overwhelms the testers in practice. To tackle this problem, we introduce a hybrid testing framework for data-flow based test generation: (1) The core of our framework is symbolic execution (SE), enhanced by a novel guided path exploration strategy to improve testing performance; and (2) we systematically cast DFT as reachability checking in software model checking (SMC) to complement SE, yielding practical DFT that combines the two techniques’ strengths. We implemented our framework for C programs on top of the state-of-the-art symbolic execution engine KLEE and instantiated with three different software model checkers. Our evaluation on the 28,354 def-use pairs collected from 33 open-source and industrial program subjects shows (1) our SE-based approach can improve DFT performance by 15∼48% in terms of testing time, compared with state-of-the-art search strategies; and (2) our combined approach can further reduce testing time by 20.1∼93.6%, and improve data-flow coverage by 27.8∼45.2% by eliminating infeasible test objectives. Compared with the SMC-based approach alone, our combined approach can also reduce testing time by 19.9%∼23.8%, and improve data-flow coverage by 7∼10%. This combined approach also enables the cross-checking of each component for reliable and robust testing results. We have made our testing framework and benchmarks publicly available to facilitate future research.

Approach Workflow

Tools (Download)

Our approach was built on four different tools, which are summarized as follows.

Our version of KLEE can be downloaded, which is adapted for data-flow testing.

Name Approach Technique Version
KLEE Symbolic Execution SE v1.1.0
BLAST Model Checking CEGAR v2.7.3
CPAchecker Model Checking CEGAR v1.7
CBMC Model Checking BMC v5.7

Benchmark (Download)

We carefully constructed a repository of benchmark subjects for evaluating data-flow testing techniques. It includes subjects from four sources:

CBMC uses --unwind and --depth, respectively, to limit the number of times loops to be unwound and the number of program steps to be processed. The parameters used in our evaluation are listed below.

Name CBMC's --unwind CBMC's --depth
factorization 4 90
power 2 40
find 4 90
triangle 2 40
strmat 3 60
strmat2 3 70
textfmt 25 700
tcas 2 200
replace 6 350
totinfo 15 250
printtokens 30 300
printtokens2 12 450
schedule 5 350
schedule2 5 250
kbfiltr 2 250
kbfiltr2 2 300
diskperf 2 650
floppy 2 500
floppy2 2 500
cdaudio 2 500
s3_clnt 13 500
s3_clnt_termination 13 400
s3_srvr_1a 10 200
s3_srvr_1b 4 100
s3_srvr_2 15 1450
s3_srvr_7 15 1450
s3_srvr_8 15 1450
s3_srvr_10 15 600
s3_srvr_12 15 1500
s3_srvr_13 15 1500
osek_control 15 1500
space_control 15 1500
subway_control 15 1500

Summary of Results

For more information, please feel free to contact us: Ting Su, Chengyu Zhang, Yichen Yan.

Reference

Last modified: 2019.3