JOT Tests

This document describes the pre-release tests used to verify a new version of jot. These are regression tests - they cannot prove that a version of jot is correct; they just test for regressions in specific areas.

Typically, a new version of jot will be assigned the name jot_dev (or jot_dev.exe for a Windows version).

test.jot

$ ./jot_dev ${JOT_RESOURCES}/t.t -in=%r=test  

This runs basic regression tests on all jot commands and command variants. It is fully autonomous and, on completion, should display a success message.

test_visual.jot

$ ./jot_dev t.t -in=%r=test_visual[ -debug][ -nosize][ -nounicode][ -win10relaxation][ -fromtest <n>][ -break <n>][ -cc[ <n>]][ -exit]

monkey_test.jot

$ jot ${JOT_RESOURCES}/t.t -st -in="<argsList>"

Valid args:

This constructs test scripts containing thousands of randomly generated, valid but meaningless commands. The idea is to detect crashes or memory-management misbehaviour using a variety of approaches. As each script completes successfully, it constructs a new one using a different random sequence of commands.
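As a rough illustration of how such a generator might work - the command templates and helper name below are made up for the sketch, not jot's real command set:

```python
import random

# Hypothetical command templates -- stand-ins, not jot's actual commands.
COMMANDS = ["n", "p", "d", "s/x/y/", "m+3", "b", "e"]

def make_script(seed, n_commands=10000):
    """Build a script of randomly chosen, syntactically valid but
    meaningless commands; the same seed always yields the same script."""
    rng = random.Random(seed)
    return "\n".join(rng.choice(COMMANDS) for _ in range(n_commands))
```

Because the generator is seeded, a failing script can be rebuilt exactly from its seed alone.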

One might opt for -tests=100000 when using gdb and maybe -tests=10000 with valgrind, in order to limit the total run time to a few minutes. If a failure is detected in such a big script, it is necessary to reduce the size of the script to identify the cause of the failure. To do that, use the error_search.jot script - a binary search procedure which cuts out sections of the test script until we're left with, hopefully, just a handful of lines that can be worked on easily.

The essential qualifiers select the run harness: -valgrind, which detects memory-management misbehaviour, or -gdb, which is used to trap and report crashes. Other harness options are -xterm, which runs the script in a separate xterm, -winedbg, which runs it in a wine debugger console, and -subprocess.

At start-up, it seeds the pseudorandom generator from the current date-time stamp; the value of this seed is reflected in the name of the generated script. To regenerate a script, it can be forced to take a predefined seed with the -seed=N modifier.
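A minimal sketch of that seeding scheme, assuming the testNNN.jot naming seen in the examples below (the helper name is hypothetical):

```python
import time

def seeded_script_name(seed=None):
    """Derive the generator seed from the current date-time stamp unless
    one is forced (the analogue of -seed=N), and embed it in the script
    name so the run can be regenerated later."""
    if seed is None:
        seed = int(time.time())            # start-of-run date-time seed
    return seed, f"test{seed}.jot"         # seed recoverable from the name
```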

The monkey_test-generated scripts can be very large - shorter scripts are less effective since they are less likely to contain the combinations of commands that might trigger some hidden data sensitivity. To identify the commands actually provoking the crash there is a script, error_search.jot, that does a binary search of the script, typically boiling it down to 10 lines or fewer. However, searching very large scripts is impracticable due to the enormous search times - about 10000 tests per script is a good compromise.

error_search.jot

$ jot ${JOT_RESOURCES}/l99.t -in="%r=error_search 

A script generated by monkey_test.jot fails when either it crashes or, when run under valgrind, it does something naughty but not immediately fatal, revealing a data-sensitivity bug. By analysing the generated script we might be able to spot what's going on, perhaps with the aid of gdb or a similar debugger.

The -script=<scriptName>... qualifier can be used to specify any number of scripts to be processed. Alternatively, when working with monkey_test.jot-generated scripts, it can be passed the monkey-test logfile and it will pick up the failing scripts from there.

By default, the test sessions are launched as simple subprocesses. The -xterm qualifier launches the test in an xterm so you can keep an eye on what's happening, the -gdb qualifier runs it under a gdb session, and -valgrind monitors it with valgrind. The -timeout qualifier is used to identify lines which cause the script to hang.

The -timeout qualifier runs the script with the specified timeout in seconds (default 600), using the unix timeout command.
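Assuming the unix timeout command is used as stated, the wrapping might look like this (run_with_timeout is an illustrative helper, not part of jot; GNU timeout exits with status 124 when the limit is exceeded):

```python
import subprocess

def run_with_timeout(cmd, limit=600):
    """Wrap the command in the unix `timeout` utility. GNU timeout exits
    with status 124 when the limit (in seconds) is exceeded, which is how
    a hanging script line can be detected."""
    result = subprocess.run(["timeout", str(limit)] + list(cmd))
    return result.returncode == 124  # True -> the script hung
```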

The -history[=n] qualifier preserves the last n versions of the boiled script by suffixing the script name with _1, _2 ... _n.

The script is first run in its entirety to verify that the failure is detectable by error_search.jot. It then conducts a binary search for the test-script lines causing the failure. At each stage it removes a block of lines from the script and re-runs the remainder. If the remainder no longer fails, something in the removed section enables the failure, so the removed block of lines is restored. It then works its way through the script; on completion of each pass, the block size is halved and the search resumes at the beginning of the script. The search terminates after it completes a full pass with a block size of one line.
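The pass-and-halve procedure described above can be sketched as follows - a simplified model, not error_search.jot's actual code; fails() stands for running the candidate script under the chosen harness:

```python
def boil(lines, fails):
    """Binary-search a failing script down to the lines that provoke the
    failure. `fails(candidate)` runs the candidate script and reports
    whether the failure is still present."""
    assert fails(lines), "failure must reproduce on the full script first"
    block = max(1, len(lines) // 2)
    while True:
        i = 0
        while i < len(lines):
            trial = lines[:i] + lines[i + block:]   # cut out one block
            if fails(trial):
                lines = trial        # block not needed: keep it removed
            else:
                i += block           # block enables the failure: restore it
        if block == 1:
            return lines             # full pass at block size 1: done
        block = max(1, block // 2)   # halve the block and rescan
```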

The error_search script detects that the child session misbehaved by searching its reply. You can specify a different failure criterion by supplying your own comparator commands in the -failif modifier.
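A sketch of such a reply check, assuming valgrind's standard ERROR SUMMARY line is what gets searched for - the exact pattern jot checks, and the helper names, are assumptions:

```python
import subprocess

def failed(reply):
    """Default criterion: a valgrind run misbehaved unless its summary
    line reports zero errors (pattern is illustrative, not jot's exact
    check)."""
    return "ERROR SUMMARY" in reply and "ERROR SUMMARY: 0 errors" not in reply

def run_and_check(cmd, failif=failed):
    """Run the child session and apply the failure criterion to its reply
    (failif stands in for a user-supplied -failif comparator)."""
    reply = subprocess.run(cmd, capture_output=True, text=True)
    return failif(reply.stdout + reply.stderr)
```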

Often it's not obvious what's going on, and it would be useful to know exactly which commands trigger the error. It's trivial to find the last command before the crash, but several earlier commands were involved in creating the conditions for it. Normally it's fewer than a dozen or so commands out of the many thousands of randomly chosen ones. The error_search.jot script performs a binary search, slicing down the generated script and repeating with progressively smaller slices, until it finds the minimum required to provoke the error.

Demonstration of error_search

Typical usage:

$ ./jot t.t -in="%r=error_search -valgrind -script=./test108110917.jot"

The error_search.jot script runs your test script (in this case test108110917.jot) and, if the full script results in valgrind error reports, the script is whittled down to the minimum set of commands that still results in valgrind errors.

test.sh and test.bat

$ test.sh[ -test n][ -exe <executable>]
$ test.bat[ -test n][ -exe <executable>][ -nosetup][ -hold][ -nobble][ -clearout]

These scripts have been superseded by test_all.jot

These two scripts are similar and run identical tests: test.sh runs tests on the linux version in a bash shell, while test.bat runs the same tests on the Windows version in a Windows (or wine) console. Unfortunately, the effort required to keep test.bat aligned with test.sh proved to be too much.

Some simple tests for the linux version in simulated real-life situations.

test_all.jot

$ jot -in="%r=test_all[ -exe <executable>][ -nosize][ -test <n>[ <n>[ ...]]][ -debug 1]"