This document describes the pre-release tests used to verify a new version of jot. These are regression tests - they cannot prove that a version of jot is correct; they just test for regressions in specific areas.
Typically, a new version of jot will be assigned the name jot_dev (or jot_dev.exe for a Windows version).
$ ./jot_dev ${JOT_RESOURCES}/t.t -in=%r=test
This runs basic regression tests on all jot commands and command variants. It is totally autonomous and, on completion, should display the message
"Successfully completed all tests"
$ ./jot_dev t.t -in=%r=test_visual[ -debug][ -nosize][ -nounicode]
[ -win10relaxation][ -fromtest <n>][ -break <n>][ -cc[ <n>]][ -exit ]
This runs a series of tests designed to detect any regression in the way jot drives the screen. In olden times this was literally a visual test - the intrepid tester had to sit and check each of several dozen screenshots to make sure nothing was going wrong. These days, though, it's all done automatically using the query screen command.
Each of these tests (currently there are 127) performs some manipulation of the text and checks that the screen has been correctly updated. It uses the Query screen report which, in turn, relies on the OS functions getcchar() in Linux (actually part of the ncurses system) and ReadConsoleOutputCharacter() in Windows. While the Linux version is totally reliable, the Windows version is decidedly flaky and, strangely, Windows 10 seems worse than older versions.
The -debug qualifier inserts a breakpoint trap immediately after reading back the window status; it also redefines the window configuration to make it easier to see what's going on.
Some tests require an 80*40 terminal while others are less demanding; the -nosize qualifier skips the terminal-size check.
Some Linux terminals do not support Unicode; if this is true of the current terminal, the -nounicode qualifier will skip those tests.
The -win10relaxation qualifier skips any tests affected by the flaky console readback functions in Windows 10.
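For example, on a small terminal that cannot display Unicode, the tests might be run with something like the following (the combination of qualifiers is purely illustrative; the argument is quoted here because it contains spaces):
$ ./jot_dev t.t -in="%r=test_visual -nosize -nounicode"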
The -cc[ <n>] qualifier causes all messages to include the command-counter report. If a value is specified, it is used to set the command counter, triggering a breakpoint when the counter reaches the specified value (see %s=commandcounter). In practice:
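For example, to break when the command counter reaches 1500 (the value is purely illustrative):
$ ./jot_dev t.t -in="%r=test_visual -cc 1500"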
On successful completion of all tests the script usually terminates the session but, if there are failures, the session is normally held open so you can see the report.
$ jot ${JOT_RESOURCES}/t.t -st -in="%r=monkey_test <argsList>"
Valid args:
-seed=<value> -invoke=<exeName> -script=<pathName> -allscripts=<pattern> -tests=<n> -trace=<x> -head="<commands>" -tail="<commands>" {-subprocess|-xterm|-gdb|-valgrind|-winedbg|-wine} -commandcounter=n -noloop -label -pause -nowin -crash -debug
-seed=<value> - uses the predefined seed (by default, it constructs one from the current time).
-invoke=<exeName> - runs the specified executable version - by default runs the same jot as the wrapper process.
-script=<pathName> - runs the preexisting command script then continuously repeats the same script.
-allscripts=<pattern> - runs, in turn, each script matching the name-string pattern (typically test*.jot_pruned).
-tests=<n> - specifies the number of tests to be generated in each test script - defaults to 10000.
-trace=<x> - specifies the trace mode (in hex) at the start of the test run, defaults to 6002.
-head="<commands>" - specifies a command sequence to be applied after normal initialization.
-tail="<commands>" - specifies a command sequence to be applied before normal test-script exit.
-subprocess - launches the test in a subprocess.
-xterm - launches the test in an xterm.
-gdb - launches the test in a subprocess monitored by gdb.
-valgrind - launches the test in a subprocess monitored by valgrind.
-winedbg - runs the test in wineland using winedbg.
-wine - runs the test in wineland and allows it to crash.
-commandcounter=n - inserts a command to set the command counter before launching the script in the subprocess.
-noloop - exits after the first test.
-label - labels each command line with %%Line <lineNo>.
-pause - enters the debugger on completion of each pass.
-nowin - suppresses the window view.
-crash - inserts a command that's guaranteed to crash it.
-debug - inserts commands to add various bits of debugging information to the logfile.
This constructs test scripts containing thousands of randomly-generated valid but meaningless commands. The idea is to detect any crashes or memory-management misbehaviour using a variety of approaches. As each script completes successfully it constructs a new one using a different random sequence of commands.
One might opt for -tests=100000 when using gdb and maybe -tests=10000 with valgrind, in order to limit the total run time to a few minutes. If a failure is detected in such a big script it is necessary to reduce the size of the script to identify the cause of the failure. To do that, use the error_search.jot script - this is a binary-search procedure which cuts out sections of the test script until we're left with, hopefully, just a handful of lines that can be worked on easily.
The essential qualifiers are the selection of the run harness: -valgrind (detects memory-management misbehaviour) or -gdb, which is used to trap and report crashes. Other harness options are -xterm, which runs the script in a separate xterm, -winedbg, which runs in a wine debugger console, and -subprocess.
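For example, following the guidance above on script sizes (the test counts are only suggestions):
$ jot ${JOT_RESOURCES}/t.t -st -in="%r=monkey_test -gdb -tests=100000"
$ jot ${JOT_RESOURCES}/t.t -st -in="%r=monkey_test -valgrind -tests=10000"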
At the start of each run it resets the pseudorandom generator using the current date-time stamp; the value of this seed is reflected in the name of the generated script. To regenerate a script it is possible to force it to take a predefined seed with the -seed=<value> modifier.
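So, for instance, a run that produced test108110917.jot could be regenerated by taking the seed from the script name, as described above (the choice of -valgrind here is illustrative):
$ jot ${JOT_RESOURCES}/t.t -st -in="%r=monkey_test -valgrind -seed=108110917"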
The monkey_test-generated scripts can be very large - shorter scripts are less effective since they are less likely to contain the combinations of commands that might trigger some hidden data sensitivity. To identify the commands actively provoking the crash there is a script, error_search.jot, that does a binary search of the script, typically boiling it down to 10 lines or fewer. However, searching very large scripts is impracticable due to the enormous search times - about 10000 tests per script is a pretty good compromise.
$ jot ${JOT_RESOURCES}/l99.t -in="%r=error_search
[-gdb|-valgrind|-xterm|-timeout[=<n>]] [ -script=<pathName>[ <pathname2>[ ...]] | -fromlog=<logFile>] [ -gdb[ -timeout[=<n>]] | -valgrind | -winedbg | -wine | -winecrash] [ -invoke=<name>] [ -check] [ -failif=<jotCommands>] [ -history[=n]]"
A script generated by monkey_test.jot may fail in either of two ways - it crashes or, when run under valgrind, it does something naughty but not immediately fatal, revealing a data-sensitivity bug. By analysing the generated script we might be able to spot what's going on, maybe with the aid of gdb or a similar debugger.
The -script=<scriptName>... qualifier can be used to specify any number of scripts to be processed. Alternatively, when working with monkey_test.jot-generated scripts, the -fromlog qualifier can be passed the monkey-test logfile and error_search will pick up the failing scripts from there.
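For example, assuming the monkey-test run left its log in a file called monkey.log (the filename here is hypothetical), the failing scripts can be picked up directly from it:
$ jot ${JOT_RESOURCES}/l99.t -in="%r=error_search -valgrind -fromlog=monkey.log"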
By default, the test sessions are launched as a simple subprocess. The -xterm qualifier launches the test in an xterm so you can keep an eye on what's happening, the -gdb qualifier launches the test monitored by a gdb session, and -valgrind monitors it with valgrind. The -timeout qualifier is used to identify lines which cause the script to hang.
The -timeout qualifier runs the script under the unix timeout command with the specified limit in seconds (defaults to 600).
The -history[=n] qualifier preserves the last n versions of the boiled script by suffixing the script name with _1, _2 ... _n.
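For instance, to search a monkey-test script while keeping the last three boiled-down versions and treating any run of more than two minutes as a hang (the values are illustrative):
$ jot ${JOT_RESOURCES}/l99.t -in="%r=error_search -timeout=120 -history=3 -script=./test108110917.jot"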
The script is first run in its entirety to verify that the failure is detectable by error_search.jot. It then conducts a binary search for the test-script lines causing the failure. At each stage it removes a block of lines from the script and reruns the remainder. If it no longer fails then there was something in the removed section that enables the failure, and the removed block of lines is restored. It then makes its way through the script; on completion of each pass the block size is halved and the search resumes at the beginning of the script. The search terminates after it completes a full pass with a block size of 1 line.
The error_search script detects that the test misbehaved by searching the reply from the child session. You can specify a different failure criterion by specifying your own comparator commands in the -failif modifier.
Often it's not that obvious what's going on and it would be useful to know exactly which commands trigger the error. It's trivial to find which was the last command before the crash, but several earlier commands were involved in creating the conditions for it. Normally it's fewer than a dozen or so commands out of the many thousands of randomly-chosen commands. The error_search.jot script performs a binary search, slicing down the generated script and repeating with progressively smaller slices, until it finds the minimum required to provoke the error.
Demonstration of error_search
$ jot t.t -st -in="%r=monkey_test -crash -tests=1000 -seed=1234567890"
$ jot t.t -in="%r=error_search -gdb -script=test1234567890.jot"
The search should end by isolating something like the crash command that -crash inserted:
o@ol123 oo/%n/ %%Crash now.
Typical usage:
$ ./jot t.t -in="%r=error_search -valgrind -script=./test108110917.jot"
The error_search.jot script runs your test script (in this case test108110917.jot) and, if the full script results in valgrind error reports, the script is whittled down to the minimum set of commands that still results in valgrind errors.
$ test.sh[ -test n][ -exe <executable>]
$ test.bat[ -test n][ -exe <executable>][ -nosetup][ -hold][ -nobble][ -clearout]
These scripts have been superseded by test_all.jot
These two scripts are similar and run identical tests: test.sh runs them on the Linux version in a bash shell, while test.bat runs the same tests on the Windows version in a Windows (or wine) console. Unfortunately, the effort required to keep test.bat aligned with test.sh proved to be too much.
These are some simple tests for the Linux version in simulated real-life situations.
Although the tests are almost functionally identical, there was a certain amount of wriggling to get test.bat to work reliably in the wineconsole:
$ echo "This console is OK" | grep "This console is OK"
test.bat contains a test to detect this situation and exits early if a dud console is detected.
$ jot -in="%r=test_all[ -exe <executable>][ -nosize][ -test <n>[ <n>[ ...]]][ -debug 1]"
It runs the following tests:
The main jot session is a harness running regression tests on some version of jot - typically a development version. By default it tests a version named jot_dev, which must be on your search path.
test_all.jot is a replacement for test.sh and test.bat; maintenance of those two scripts was a nightmare and, it is hoped, this script will provide a more consistent and reliable test for future versions of jot.
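Typical usage, assuming the development executable is in the current directory rather than on the search path (the executable path and the test numbers are illustrative):
$ jot -in="%r=test_all -exe ./jot_dev -test 2 5"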