This document describes the pre-release tests used to verify a new version of jot. These are regression tests - they cannot prove that a version of jot is correct; they just test for regressions in specific areas.
Typically, a new version of jot will be assigned the name jot_dev (or jot_dev.exe for a Windows version).
$ ./jot_dev ${JOT_RESOURCES}/t.t -in=%r=test
This runs basic regression tests on all jot commands and command variants. It is totally autonomous and, on completion, should display the message
"Successfully completed all tests"
$ ./jot_dev t.t -in=%r=test_visual[ -debug][ -nosize][ -nounicode]
[ -win10relaxation][ -fromtest <n>][ -break <n>][ -cc[ <n>]][ -exit ]
This runs a series of tests designed to detect any regression in the way jot drives the screen. In olden times this was literally a visual test - the intrepid tester had to sit and check each of several dozen screenshots to make sure nothing was going wrong. These days, though, it's all done automatically using the query screen command.
Each of these tests (currently there are 127) performs some manipulation of the text and checks that the screen has been correctly updated. It uses the Query screen report which, in turn, relies on the OS functions getcchar() in Linux (actually part of the ncurses system) and ReadConsoleOutputCharacter() in Windows. While the Linux version is totally reliable, the Windows version is decidedly flaky and, strangely, Windows 10 seems worse than older versions.
The -debug qualifier inserts a breakpoint trap immediately after reading back the window status; it also redefines the window configuration to make it easier to see what's going on.
Some tests require an 80*40 terminal while others are less demanding; the -nosize qualifier skips the terminal-size check.
Some Linux terminals do not support Unicode; if this is true of the current terminal, the -nounicode qualifier will skip those tests.
The -win10relaxation qualifier skips any tests affected by the flaky console readback functions in Windows 10.
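For example, on a small terminal that cannot display Unicode, the tests might be run with something like the following (the combination of qualifiers is purely illustrative; the argument is quoted here because it contains spaces):
$ ./jot_dev t.t -in="%r=test_visual -nosize -nounicode"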
The -cc[ <n>] qualifier causes all messages to include the command-counter report. If a value is specified, it is used to set the command counter, triggering a breakpoint when the counter reaches the specified value (see %s=commandcounter). In practice:
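For example, to break when the command counter reaches 1500 (the value is purely illustrative):
$ ./jot_dev t.t -in="%r=test_visual -cc 1500"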
On successful completion of all tests the script usually terminates the session but, if there are failures, the session is normally held open so you can see the report.
$ jot ${JOT_RESOURCES}/t.t -st -in="%r=monkey_test <argsList>"
Valid args:
-seed=<value> -invoke=<exeName> -script=<pathName> -allscripts=<pattern> -tests=<n> -trace=<x> -head="<commands>" -tail="<commands>" {-subprocess|-xterm|-gdb|-valgrind|-winedbg|-wine} -commandcounter=n -noloop -label -pause -nowin -crash -debug
-seed=<value> - uses the predefined seed (by default, it constructs one from the current time).
-invoke=<exeName> - runs the specified executable version - by default runs the same jot as the wrapper process.
-script=<pathName> - runs the preexisting command script then continuously repeats the same script.
-allscripts=<pattern> - runs, in turn, each script matching the name-string pattern (typically test*.jot_pruned).
-tests=<n> - specifies the number of tests to be generated in each test script - defaults to 10000.
-trace=<x> - specifies the trace mode (in hex) at the start of the test run, defaults to 6002.
-head="<commands>" - specifies a command sequence to be applied after normal initialization.
-tail="<commands>" - specifies a command sequence to be applied before normal test-script exit.
-subprocess - launches the test in a subprocess.
-xterm - launches the test in an xterm.
-gdb - launches the test in a subprocess monitored by gdb.
-valgrind - launches the test in a subprocess monitored by valgrind.
-winedbg - runs the test in wineland using winedbg.
-wine - runs the test in wineland and allows it to crash.
-commandcounter=n - inserts a command to set the command counter before launching the script in the subprocess.
-noloop - exits after the first test.
-label - labels each command line with %%Line <lineNo>.
-pause - enters the debugger on completion of each pass.
-nowin - suppresses the window view.
-crash - inserts a command that's guaranteed to crash it.
-debug - inserts commands to add various bits of debugging information to the logfile.
This constructs test scripts containing thousands of randomly-generated valid but meaningless commands. The idea is to detect any crashes or memory-management misbehaviour using a variety of approaches. As each script completes successfully it constructs a new one using a different random sequence of commands.
One might opt for -tests=100000 when using gdb and maybe -tests=10000 with valgrind, in order to limit the total run time to a few minutes. If a failure is detected in such a big script it is necessary to reduce the size of the script to identify the cause of the failure. To do that, use the error_search.jot script - this is a binary-search procedure which cuts out sections of the test script until we're left with, hopefully, just a handful of lines that can be worked on easily.
The essential qualifiers are the selection of the run harness: -valgrind (detects memory-management misbehaviour) or -gdb, which is used to trap and report crashes. Other harness options are -xterm, which runs the script in a separate xterm, -winedbg, which runs in a wine debugger console, and -subprocess.
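For example, following the guidance above on script sizes (the test counts are only suggestions):
$ jot ${JOT_RESOURCES}/t.t -st -in="%r=monkey_test -gdb -tests=100000"
$ jot ${JOT_RESOURCES}/t.t -st -in="%r=monkey_test -valgrind -tests=10000"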
At the start of each run it resets the pseudorandom generator using the current date-time stamp; the value of this seed is reflected in the name of the generated script. To regenerate a script it is possible to force it to take a predefined seed with the -seed=<value> modifier.
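So, for instance, a run that produced test108110917.jot could be regenerated by taking the seed from the script name, as described above (the choice of -valgrind here is illustrative):
$ jot ${JOT_RESOURCES}/t.t -st -in="%r=monkey_test -valgrind -seed=108110917"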
The monkey_test-generated scripts can be very large - shorter scripts are less effective since they are less likely to contain the combinations of commands that might trigger some hidden data sensitivity. To identify the commands actively provoking the crash there is a script, error_search.jot, that does a binary search of the script, typically boiling it down to 10 lines or fewer. However, searching very large scripts is impracticable due to the enormous search times - about 10000 tests per script is a pretty good compromise.
$ jot ${JOT_RESOURCES}/l99.t -in="%r=error_search
[-gdb|-valgrind|-xterm|-timeout[=<n>]] [ -script=<pathName>[ <pathname2>[ ...]] | -fromlog=<logFile>] [ -gdb[ -timeout[=<n>]] | -valgrind | -winedbg | -wine | -winecrash] [ -invoke=<name>] [ -check] [ -failif=<jotCommands>] [ -history[=n]]"
A script generated by monkey_test.jot may fail in either of two ways - it crashes or, when run under valgrind, it does something naughty but not immediately fatal, revealing a data-sensitivity bug. By analysing the generated script we might be able to spot what's going on, maybe with the aid of gdb or a similar debugger.
The -script=<scriptName>... qualifier can be used to specify any number of scripts to be processed. Alternatively, when working with monkey_test.jot-generated scripts, the -fromlog qualifier can be passed the monkey-test logfile and error_search will pick up the failing scripts from there.
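For example, assuming the monkey-test run left its log in a file called monkey.log (the filename here is hypothetical), the failing scripts can be picked up directly from it:
$ jot ${JOT_RESOURCES}/l99.t -in="%r=error_search -valgrind -fromlog=monkey.log"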
By default, the test sessions are launched as a simple subprocess. The -xterm qualifier launches the test in an xterm so you can keep an eye on what's happening, the -gdb qualifier launches the test monitored by a gdb session, and -valgrind monitors it with valgrind. The -timeout qualifier is used to identify lines which cause the script to hang.
The -timeout qualifier runs the script under the unix timeout command with the specified limit in seconds (defaults to 600).
The -history[=n] qualifier preserves the last n versions of the boiled script by suffixing the script name with _1, _2 ... _n.
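For instance, to search a monkey-test script while keeping the last three boiled-down versions and treating any run of more than two minutes as a hang (the values are illustrative):
$ jot ${JOT_RESOURCES}/l99.t -in="%r=error_search -timeout=120 -history=3 -script=./test108110917.jot"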
The script is first run in its entirety to verify that the failure is detectable by error_search.jot. It then conducts a binary search for the test-script lines causing the failure. At each stage it removes a block of lines from the script and reruns the remainder. If it no longer fails then there was something in the removed section that enables the failure, and the removed block of lines is restored. It then makes its way through the script; on completion of each pass the block size is halved and the search resumes at the beginning of the script. The search terminates after it completes a full pass with a block size of 1 line.
The error_search script detects that the test misbehaved by searching the reply from the child session. You can specify a different failure criterion by specifying your own comparator commands in the -failif modifier.
Often it's not that obvious what's going on and it would be useful to know exactly which commands trigger the error. It's trivial to find which was the last command before the crash, but several earlier commands were involved in creating the conditions for it. Normally it's fewer than a dozen or so commands out of the many thousands of randomly-chosen commands. The error_search.jot script performs a binary search, slicing down the generated script and repeating with progressively smaller slices, until it finds the minimum required to provoke the error.
Demonstration of error_search
$ jot t.t -st -in="%r=monkey_test -crash -tests=1000 -seed=1234567890"
$ jot t.t -in="%r=error_search -gdb -script=test1234567890.jot"
The search should end by isolating something like the crash command that -crash inserted:
o@ol123 oo/%n/ %%Crash now.
Typical usage:
$ ./jot t.t -in="%r=error_search -valgrind -script=./test108110917.jot"
The error_search.jot script runs your test script (in this case test108110917.jot) and, if the full script results in valgrind error reports, the script is whittled down to the minimum set of commands that still results in valgrind errors.
$ test.sh[ -test n][ -exe <executable>]
$ test.bat[ -test n][ -exe <executable>][ -nosetup][ -hold][ -nobble][ -clearout]
These scripts have been superseded by test_all.jot
These two scripts are similar and run identical tests: test.sh runs them on the Linux version in a bash shell, while test.bat runs the same tests on the Windows version in a Windows (or wine) console. Unfortunately, the effort required to keep test.bat aligned with test.sh proved to be too much.
These are some simple tests for the Linux version in simulated real-life situations.
Although the tests are almost functionally identical, there was a certain amount of wriggling to get test.bat to work reliably in the wineconsole:
$ echo "This console is OK" | grep "This console is OK"
test.bat contains a test to detect this situation and exits early if a dud console is detected.
$ jot -in="%r=test_all[ -exe <executable>][ -nosize][ -test <n>[ <n>[ ...]]][ -debug 1]"
It runs the following tests:
The main jot session is a harness running regression tests on some version of jot - typically a development version. By default it tests a version named jot_dev, which must be on your search path.
test_all.jot is a replacement for test.sh and test.bat; maintenance of those two scripts was a nightmare and, it is hoped, this script will provide a more consistent and reliable test for future versions of jot.
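Typical usage, assuming the development executable is in the current directory rather than on the search path (the executable path and the test numbers are illustrative):
$ jot -in="%r=test_all -exe ./jot_dev -test 2 5"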