wiki:NightlyBuilds

Machines

Our nightly tests run on a handful of test boxes:

lain: Debian testing, x86
stan: Fedora 9, x86. Always tests --enable-strict
fin: Fedora 9, x86-64
jazz: RHEL 3, Myrinet GM
breadboard: (infrequently run) x86-64 cluster

Every night these machines check out several branches of the PVFS tree and run a series of tests.

Tests

Tests are little shell stubs under test/automated/nightly:

vfs-tests.d: tests that use the kernel interface
sysint-tests.d: local system-interface tests; these bypass the kernel
mpiio-tests.d: assorted MPI-IO tests. These can only run on Jazz for now.

The nightly test infrastructure sources each test script in a "for f in $dir" sort of idiom, and keeps a counter of how many tests were run. A test either 100% succeeds or 100% fails; there's no way, for example, to report back "5 of 350 fsx patterns were bad". You instead get "fsx failed" and then you go dig in the logs for what exactly happened.

The test harness checks the return code (the shell variable '$?') of the test script: 0 means it passed, anything else means it failed. It's easiest from the test harness's perspective if a test stub compiles and executes the test of interest and then either returns the exit status of that test (if it returns 0 on "all pass") or greps through the logfile for an "all clear" condition.
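A minimal sketch of that idiom, not the real harness: the real code in testscrpt.sh sources each stub and differs in detail, and the variable and log names below are assumptions.

    # Sketch of the harness loop. $dir would point at one of the
    # *-tests.d directories; names here are illustrative only.
    nr_tests=0
    nr_failed=0
    for f in $dir/*; do
        nr_tests=$((nr_tests + 1))
        # run the stub; its exit status is the entire verdict
        sh "$f" > "$f.log" 2>&1
        if [ $? -ne 0 ]; then
            nr_failed=$((nr_failed + 1))
            echo "$(basename $f) FAILED"
        fi
    done
    echo "$nr_failed of $nr_tests failed"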

Several of the MPI-IO tests grep through the logfile for the absence of failure conditions, but this is pretty error prone and ends up being hard to get 100% correct.
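For example, a stub along these lines (benchmark and log names are hypothetical) passes only if the logfile contains no failure markers:

    # Hypothetical log-grepping stub: run the benchmark, then declare
    # success only if the log has no failure markers.
    mpirun -np 4 ./mpi-io-test > mpi-io-test.log 2>&1
    if grep -Eqi "error|fail" mpi-io-test.log; then
        exit 1    # the harness records this as a failure
    fi
    exit 0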

Results

Results are posted to a tinderbox page, one column for each "machine-branch" combination. Entries are of the form "XX of YY failed". You want XX to be zero, of course.

Green means everything passed. Orange means at least one failure. Purple means there were some warnings during compilation.

Click on the 'l', 'L', or 'C' to get a dump of all the test logs. In the 100% success case there won't be much to see, but if there were any failures they should end up on that page. If you find that you have to log into the test machine to diagnose a failure, then we should fix the tests so that's not needed.

Setting Up Your Own Tests

A quick word: You might want to override the defaults, which you can do by modifying nightly-tests.cfg (also in the 'nightly' directory).
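As a sketch only, an override might look like the lines below. The variable names are assumptions; check nightly-tests.cfg itself for the ones your checkout actually defines.

    # nightly-tests.cfg -- hypothetical overrides; consult the file in
    # test/automated/nightly for the variables it actually defines.
    PVFS2_DEST=/tmp/pvfs2-nightly      # scratch area for builds and mounts
    CVS_TAG=HEAD                       # which branch/tag to test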

System Interface Tests

The easiest tests to run are the system interface tests (the ones in sysint-tests.d). These are also the ones that will run "out of the box" if you just run test/automated/nightly/testscrpt.sh.
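In other words, something like the following should suffice from the top of a PVFS checkout (assuming the defaults in nightly-tests.cfg work for your machine):

    # Run the out-of-the-box system interface tests.
    cd test/automated/nightly
    sh ./testscrpt.sh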

VFS Tests

Next up in difficulty: the VFS tests. These tests require two things.

First, root access. The test scripts assume passwordless sudo, which isn't a problem on our dedicated test machines (no sensitive items there) but might be a concern on production machines. Edit testscrpt.sh and add your machine to the list of VFS_HOSTS.
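Setting up passwordless sudo typically means a sudoers entry like the one below (edit with visudo). The VFS_HOSTS edit is sketched as a comment because the exact list format in testscrpt.sh may differ; the username and hostnames are placeholders.

    # /etc/sudoers entry (edit with visudo): let the test account sudo
    # without a password. Substitute your own test account name.
    testuser ALL=(ALL) NOPASSWD: ALL

    # Then, in testscrpt.sh, add your machine to VFS_HOSTS, e.g.:
    # VFS_HOSTS="lain stan fin mybox"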

Second, some external benchmarks. Several of the VFS tests compile and run common I/O benchmarks (IOzone, bonnie++, dbench). I have put a tarball of the benchmarks we run at http://www.mcs.anl.gov/~robl/pvfs2/benchmarks-20060512.tar.gz

You will have to update your dbench for modern compilers. I haven't updated the benchmarks tarball with this newer version yet.
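Fetching and unpacking the tarball is straightforward; where the test scripts expect to find the unpacked benchmarks is an assumption, so check the VFS test stubs for the path they use.

    # Download and unpack the benchmark tarball. The destination
    # directory is up to you; the VFS stubs dictate where it must go.
    wget http://www.mcs.anl.gov/~robl/pvfs2/benchmarks-20060512.tar.gz
    tar xzf benchmarks-20060512.tar.gz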

Once you have all that, this too should work without too much mucking around inside the test scripts.

MPI-IO Tests

These tests are simpler in some ways -- they do not require root access -- but more involved in others. I've made no end of assumptions to get them to work on our Jazz cluster, and the scripts as written pretty much require PBS.

Here's the general idea: allocate some nodes (the scripts request 8 nodes from Jazz's shared queue, since that's the largest allocation we can get with fast turnaround), then use PAV (pvfs auto-volume) to partition the allocated nodes into I/O nodes (those running PVFS servers) and clients (the ones participating in the MPI job).

The specifics include the queue name, executing an MPI job on a subset of the PBS nodes, and configuring PAV for our system. I can and should write a lot more about this part...
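As a rough sketch of just the allocation step: the queue name and node count follow the description above, the wrapper script name is hypothetical, and the PAV setup and MPI launch inside the job are site-specific and not shown.

    # Hypothetical PBS job for the MPI-IO tests, submitted with
    #   qsub mpiio-nightly.pbs
    #PBS -q shared
    #PBS -l nodes=8
    #PBS -l walltime=1:00:00
    cd $PBS_O_WORKDIR
    # Inside the job: PAV splits the nodes into servers and clients,
    # then the mpiio-tests.d stubs run against the resulting volume.
    sh ./run-mpiio-tests.sh    # hypothetical wrapper name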