Version 38 (modified by tramer, 13 years ago)
Installing Babel on BG/P (Surveyor at Argonne)
Current build instructions (Notes by Stephen Tramer)
It is strongly recommended that you create a directory on /scratch to do your build in. Boyana mentions in her notes below that configure and make take a long time. This is because disk access to $HOME from the Surveyor login nodes is slow. Configuring and building are much faster on /scratch. However, you cannot run the regression tests from /scratch, because /scratch is local to each login node and is not mounted on any of the BG/P nodes.
These notes very closely follow Boyana's, but some of her fixes have now been patched into the code. For the full details of what has been modified in order to support BG/P, please see below.
These instructions enable support for all languages except Java (which is irrelevant on BG/P, since BG/P has no Java). Throughout, $BABEL_SRC refers to your local Babel source copy and $BABEL_INSTALL_DIR to the directory where Babel will be installed. $LIBXML2_DIR and $LIBTOOL_DIR refer to the install prefixes of libxml2 and libtool. Please be aware that this build process has only been tested on the development branch, for the proposed Babel 1.5.0.
- Optionally, build local versions of libxml2 and libtool. Versions built for use by applications compiled for BG/P are available at /home/tramer/libxml2-bgp and /home/tramer/libtool-dev-bgp respectively. However, it is recommended that you build your own, so that you become familiar with some of the issues involved in cross-compiling with the GNU autotools. See Boyana's instructions below for libxml2. For libtool, you will need to run 'make' first and then the libtool_fix.sh script.
- Configure the project:
```
$BABEL_SRC/configure CC=bgxlc CXX=bgxlC FC=bgxlf90 F77=bgxlf --prefix=$BABEL_INSTALL_DIR --host=powerpc-bgp-linux-gnu \
  --without-sidlx --with-gcc=no --enable-python=/bgsys/drivers/ppcfloor/gnu-linux/bin/python \
  --with-libxml2=$LIBXML2_DIR LDFLAGS=-qnostaticlink --enable-shared --disable-static \
  host_alias=powerpc-bgp-linux-gnu --target=powerpc64-ibm-bgp --with-ltdl-lib=$LIBTOOL_DIR/lib \
  --with-ltdl-include=$LIBTOOL_DIR/include --without-libparsifal --disable-fortran90
```
You may also use the mpixl* compilers if you wish, but this is not necessary because Babel does not use MPI. The --disable-fortran90 option is needed only because struct support is currently broken in Babel. Once that is fixed, it should be safe to drop the option and re-enable Fortran 90. You can also build the convenience ltdl that is bundled with Babel. Use the external libtool only if that does not work.
If configure fails, run aclocal, autoconf, and automake in both $BABEL_SRC and $BABEL_SRC/runtime. For this you need autoconf 2.64 and automake 1.10.x. These are not available on the login machines, so you will have to build them yourself. They are only used during the build, so they do not need to be ported to BG/P.
If configure still fails, compare your output against Boyana's instructions below to find where the problem lies. Note, however, that unlike in her notes, you should not include '-G' in LDFLAGS. This flag is only for building libraries, and passing it to an executable build causes severe problems.
- Run $BABEL_SRC/contrib/libtool_fix.sh in the $BABEL_SRC directory. This patches your libtool files so that they work with the BG/P compilers. Do not skip this step. If the make step below fails, check that you have done this.
- 'make clean all install'
If your make fails, please look at Boyana's instructions below and examine your output to determine where the mistake could be.
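The build steps above, plus the optional autotools regeneration, can be collected into a small script. This is a minimal sketch and not part of Babel. The function names (run, require_vars, regen_autotools, configure_babel, build_babel) and the DRY_RUN switch are hypothetical. The configure options are copied from the command above, and the variable names follow the ones used on this page. As the page says, libtool_fix.sh is run in $BABEL_SRC.

```shell
#!/bin/sh
# Sketch only: hypothetical helper names; configure options copied from this page.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
  # Echo the command, then execute it unless DRY_RUN=1.
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then
    "$@"
  fi
}

require_vars() {
  # Report every named variable that is unset or empty; fail if any is.
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return $missing
}

regen_autotools() {
  # Only needed if configure fails: rerun aclocal, autoconf and automake in
  # the top-level and runtime directories (autoconf 2.64, automake 1.10.x).
  require_vars BABEL_SRC || return 1
  for dir in "$BABEL_SRC" "$BABEL_SRC/runtime"; do
    (cd "$dir" && run aclocal && run autoconf && run automake) || return 1
  done
}

configure_babel() {
  # Run from your build directory (on /scratch, as recommended above).
  require_vars BABEL_SRC BABEL_INSTALL_DIR LIBXML2_DIR LIBTOOL_DIR || return 1
  run "$BABEL_SRC/configure" CC=bgxlc CXX=bgxlC FC=bgxlf90 F77=bgxlf \
    --prefix="$BABEL_INSTALL_DIR" --host=powerpc-bgp-linux-gnu \
    --without-sidlx --with-gcc=no \
    --enable-python=/bgsys/drivers/ppcfloor/gnu-linux/bin/python \
    --with-libxml2="$LIBXML2_DIR" LDFLAGS=-qnostaticlink \
    --enable-shared --disable-static \
    host_alias=powerpc-bgp-linux-gnu --target=powerpc64-ibm-bgp \
    --with-ltdl-lib="$LIBTOOL_DIR/lib" \
    --with-ltdl-include="$LIBTOOL_DIR/include" \
    --without-libparsifal --disable-fortran90
}

build_babel() {
  # configure, then libtool_fix.sh in $BABEL_SRC, then make; stop at the first failure.
  configure_babel || return 1
  (cd "$BABEL_SRC" && run "$BABEL_SRC/contrib/libtool_fix.sh") || return 1
  run make clean all install
}
```

For example, source the file from your build directory and run export DRY_RUN=1 followed by build_babel to review the exact commands. Then unset DRY_RUN and run build_babel again to execute them.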
Running regression tests
Again, you cannot run regression tests from /scratch. Trying to do so will cause headaches and a complete lack of output from mpirun. Build your babel source in your home directory (or another directory mounted by the BG/P) in order to run regression tests.
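A script that launches tests could refuse to start from /scratch. This is a minimal sketch. The helper names on_scratch and check_test_dir are hypothetical, and the script assumes that /scratch is the mount point named on this page.

```shell
#!/bin/sh
# Sketch only (hypothetical helpers): /scratch is local to each login node
# and is not mounted on the BG/P nodes, so tests must not be run from there.

on_scratch() {
  # Succeed if the given path is /scratch itself or lies below it.
  case "$1" in
    /scratch|/scratch/*) return 0 ;;
    *) return 1 ;;
  esac
}

check_test_dir() {
  # Resolve the physical path so that symlinks into /scratch are caught too.
  dir=$(cd "${1:-.}" && pwd -P) || return 1
  if on_scratch "$dir"; then
    echo "error: $dir is on /scratch; use a directory mounted on the BG/P nodes" >&2
    return 1
  fi
  return 0
}
```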
There is currently no automation for running Babel tests on BG/P, so you will have to run them by hand. It is therefore recommended that you run only a handful. The struct and array tests are the most useful, because they involve data alignment and can reveal alignment problems. To run a regression test:
- Edit the libtool wrapper script 'runAll' in the test of your choice. The change is shown below as a diff between the original runAll and the edited runAll.cobalt:
```
~/babel/trunk/regression/arrays/runC> diff runAll runAll.cobalt
135c135,138
< exec "$progdir/$program" ${1+"$@"}
---
> if [ "X${PYTHONPATH}" != "X" ]; then
>   ADDITIONAL_ARG="PYTHONPATH=${PYTHONPATH}"
> fi
> cobalt-mpirun -mode vn -np 1 -verbose 2 -cwd `pwd` \
>   -env "LD_LIBRARY_PATH=${LD_LIBRARY_PATH} SIDL_DLL_PATH=${SIDL_DLL_PATH} $ADDITIONAL_ARG" \
>   "$progdir/$program" ${1+"$@"}
```
- 'qsub --mode script -A cca-tools -t 15 -n 1 runAll.sh'
- Once the job completes on the BG/P nodes, you will have to examine the test output for failures manually. Tests will (erroneously) be marked as BROKEN because of the value that cobalt-mpirun returns to the libtool wrapper script. The only way to find tests that are actually broken is to examine the output and error logs.
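Because the reported BROKEN status is unreliable, a small helper can scan the job's output and error logs instead. This is a sketch under a stated assumption: it assumes that a failing check prints a line containing a marker word (by default "FAIL"). Set PATTERN to whatever your test output actually uses. The helper name count_failures is hypothetical, and the helper does not know the Cobalt log file names. Pass it the files your job wrote.

```shell
#!/bin/sh
# Sketch only (hypothetical helper). ASSUMPTION: failing checks print a line
# containing $PATTERN (default "FAIL"). This is a plain substring match, so any
# longer word containing the pattern is counted too.

count_failures() {
  # Print each matching line (file:line:text) to stderr and the number of
  # matching lines to stdout. Unreadable files are reported and skipped.
  pattern=${PATTERN:-FAIL}
  total=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "warning: cannot read $f" >&2
      continue
    fi
    grep -n -- "$pattern" "$f" | sed "s|^|$f:|" >&2
    n=$(grep -c -- "$pattern" "$f")
    total=$((total + n))
  done
  echo "$total"
}
```

A nonzero count only means that a line matched the pattern. Read the matching lines printed on stderr before deciding that a test really failed.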
Old build instructions (Notes by Boyana Norris)
I do not recommend that anyone try this unless they want to spend a lot of time waiting for configure scripts to complete. For example, Babel's configure takes 2.5 minutes on my laptop but at least 20 minutes on the BG/P, depending on the options, and then you still have to actually build.
- First of all, I started out with the goal of supporting as many languages as possible, including Python. This required the following steps:
- Python distutils are rather crippled when it comes to cross-compiling. I built my own python (version 2.6.1) and extended the distutils implementation with a BG/P-specific compiler module. More details on that are available here.
- I had to install NumPy, which turned out to be nontrivial (see description). My second requirement was support for shared libraries, since fully static builds are (1) less flexible and (2) not fully automated for CCA components. There is also the danger of running out of memory because of bloated static executables. GNU Libtool does not work out of the box for building shared libraries on the BG/P (more on that later).
- As expected, GNU Autotools-based packages cannot simply be built on BG/P. First, configure's cross-compilation support is not good enough for BG/P. I added these settings to the llnl_cross_compiling.m4 file, right after the line case "$target" in:
```
powerpc64-ibm-bgp*)
  cross_compiling=yes
  llnl_cross_compiling_okay=yes
  enable_pure_static_runtime=no
  enable_shared=yes
  enable_static=no
  enable_java=no
  enable_python=/bgsys/drivers/ppcfloor/gnu-linux/bin/python
  llnl_cv_python_frontend=python
  llnl_cv_python_prefix=/bgsys/drivers/ppcfloor/gnu-linux
  llnl_cv_python_numpy=yes
  llnl_cv_python_numerical=no
  llnl_cv_python_library=$llnl_cv_python_prefix/lib
  llnl_cv_python_version=2.5
  llnl_cv_python_include=$llnl_cv_python_prefix/include/python$llnl_cv_python_version
  llnl_cv_extra_python_build_options="--compiler=mpixlc"
  llnl_python_shared_library=$llnl_cv_python_library/libpython2.5.so
  llnl_python_shared_library_found=yes
  sidl_cv_f77_false=0
  sidl_cv_f77_true=1
  sidl_cv_f90_false=0
  sidl_cv_f90_true=1
  llnl_cv_F77_logical_size=4
  llnl_cv_F90_logical_size=4
  ac_cv_f90_pointer_size=8
  llnl_cv_F77_string_passing="far int32_t"
  llnl_cv_F90_string_passing="far int32_t"
  ac_cv_func_malloc_0_nonnull=yes
  ac_cv_func_realloc_0_nonnull=yes
  ac_cv_func_memcmp_working=yes
  with_sidlx=no
  ;;
```
Some additional hard-coded configure settings are discussed in this IBM document. (Note: the host triplet that IBM uses to identify compute nodes seems to be powerpc-bgp-linux, not the one above.)
Next I changed the runtime/m4/llnl_confirm_babel_python_support.m4 file as follows:
```
*** llnl_confirm_babel_python_support.m4 2008-10-21 12:05:31.000000000 -0500
--- /home/norris/babel-1.4.0/runtime/m4/llnl_confirm_babel_python_support.m4 2009-02-24 22:38:39.246831463 -0600
***************
*** 38,47 ****
--- 38,49 ----
  fi
  if test "X$enable_python" != "Xno"; then
+   if test "X$cross_compiling" != "Xyes"; then
      LLNL_PYTHON_LIBRARY
      LLNL_PYTHON_NUMERIC
      LLNL_PYTHON_SHARED_LIBRARY
      LLNL_PYTHON_AIX
+   fi
    if test \( "X$llnl_cv_python_numerical" != "Xyes" -a "X$llnl_cv_python_numpy" != "Xyes" \) -o "X$enable_shared" = "Xno" -o "X$XML2_CONFIG" = "Xno"; then
      enable_python=no;
      AC_MSG_WARN([Configuration for Python failed. Support for Python disabled!])
```
(NOTE from Tramer: This change was not necessary for the new BG/P cross-compilation, and in fact broke the build.)
Finally, I modified runtime/config/config.sub as follows:
```
*** config.sub 2008-10-21 11:42:24.000000000 -0500
--- /home/norris/babel-1.4.0.bk/runtime/config/config.sub 2009-02-23 13:23:35.376490273 -0600
***************
*** 1391,1396 ****
--- 1391,1399 ----
  	-bgl)
  		os=-bgl
  		;;
+ 	-bgp)
+ 		os=-bgp
+ 		;;
  	-catamount)
  		os=-catamount
  		;;
```
While building prerequisites (see next item), I found that libtool did not support building and linking shared libraries with IBM's compilers. The proper fix for this problem is to add BG/P support to libtool. The quicker workaround is to first set an environment variable:
export LDFLAGS='-G -qnostaticlink'
Then run configure. Next, before running make, run this little script, which patches all generated libtool scripts in the build directory of the package being built:
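```
#!/bin/sh
# File: libtool_fix.sh
# Patch every generated libtool script below the current directory so that
# it builds shared libraries with the IBM BG/P compilers.
# Usage: libtool_fix.sh [script-name]   (default: libtool)
wd=`pwd`
if test "x$1" = "x" ; then
  prog=libtool
else
  prog=$1
fi
# sed -i1 edits each file in place and keeps a backup with the suffix "1".
find "$wd" -name "$prog" -exec \
  sed -i1 -e 's|^export_dynamic_flag_spec=.*$|export_dynamic_flag_spec=""|g' \
  -e 's|^whole_archive_flag_spec=.*$|whole_archive_flag_spec=""|g' \
  -e 's|^pic_flag=.*$|pic_flag=" -DPIC -qpic"|g' \
  -e 's|^archive_cmds=.*$|archive_cmds="\\$CC -G \\$libobjs \\$deplibs \\$compiler_flags -qmkshrobj -e \\$soname -o \\$lib"|g' \
  -e 's|^hardcode_libdir_flag_spec=.*$|hardcode_libdir_flag_spec="-R\\$libdir -L\\$libdir"|g' {} \; -print
```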
- To configure Babel successfully, I needed two prerequisites: libxml2 and zlib. libxml2 was required for Python support. Babel includes libparsifal, but that does not remove the dependence on libxml2 if you want Python. zlib was required by libxml2.
- zlib-1.2.3 was configured with:
./configure --prefix=/home/norris/cca
and then I had to manually change the values of the CC and LDSHARED variables since zlib does not have a real configure. The values were CC=mpixlc and LDSHARED=mpixlc (I think you only really need CC).
- libxml2-2.7.3 was configured and built with
```
./configure --prefix=/home/norris/soft/libxml2-2.7.3 CC=mpixlc_r CXX=mpixlcxx_r F90=mpixlf2003_r F77=mpixlf77_r \
  CPP='mpixlc_r -E' --with-python=/bgsys/drivers/ppcfloor/gnu-linux/bin/python --with-zlib=/home/norris/soft/zlib-1.2.3 \
  --enable-shared --disable-static LDFLAGS='-G -qnostaticlink' --with-libs='-ldl'
libtool_fix.sh
# In config.h, make sure HAVE_DLOPEN is defined, and HAVE_SHLOAD is undefined
make
make install
```
To test libxml2, do not use the libtool-generated wrapper scripts. Instead, run the actual executables directly from the .libs subdirectory, e.g.:
```
cd .libs
qsub -A cca-tools -t 60 -n 1 -O testapi --env LD_LIBRARY_PATH=/gpfs/home/norris/soft/libxml2-2.7.3/lib ./testapi
qsub -A cca-tools -t 60 -n 1 -O runtest --env LD_LIBRARY_PATH=/gpfs/home/norris/soft/libxml2-2.7.3/lib ./runtest
```
- Miscellaneous:
To make sure that something is really compiled for the compute nodes, check it with
file executable
The output should look something like this:
ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), for GNU/Linux 2.0.0, dynamically linked (uses shared libs), for GNU/Linux 2.0.0, not stripped
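The 'file' check above can be scripted. This is a minimal sketch, and the helper names are hypothetical. It only looks for the two markers that appear in the sample output: a 32-bit big-endian (MSB) ELF file, and PowerPC. It therefore cannot tell a compute-node binary apart from any other 32-bit PowerPC executable.

```shell
#!/bin/sh
# Sketch only (hypothetical helpers): inspect `file` output for the markers
# seen in the sample output on this page.

looks_like_compute_node_binary() {
  # $1 is a description as printed by `file`.
  case "$1" in
    *"ELF 32-bit MSB"*PowerPC*) return 0 ;;
    *) return 1 ;;
  esac
}

check_binary() {
  # file -b omits the leading "name:" from its output.
  desc=$(file -b "$1") || return 2
  if looks_like_compute_node_binary "$desc"; then
    echo "ok: $1 ($desc)"
  else
    echo "not a compute-node binary: $1 ($desc)" >&2
    return 1
  fi
}
```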