Last modified on 01/15/10 13:52:36

Installing Babel on BG/P (Surveyor at Argonne)

Please note that if you are a member of the cca-tools project on surveyor, you do not need to build babel. It is available to you at /gpfs/home/projects/ccatools/babel. In fact, even if you are not a member of the cca-tools project, it is available to you.

Cut-and-paste

These directions assume that you are using the libraries available under /home/projects/ccatools, listed below. You can always choose to follow Boyana's instructions instead.

export LIBXML2_LOC=/home/projects/ccatools/libxml2
export LIBTOOL_LOC=/home/projects/ccatools/libtool

To build and install Babel, first make sure that INSTALL_DIR is set to the directory you want to install to:

configure CC=mpixlc_r CXX=mpixlcxx_r FC=mpixlf90_r F77=mpixlf77_r --prefix=$INSTALL_DIR --host=powerpc-bgp-linux-gnu --without-sidlx \
  --with-gcc=no --enable-python=/bgsys/drivers/ppcfloor/gnu-linux/bin/python --with-libxml2=$LIBXML2_LOC \
  LDFLAGS=-qnostaticlink --enable-shared --disable-static host_alias=powerpc-bgp-linux-gnu --target=powerpc64-ibm-bgp \
  --with-ltdl-lib=$LIBTOOL_LOC/lib --with-ltdl-include=$LIBTOOL_LOC/include --without-libparsifal \
  PYTHONPATH=/soft/apps/python/python-2.6-cnk-gcc/numpy-1.3.0/lib/python2.6/site-packages CPP="gcc -E" \
  target_alias=powerpc64-ibm-bgp --program-prefix=""
./contrib/libtool_fix.sh
make clean all install
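
The configure line above depends on INSTALL_DIR, LIBXML2_LOC and LIBTOOL_LOC all being set, so a small guard can catch an empty variable before a long configure run. This is a minimal sketch; the helper name require_vars is hypothetical and not part of Babel.

```shell
#!/bin/sh
# Hypothetical helper (not part of Babel): report every named variable that
# is unset or empty, and return non-zero if at least one is missing.
require_vars() {
    missing=0
    for name in "$@"; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "error: $name is not set" >&2
            missing=1
        fi
    done
    return $missing
}

# Example, to be run just before configure:
#   require_vars INSTALL_DIR LIBXML2_LOC LIBTOOL_LOC || exit 1
```

Checking all three names in one call reports every missing variable at once instead of stopping at the first.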

Current build instructions (Notes by Stephen Tramer tramer@…)

It is strongly recommended that you make a directory on /scratch to perform your build in. Boyana mentions in her notes below that configure/make takes a long time; this is because of the slow disk access to $HOME on login nodes for surveyor. Configure and builds proceed much faster on /scratch. However, note that you cannot run the regression tests from /scratch. This is because /scratch is specific to each login node, and is not mounted for any BG/P nodes.

These notes very closely follow Boyana's, but some of her fixes have now been patched into the code. For the full details of what has been modified in order to support BG/P, please see below.

These instructions will enable support for all languages except java (irrelevant on BG/P, since it doesn't have java). Throughout $BABEL_SRC will refer to your local babel source copy and $BABEL_INSTALL to the directory where babel will be installed. Please be aware that this build process has been tested only in the development branch, for the proposed babel 1.5.0.

  1. Prerequisites. You will need to build a position-independent libz, libxml2, and the latest libtool development branch (which has some limited BG/P support).

If you would like to avoid building any of these prerequisites, they are available for you at:

libz: /home/projects/ccatools/libz-1.2.3
libxml2: /home/projects/ccatools/libxml2
libtool: /home/projects/ccatools/libtool

1a. Building libz and libxml2 can be done by following Boyana's old instructions below.

1b. libtool can be built by doing the following:

  • git clone git://git.savannah.gnu.org/libtool.git OR download the latest nightly from http://pogma.com/libtool/.
  • Set LIBTOOL_LOCATION to your install directory, and configure:
    ./configure CC=bgxlc_r CXX=bgxlC_r F77=bgxlf_r FC=bgxlf90_r LDFLAGS="-qnostaticlink -r" --enable-shared \
      --disable-static --enable-ltdl-install --prefix=$LIBTOOL_LOCATION --host=powerpc-bgp-linux-gnu CFLAGS=-qpic
    
  • 'make all'
  • At this point your build will halt. You need to run the contrib/libtool_fix.sh script from the babel project to patch the bootstrapped libtool script.
  • 'make all install'
  2. Configure the project from $BABEL_SRC:
configure CC=mpixlc CXX=mpixlcxx FC=mpixlf90 F77=mpixlf77 --prefix=$INSTALL_DIR --host=powerpc-bgp-linux-gnu --without-sidlx \
  --with-gcc=no --enable-python=/bgsys/drivers/ppcfloor/gnu-linux/bin/python --with-libxml2=$LIBXML2_LOCATION \
  LDFLAGS=-qnostaticlink --enable-shared --disable-static host_alias=powerpc-bgp-linux-gnu --target=powerpc64-ibm-bgp \
  --with-ltdl-lib=$LIBTOOL_LOCATION/lib --with-ltdl-include=$LIBTOOL_LOCATION/include --without-libparsifal \
  PYTHONPATH=/soft/apps/python/python-2.6-cnk-gcc/numpy-1.3.0/lib/python2.6/site-packages CPP="gcc -E" \
  target_alias=powerpc64-ibm-bgp

If the configure fails, run aclocal, autoconf, and automake in both $BABEL_SRC and $BABEL_SRC/runtime. Note that you will need autoconf 2.64 and automake 1.10.x; these are not available on the login machines and you will have to build them yourself (but do not need to port them to BG/P; they will only be used for building).
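
Before regenerating the build files it can help to confirm which autoconf and automake the shell actually finds. The following is a sketch only: the helper names version_ge and tool_version are hypothetical, and version_ge assumes a sort that supports -V (GNU coreutils).

```shell
#!/bin/sh
# version_ge A B: succeed if version A is at least version B.
# Assumes `sort -V` is available (GNU coreutils).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# tool_version TOOL: print the last field of the first line of
# `TOOL --version`, e.g. "autoconf (GNU Autoconf) 2.64" gives "2.64".
tool_version() {
    "$1" --version 2>/dev/null | head -n 1 | awk '{print $NF}'
}

# Report what is on PATH; nothing is modified.
for tool in autoconf automake; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool $(tool_version "$tool")"
    else
        echo "$tool not found in PATH"
    fi
done

# Example check against the minimum named above:
#   version_ge "$(tool_version autoconf)" 2.64 || echo "autoconf too old"
```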

If your configure fails, please look at Boyana's instructions below and examine your output to determine where the mistake could be. However, note that you should not include '-G' in the LDFLAGS. This flag is for building libraries only; providing it to an executable build will cause severe problems.
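
Because a leftover -G in the environment (for example from building the prerequisites) is easy to miss, a quick check before configuring Babel may be useful. This is a sketch; the helper name has_flag_G is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given flag string contains -G as a
# standalone word (it does not match -G embedded in a longer flag).
has_flag_G() {
    for word in $1; do
        [ "$word" = "-G" ] && return 0
    done
    return 1
}

# Warn if the current environment would pass -G to the Babel configure.
if has_flag_G "${LDFLAGS:-}"; then
    echo "warning: LDFLAGS contains -G; remove it before configuring Babel" >&2
fi
```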

  3. Run $BABEL_SRC/contrib/libtool_fix.sh in the $BABEL_SRC directory. This will patch your libtool files so that they are compatible with the BG/P compilers. Do not skip this step. If step 4 fails, make sure you have done this.
  4. 'make clean all install'

If your make fails, please look at Boyana's instructions below and examine your output to determine where the mistake could be.

Running regression tests

Again, you cannot run regression tests from /scratch. Trying to do so will cause headaches and a complete lack of output from mpirun. Build your babel source in your home directory (or another directory mounted by the BG/P) in order to run regression tests.
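
A small check of the working directory before submitting a job can save a wasted run. The sketch below uses a hypothetical helper, is_under_scratch, and assumes the login-node scratch area is mounted at /scratch as described above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given path is /scratch or lies below it.
is_under_scratch() {
    case "$1" in
        /scratch|/scratch/*) return 0 ;;
        *) return 1 ;;
    esac
}

# pwd -P resolves symbolic links, so a symlink into /scratch is also caught.
if is_under_scratch "$(pwd -P)"; then
    echo "warning: $(pwd -P) is on /scratch; the BG/P nodes cannot see it" >&2
fi
```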

Note that currently there is no automation for running babel tests on the BG/P; you will have to run them by hand. Therefore it is recommended that you only run a handful; the struct and array tests are most useful since they involve data alignment and can be used to detect any issues of that sort. You can build and fix the regression tests with:

  1. 'make check' will build the regression tests. Expect everything to fail spectacularly in the gantlet run.
  2. Run $BABEL_SRC/contrib/regression_fix.sh to patch the regression files.
  3. 'qsub --mode script -A cca-tools -t 15 -n 1 X' where 'X' represents your test. For most cases this should be 'runAll.sh'; for python client tests you will be running a specific script. Note that the Py2Py tests are broken on BG/P and we haven't been able to figure out why yet - in fact, all loading from SCL files seems to be something of a crapshoot.
  4. Once the job completes on the BG/P nodes, you will have to manually examine the test output for failures. Please note that tests will be (erroneously) marked as BROKEN due to the return value of cobalt-mpirun from the libtool convenience script. Actual broken tests can only be checked by examining the output and error logs.
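
To narrow down which logs to read first, the manual examination in the last step can start with a simple search. This is only a sketch: the helper files_with_failures is hypothetical, and both the default marker "FAIL" and the *.output / *.error file names in the example are assumptions that should be adjusted to what your jobs actually produce.

```shell
#!/bin/sh
# Marker to search for. "FAIL" is an assumption; set PATTERN to whatever
# your test output really prints for a failed test.
PATTERN=${PATTERN:-FAIL}

# Hypothetical helper: print the names of the given files that contain
# the marker. Arguments that are not regular files are skipped.
files_with_failures() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        if grep -q -e "$PATTERN" "$f"; then
            echo "$f"
        fi
    done
}

# Example (file names are assumptions):
#   files_with_failures *.output *.error
```

The listed files still need to be read by hand; the search only orders the work.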

Old build instructions (Notes by Boyana Norris norris@…)

I do not recommend that anyone try this unless they want to spend a lot of time waiting for configure scripts to complete (e.g., on my laptop, babel configure takes 2.5 minutes, while on the BG/P it takes at least 20 minutes, depending on various options, and then you have to actually build).

  1. First of all, I started out with the goal of getting as many languages supported as possible, including python. This necessitated the following steps:
    1. Python distutils are rather crippled when it comes to cross-compiling. I built my own python (version 2.6.1) and extended the distutils implementation with a BG/P-specific compiler module. More details on that are available here.
    2. I had to install NumPy , which turned out to be nontrivial (see description). My second requirement was support for shared libraries since fully static builds are (1) less flexible; and (2) not quite fully automated for CCA components. There is also the additional danger of running out of memory because of bloated static executables. GNU Libtool does not work out of the box for building shared libraries on the BG/P -- more on that later.
  2. As expected, GNU Autotools-based packages cannot be simply built on BG/P. First, configure does not provide good cross-compilation support, at least not good enough for BG/P. I added these settings to the llnl_cross_compiling.m4 file (right after the case "$target" in line):
      powerpc64-ibm-bgp*)
        cross_compiling=yes
        llnl_cross_compiling_okay=yes
        enable_pure_static_runtime=no
        enable_shared=yes
        enable_static=no
        enable_java=no
        enable_python=/bgsys/drivers/ppcfloor/gnu-linux/bin/python
        llnl_cv_python_frontend=python
        llnl_cv_python_prefix=/bgsys/drivers/ppcfloor/gnu-linux
        llnl_cv_python_numpy=yes
        llnl_cv_python_numerical=no
        llnl_cv_python_library=$llnl_cv_python_prefix/lib
        llnl_cv_python_version=2.5
        llnl_cv_python_include=$llnl_cv_python_prefix/include/python$llnl_cv_python_version
        llnl_cv_extra_python_build_options="--compiler=mpixlc"
        llnl_python_shared_library=$llnl_cv_python_library/libpython2.5.so
        llnl_python_shared_library_found=yes
        sidl_cv_f77_false=0
        sidl_cv_f77_true=1
        sidl_cv_f90_false=0
        sidl_cv_f90_true=1
        llnl_cv_F77_logical_size=4
        llnl_cv_F90_logical_size=4
        ac_cv_f90_pointer_size=8
        llnl_cv_F77_string_passing="far int32_t"
        llnl_cv_F90_string_passing="far int32_t"
        ac_cv_func_malloc_0_nonnull=yes
        ac_cv_func_realloc_0_nonnull=yes
        ac_cv_func_memcmp_working=yes
        with_sidlx=no
    
        ;;
    

Some additional hard-coded configure settings are discussed in this IBM Document. (Note: The host triplet used by IBM for identifying compute nodes seems to be powerpc-bgp-linux, not the one above.)

Next I changed the runtime/m4/llnl_confirm_babel_python_support.m4 file as follows:

*** llnl_confirm_babel_python_support.m4	2008-10-21 12:05:31.000000000 -0500
--- /home/norris/babel-1.4.0/runtime/m4/llnl_confirm_babel_python_support.m4	2009-02-24 22:38:39.246831463 -0600
***************
*** 38,47 ****
--- 38,49 ----
    fi
  
    if test "X$enable_python" != "Xno"; then
+     if test "X$cross_compiling" != "Xyes"; then 
        LLNL_PYTHON_LIBRARY
        LLNL_PYTHON_NUMERIC
        LLNL_PYTHON_SHARED_LIBRARY
        LLNL_PYTHON_AIX
+     fi
      if test \( "X$llnl_cv_python_numerical" != "Xyes" -a  "X$llnl_cv_python_numpy" != "Xyes" \) -o "X$enable_shared" = "Xno" -o "X$XML2_CONFIG" = "Xno"; then
         enable_python=no;
         AC_MSG_WARN([Configuration for Python failed.  Support for Python disabled!])

(NOTE from Tramer: This change was not necessary for the new BG/P cross-compilation, and in fact broke the build.)

Finally, I modified runtime/config/config.sub as follows:

*** config.sub	2008-10-21 11:42:24.000000000 -0500
--- /home/norris/babel-1.4.0.bk/runtime/config/config.sub	2009-02-23 13:23:35.376490273 -0600
***************
*** 1391,1396 ****
--- 1391,1399 ----
  	-bgl)
  		os=-bgl
  		;;
+ 	-bgp)
+ 		os=-bgp
+ 		;;
  	-catamount)
  		os=-catamount
  		;;

While building prerequisites (see next item), I found that libtool did not support building and linking shared libraries using IBM's compilers. The perfect fix for this problem is to add BG/P support to libtool. The faster workaround was to first set an environment variable:

export LDFLAGS='-G -qnostaticlink'

then run configure. Next, we introduce an extra step before running make: run this little script on all generated libtool scripts in the build directory of the package being built:

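#!/bin/sh

# File: libtool_fix.sh
# Patch every generated libtool script below the current directory so that
# it builds shared libraries with the IBM compilers on BG/P.
# Usage: libtool_fix.sh [script-name]   (script-name defaults to "libtool")

wd=`pwd`
if test "x$1" = "x" ; then prog=libtool
else prog=$1; fi

# sed -i1 edits each file in place and keeps a backup with the suffix "1".
find "$wd" -name "$prog" -exec \
    sed -i1 -e 's|^export_dynamic_flag_spec=.*$|export_dynamic_flag_spec=""|g' \
        -e 's|^whole_archive_flag_spec=.*$|whole_archive_flag_spec=""|g' \
        -e 's|^pic_flag=.*$|pic_flag=" -DPIC -qpic"|g' \
        -e 's|^archive_cmds=.*$|archive_cmds="\\$CC -G \\$libobjs \\$deplibs \\$compiler_flags -qmkshrobj -e \\$soname  -o \\$lib"|g' \
        -e 's|^hardcode_libdir_flag_spec=.*$|hardcode_libdir_flag_spec="-R\\$libdir -L\\$libdir"|g' {} \; -print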
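
To see what the substitutions do without touching a real build tree, two of them can be applied to a throwaway file. In this sketch the sample settings are made-up stand-ins, not lines copied from a real libtool script, and the output goes to stdout rather than being edited in place.

```shell
#!/bin/sh
# Create a tiny stand-in for a generated libtool script.
# The values below are invented for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
pic_flag=" -fPIC -DPIC"
export_dynamic_flag_spec="--export-dynamic"
EOF

# Apply two of the substitutions from libtool_fix.sh and capture the result.
patched=$(sed -e 's|^export_dynamic_flag_spec=.*$|export_dynamic_flag_spec=""|g' \
              -e 's|^pic_flag=.*$|pic_flag=" -DPIC -qpic"|g' "$sample")
echo "$patched"
rm -f "$sample"

# Prints:
#   pic_flag=" -DPIC -qpic"
#   export_dynamic_flag_spec=""
```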

  3. To successfully configure Babel, the following prerequisites were necessary: libxml2 and zlib. The former was required for Python support (so the fact that Babel includes libparsifal does not remove the dependence on libxml2 if you want Python). The latter was required by libxml2.
    1. zlib-1.2.3 was configured with:
      ./configure --prefix=/home/norris/cca
      
      and then I had to manually change the values of the CC and LDSHARED variables since zlib does not have a real configure. The values were CC=mpixlc and LDSHARED=mpixlc (I think you only really need CC). You will also need to add the -qpic flag to CFLAGS.

    2. libxml2-2.7.3 was configured and built with
      ./configure --prefix=/home/norris/soft/libxml2-2.7.3 CC=mpixlc_r CXX=mpixlcxx_r F90=mpixlf2003_r F77=mpixlf77_r \
        CPP='mpixlc_r -E' --with-python=/bgsys/drivers/ppcfloor/gnu-linux/bin/python --with-zlib=/home/norris/soft/zlib-1.2.3 \
        --enable-shared --disable-static LDFLAGS='-G -qnostaticlink' --with-libs='-ldl'
      libtool_fix.sh
      # In config.h, make sure HAVE_DLOPEN is defined, and HAVE_SHLOAD is undefined
      make 
      make install
      

To test, do not use the libtool-generated wrapper scripts, but run the actual executables directly in the .libs subdirectory, e.g.:

cd .libs
qsub -A cca-tools -t 60 -n 1 -O testapi --env LD_LIBRARY_PATH=/gpfs/home/norris/soft/libxml2-2.7.3/lib ./testapi
qsub -A cca-tools -t 60 -n 1 -O runtest --env LD_LIBRARY_PATH=/gpfs/home/norris/soft/libxml2-2.7.3/lib ./runtest
  4. Miscellaneous:

To make sure that something is really compiled for the compute nodes, check it with

file executable 

The output should look something like this:

ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), for GNU/Linux 2.0.0, dynamically linked (uses shared libs), for GNU/Linux 2.0.0, not stripped
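
When many executables need checking, the comparison against the sample output can be scripted. This is a sketch under the assumption that matching the "ELF 32-bit MSB" and "PowerPC" parts of the description is enough; the helper name looks_like_bgp_binary is hypothetical, and the check only inspects the text that file prints.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a `file` description looks like a 32-bit
# big-endian PowerPC ELF binary, as in the sample output above.
looks_like_bgp_binary() {
    case "$1" in
        *"ELF 32-bit MSB"*PowerPC*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example on a real binary (file -b omits the leading file name):
#   looks_like_bgp_binary "$(file -b ./testapi)" || echo "check ./testapi"
```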