Note: This page has been marked as "Obsolete" by the site administrator.

MPIg Availability

Currently, MPIg is available on all IBM p575 clusters, namely Bluedawg, Ducky, Zeke, Neptune, and Lacumba.

Setting up the environment

First, users need to add "+globus-4.0.7" to their .soft file and resoft. For details on using the SoftEnv utility to manage your environment, please refer to User_Environment.
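For example, a minimal sketch of this SoftEnv setup, assuming your .soft file is at the usual ~/.soft location:

# Append the Globus key to .soft, then re-read the environment
echo "+globus-4.0.7" >> ~/.soft
resoft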

Users also need to source the $GLOBUS_LOCATION/etc/globus-user-env.sh script, preferably in .profile or .bashrc:

. $GLOBUS_LOCATION/etc/globus-user-env.sh

Running pre-WS GRAM jobs

To run pre-WS GRAM jobs, follow these steps:

  1. Apply for a LONI CA user certificate (if you have already done this, skip this step):
    1. Locate GLOBUS_LOCATION (the path to the Globus package) on any of the LONI machines and run $GLOBUS_LOCATION/bin/grid-cert-request. This generates three files under .globus in your home directory: usercert_request.pem, usercert.pem (empty), and userkey.pem.
    2. Email usercert_request.pem to ca@loni.org.
    3. After receiving the signed certificate from ca@loni.org, copy it to usercert.pem under .globus.
  2. Compile your code with the compilers (mpicc, mpif90, etc.) from the 64-bit MPIg installation under /usr/local/packages/mpig-1.0.6-64.
  3. Start a Globus proxy with "grid-proxy-init".
  4. Submit jobs through Globus RSL files. Currently, users need to first check the queues on both machines for available processors, adjust the processor request in the RSL file accordingly, and then run "globusrun -f RSLfilename" to launch the job. The full sequence is sketched below.
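A minimal sketch of this workflow, following the paths mentioned on this page (the source file hydro.c and RSL file name myjob.rsl are hypothetical):

# 1. Request a certificate; writes usercert_request.pem, usercert.pem, userkey.pem to ~/.globus
$GLOBUS_LOCATION/bin/grid-cert-request
# ... email usercert_request.pem to ca@loni.org, then install the signed usercert.pem ...

# 2. Compile with the 64-bit MPIg compiler wrappers
export PATH=/usr/local/packages/mpig-1.0.6-64/bin:$PATH
mpicc -o hydro hydro.c

# 3. Start a Globus proxy
grid-proxy-init

# 4. Submit the job described in the RSL file
globusrun -f myjob.rsl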

Keep in mind that a process running on one machine only has access to the local disk on that machine. When running on two machines simultaneously, you will need to adapt your I/O accordingly, because data will be written to disks that are geographically separated from each other.
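If you need to gather output from both sites after the run, one option is GridFTP; a hedged sketch, assuming gsiftp servers are reachable on both clusters (the host name and destination file name here are illustrative):

# Pull the remote subjob's output next to the local copy under a distinct name
globus-url-copy gsiftp://zeke.loni.org/work/ou/flower/run/std.out \
    file:///work/ou/flower/run/std.out.zeke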

A sample RSL file requesting 16 processors on Zeke and 16 processors on Bluedawg:

+
( & (resourceManagerContact="l1f1n01.sys.loni.org/jobmanager-loadleveler")
    (project = loni_allocation_name)
    (queue = checkpt)
    (job_type = multiple)
    (count = 16)
    (host_count = 2)
    (minMemory = 50)
    (maxWallTime = 3000)
    (directory = /work/ou/flower/run)
    (environment =
        (GLOBUS_DUROC_SUBJOB_INDEX 0)
        (GBLL_NETWORK_MPI sn_single,not_shared,US,HIGH)
        (LD_LIBRARY_PATH /usr/local/globus/globus-4.0.4/lib/:/usr/local/packages/mpig-1.0.6-64/lib)
        (PATH /usr/local/globus/globus-4.0.4/bin/:/usr/local/packages/mpig-1.0.6-64/bin:.))
    (executable = /work/ou/flower/run/hydro)
    (stderr = /work/ou/flower/run/std.err)
    (stdout = /work/ou/flower/run/std.out)
)
( & (resourceManagerContact="l3f1n01.sys.loni.org/jobmanager-loadleveler")
    (project = grid_flower_test)
    (queue = checkpt)
    (job_type = multiple)
    (count = 16)
    (host_count = 2)
    (minMemory = 50)
    (maxWallTime = 3000)
    (directory = /work/ou/flower/run)
    (environment =
        (GLOBUS_DUROC_SUBJOB_INDEX 1)
        (GBLL_NETWORK_MPI sn_single,not_shared,US,HIGH)
        (LD_LIBRARY_PATH /usr/local/globus/globus-4.0.4/lib/:/usr/local/packages/mpig-1.0.6-64/lib)
        (PATH /usr/local/globus/globus-4.0.4/bin/:/usr/local/packages/mpig-1.0.6-64/bin:.))
    (executable = /work/ou/flower/run/hydro)
    (stderr = /work/ou/flower/run/std.err)
    (stdout = /work/ou/flower/run/std.out)
)


Here, project is your LONI allocation name; queue is the queue name on the LONI machines; count is the number of processors needed by your job; host_count is the number of nodes; minMemory is the minimum memory in megabytes required by your job; and maxWallTime is the maximum wall clock time in minutes. You can define environment variables such as LD_LIBRARY_PATH in the same way shown above.

Running WS-GRAM jobs

To run MPIg jobs under WS-GRAM, use the MPIg package under /usr/local/packages/mpig-1.0.6-64. First, however, you need to make the following changes to your environment.

Here are the steps users need to perform in addition to the usual steps for running pre-WS MPIg jobs:

(1) export X509_USER_PROXY=/home/$USER/.globus/userproxy.pem
(2) rerun grid-proxy-init to generate the new proxy file at /home/$USER/.globus/userproxy.pem
(3) compile the code with the new version of MPIg (mpig-1.0.6-64)
(4) add the environment variable X509_USER_PROXY=/home/$USER/.globus/userproxy.pem in your job.jdd file
(5) run globusrun-ws -submit -J -f job.jdd

These changes are needed because MPIg jobs must access your proxy file, which normally resides under /tmp. However, our Globus server runs only on the head node, whose /tmp is not accessible from the compute nodes. The workaround is to redefine X509_USER_PROXY to point into your home directory, which is accessible from all the compute nodes. The -J option handles the delegation of your proxy to all of the remote MPIg sites; otherwise, you would have to start a proxy at each site so that your home directory on each of them contains a proxy file accessible to the local compute nodes.
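A minimal sketch of this sequence (job.jdd is the job description file shown below):

# Steps (1)-(2): keep the proxy in $HOME so compute nodes can read it
export X509_USER_PROXY=/home/$USER/.globus/userproxy.pem
grid-proxy-init

# Step (3): recompile with the MPIg wrappers from /usr/local/packages/mpig-1.0.6-64

# Step (5): submit; -J delegates the proxy to the remote sites
globusrun-ws -submit -J -f job.jdd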

The following is a sample .jdd file for running jobs across Zeke and Bluedawg:


<?xml version="1.0"?>
<multiJob>
 <factoryEndpoint xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
   <wsa:Address>
     https://bluedawg.loni.org:8443/wsrf/services/ManagedJobFactoryService
   </wsa:Address>
   <wsa:ReferenceProperties>
     <gram:ResourceID>Multi</gram:ResourceID>
   </wsa:ReferenceProperties>
 </factoryEndpoint>
 <jobType>multiple</jobType>
 <job>
   <factoryEndpoint xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
     <wsa:Address>
       https://zeke.loni.org:8443/wsrf/services/ManagedJobFactoryService
     </wsa:Address>
     <wsa:ReferenceProperties>
       <gram:ResourceID>Loadleveler</gram:ResourceID>
     </wsa:ReferenceProperties>
   </factoryEndpoint>
   <executable>/work/default/ou/pingpong_mpig/pingpong</executable>
   <environment>
     <name>GLOBUS_DUROC_SUBJOB_INDEX</name>
     <value>0</value>
   </environment>
   <environment>
     <name>MP_BUFFER_MEM</name>
     <value>32m</value>
   </environment>
   <environment>
      <name>GBLL_NETWORK_MPI</name>
     <value>sn_single,not_shared,US,HIGH</value>
   </environment>
   <environment>
     <name>LD_LIBRARY_PATH</name>
      <value>/usr/local/globus/globus-4.0.4/lib/:/usr/local/packages/mpig-1.0.6-64/lib</value>
   </environment>
   <environment>
     <name>PATH</name>
     <value>/usr/local/globus/globus-4.0.4/bin/:/usr/local/packages/mpig-1.0.6-64/bin:.</value>
   </environment>
   <environment>
     <name>X509_USER_PROXY</name>
     <value>/home/ou/.globus/userproxy.pem</value>
   </environment>
   <stdout>/work/default/ou/pingpong_mpig/std.out</stdout>
   <stderr>/work/default/ou/pingpong_mpig/std.err</stderr>
   <count>1</count>
   <hostCount>1</hostCount>
   <project>loni_ellipsd01</project>
   <queue>checkpt</queue>
   <maxTime>15</maxTime>
   <jobType>mpi</jobType>
 </job>
 <job>
   <factoryEndpoint xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
     <wsa:Address>
       https://bluedawg.loni.org:8443/wsrf/services/ManagedJobFactoryService
     </wsa:Address>
     <wsa:ReferenceProperties>
       <gram:ResourceID>Loadleveler</gram:ResourceID>
     </wsa:ReferenceProperties>
   </factoryEndpoint>
   <executable>/work/default/ou/pingpong_mpig/pingpong</executable>
   <environment>
     <name>GLOBUS_DUROC_SUBJOB_INDEX</name>
     <value>1</value>
   </environment>
   <environment>
     <name>MP_BUFFER_MEM</name>
     <value>32m</value>
   </environment>
   <environment>
     <name>X509_USER_PROXY</name>
     <value>/home/ou/.globus/userproxy.pem</value>
   </environment>
   <environment>
      <name>GBLL_NETWORK_MPI</name>
     <value>sn_single,not_shared,US,HIGH</value>
   </environment>
   <environment>
     <name>LD_LIBRARY_PATH</name>
      <value>/usr/local/globus/globus-4.0.4/lib/:/usr/local/packages/mpig-1.0.6-64/lib</value>
   </environment>
   <environment>
     <name>PATH</name>
     <value>/usr/local/globus/globus-4.0.4/bin/:/usr/local/packages/mpig-1.0.6-64/bin:.</value>
   </environment>
   <stdout>/work/default/ou/pingpong_mpig/std.out</stdout>
   <stderr>/work/default/ou/pingpong_mpig/std.err</stderr>
   <count>1</count>
   <hostCount>1</hostCount>
   <project>loni_ellipsd01</project>
   <queue>checkpt</queue>
   <maxTime>15</maxTime>
   <jobType>mpi</jobType>
 </job>
</multiJob>
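Submit this file with the command from step (5) above. A usage sketch, assuming the standard GT 4.0 globusrun-ws options:

# Interactive submission; -J delegates your proxy to the remote sites
globusrun-ws -submit -J -f job.jdd

# Batch submission: save the job EPR, then poll its state later
globusrun-ws -submit -J -batch -o job.epr -f job.jdd
globusrun-ws -status -j job.epr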


Final note

With the -J option, users do not need to start a proxy at every site.


For further information on the integration of GT 4.x and LoadLeveler, see the IBM publication LoadLeveler GT 4.0 Users Guide.
