

Deprecated: please see new documentation site.



by Brett Estrade

DRAFT!


Introduction

All compute nodes have more than one core/processor, so even if a job is not explicitly parallel (using OpenMP or MPI), it is still beneficial to be able to launch multiple processes from a single submit script. This document briefly explains how to launch multiple processes on a single compute node and across two or more compute nodes.

Note that this method does NOT facilitate communication among processes, so it is not appropriate for parallelized executables that use MPI; it is still fine to invoke multithreaded executables, because each is initially invoked as a single process. The caveat is that one should not use more threads than there are cores on a single compute node.

Basic Use Case

A user has a serial executable that they wish to run many times on LONI/LSU HPC resources. Instead of submitting one queue script per serial execution, and not wanting to leave processors idle on the many-core compute nodes, the user wishes to launch one serial process per available core from a single submission script.

Required Tools

  1. the shell (bash)
  2. ssh (for distributing processes to remote compute nodes)

Launching Multiple Processes - Basic Example

On a single compute node

This example assumes one knows the number of available processors on a single compute node. The following bash shell script launches each process in a subshell; each subshell is placed into the background using the & symbol.

#!/bin/sh
 
# -- the following 8 lines each issue a command in a subshell and
# place that subshell into the background; this creates 8 child
# processes belonging to the current shell script when executed.
# Note the & sits outside the parentheses, so each subshell remains
# a child of this script and the 'wait' below can track it.

(/path/to/exe1) & # executable/script arguments may be passed as expected 
(/path/to/exe2) & 
(/path/to/exe3) & 
(/path/to/exe4) & 
(/path/to/exe5) & 
(/path/to/exe6) & 
(/path/to/exe7) & 
(/path/to/exe8) & 
 
# -- now WAIT for all 8 child processes to finish
# this will make sure that the parent process does not
# terminate, which is especially important in batch mode

wait
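When the executable paths follow a predictable pattern, the eight explicit lines above can be collapsed into a loop. The following is a minimal sketch of that idea; /bin/echo stands in here for the real serial executable, which is an assumption purely for illustration:

```shell
#!/bin/sh

# -- launch one backgrounded subshell per task in a loop;
# -- /bin/echo stands in for the real serial executable
for i in 1 2 3 4 5 6 7 8; do
  (/bin/echo "task $i running") &
done

# -- wait for all 8 child subshells to finish
wait
```

The loop form scales to any core count by changing the list of indexes, while behaving identically to the unrolled version.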

On multiple compute nodes

Distributing processes onto remote compute nodes builds upon the single node example. To use multiple compute nodes, one can keep the 8 subshells from above for the mother superior node (i.e., the "home" node, which is the compute node that the batch scheduler uses to execute the shell commands contained in the queue script). For the remote nodes, one must combine the subshell with ssh to launch the command on the remote host.

#!/bin/sh
 
# -- the following lines each issue a command in a subshell and
# place that subshell into the background; this creates 16 child
# processes belonging to the current shell script when executed
 
# -- for mother superior, or "home", compute node
(/path/to/exe1) & 
(/path/to/exe2) & 
(/path/to/exe3) & 
(/path/to/exe4) & 
(/path/to/exe5) & 
(/path/to/exe6) & 
(/path/to/exe7) & 
(/path/to/exe8) & 

# -- for an additional, remote compute node
(ssh remotehost /path/to/exe1) & 
(ssh remotehost /path/to/exe2) & 
(ssh remotehost /path/to/exe3) & 
(ssh remotehost /path/to/exe4) & 
(ssh remotehost /path/to/exe5) & 
(ssh remotehost /path/to/exe6) & 
(ssh remotehost /path/to/exe7) & 
(ssh remotehost /path/to/exe8) & 
 
# -- now WAIT for all 8 child processes to finish
# this will make sure that the parent process does not
# terminate, which is especially important in batch mode

wait

The example above spawns 16 subshells: 8 run on the local compute node (i.e., mother superior) and 8 run on remotehost. The background token (&) backgrounds the work on the local node for all 16 commands; the commands sent to remotehost are not backgrounded on the remote side, because then the local machine could not know when the local command (i.e., ssh) had completed.
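This distinction can be demonstrated without a remote host. In the sketch below, sh -c stands in for ssh remotehost (an assumption purely for illustration): when the wrapper runs the command to completion, the local wait really does wait for the work; backgrounding the command on the remote side instead would let the wrapper return immediately:

```shell
#!/bin/sh

# -- correct: the wrapper (sh -c here, standing in for ssh) runs the
# -- command to completion, so the backgrounded subshell lives as
# -- long as the work does, and 'wait' below tracks it
(sh -c 'sleep 1; echo "remote work finished"') &

# -- incorrect (do not do this with ssh): backgrounding on the
# -- remote side makes the wrapper return immediately, so 'wait'
# -- would return before the work is done
# (sh -c 'sleep 1 &') &

wait
```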

Note that one often does not know the identity of the mother superior node or of the set of remote compute nodes (i.e., remotehost) when submitting such a script to the batch scheduler, so some additional programming must be done to determine these identities at runtime. The following section addresses this and concludes with a basic, adaptive example.

Advanced Example for PBS Queuing Systems

The steps involved in the following example are:

  1. determine identity of mother superior
  2. determine list of all remote compute nodes

Assumptions:

  1. shared file system
  2. hostnames (returned by the hostname command) consist solely of the unqualified machine name (i.e., just the first part of the fully qualified name)
  3. a list of all compute nodes assigned by PBS is contained in a file referenced by the environment variable ${PBS_NODEFILE}
  4. each compute node has 8 processing cores

Note this example still requires that one knows the number of cores available per compute node; in this example, 8 is assumed.
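If hardcoding the core count is undesirable, it can instead be derived from ${PBS_NODEFILE} itself, since PBS lists each node once per assigned core. A minimal sketch, assuming the hostnames in the file match the output of the hostname command:

```shell
# -- count how many times this node appears in the nodefile;
# -- that is the number of cores PBS assigned on this node
PPN=$(grep -c "^$(hostname)$" "$PBS_NODEFILE")
echo "cores available on this node: $PPN"
```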

#!/bin/bash
# -- bash (not plain sh) is required for the arrays used below
  

# -- determine name of mother superior
 
MOTHER=`hostname`
 
# -- build list and determine count of all remote compute nodes

NODE=()  # bash array that will hold the remote hostnames
count=0
for node in `cat $PBS_NODEFILE | uniq`; do
  # -- filter out mother superior node, capture everything else
  if [ "$node" != "$MOTHER" ]; then
    NODE[$count]=$node
    echo ${NODE[$count]}
    count=$(( $count + 1 ))
  fi
done
NODECOUNT=${count} # maintains count of number of remote hosts
   
# -- for mother superior, or "home", compute node (local invocation
# of executable is implicitly run on mother superior

(/path/to/exe1) & 
(/path/to/exe2) & 
(/path/to/exe3) & 
(/path/to/exe4) & 
(/path/to/exe5) & 
(/path/to/exe6) & 
(/path/to/exe7) & 
(/path/to/exe8) & 

# -- $NODE array is 0-indexed

N=$((${NODECOUNT}-1)) 

# -- loop over the array; uses 'seq' command to produce an ordered
# sequence of indexes: 0, 1, 2, 3, ..., $N

for i in `seq 0 ${N}`; do
  # -- update to use next remotehost

  remotehost=${NODE[$i]} 

  # -- invoke remote executables on $remotehost using ssh

  (ssh $remotehost /path/to/exe1) & 
  (ssh $remotehost /path/to/exe2) & 
  (ssh $remotehost /path/to/exe3) & 
  (ssh $remotehost /path/to/exe4) & 
  (ssh $remotehost /path/to/exe5) & 
  (ssh $remotehost /path/to/exe6) & 
  (ssh $remotehost /path/to/exe7) & 
  (ssh $remotehost /path/to/exe8) & 
done
  
# -- now WAIT for all 8*($NODECOUNT+1) child processes to finish
# this will make sure that the parent process does not
# terminate, which is especially important in batch mode

wait
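To illustrate what the node-filtering loop in the script above produces, the following self-contained sketch runs the same logic against a hypothetical nodefile containing node1 and node2 (eight entries each), pretending the mother superior is node1; all names here are made-up stand-ins for the values PBS would supply at runtime:

```shell
#!/bin/bash

# -- hypothetical stand-ins for the runtime values
MOTHER=node1
PBS_NODEFILE=$(mktemp)
for i in 1 2 3 4 5 6 7 8; do echo node1; done >  $PBS_NODEFILE
for i in 1 2 3 4 5 6 7 8; do echo node2; done >> $PBS_NODEFILE

# -- same filtering logic as in the script above
NODE=()
count=0
for node in `cat $PBS_NODEFILE | uniq`; do
  if [ "$node" != "$MOTHER" ]; then
    NODE[$count]=$node
    count=$(( $count + 1 ))
  fi
done
echo "remote nodes: ${NODE[@]} (count: $count)"  # remote nodes: node2 (count: 1)
rm -f $PBS_NODEFILE
```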

Submitting to PBS

Now let's assume you have 128 tasks to run. You can do this by running on 16 8-core nodes using PBS. If one task takes 7 hours, and allowing a 30-minute safety margin, a qsub command line like the following (where myscript.sh is a placeholder for the queue script developed above) will take care of running all 128 tasks:

% qsub -A allocation_account -V -l walltime=07:30:00,nodes=16:ppn=8 myscript.sh

When the script executes, it will find 16 node names in its PBS_NODEFILE, run 1 set of 8 tasks on the mother superior, and 15 sets of 8 on the remote nodes.

More information on submitting to the PBS queue can be accessed at the frequently asked questions page.

Advanced Usage Possibilities

1. executables in the above script can take normal arguments and flags; e.g.:

(/path/to/exe1 arg1 arg2 -flag arg3) & 

2. one can, technically, launch multithreaded executables; the catch is to make sure that only one thread per processing core is allowed. The following example launches 4 two-threaded executables (presumably using OpenMP) locally, thus consuming all 8 cores (i.e., 4 processes * 2 threads/process).

# "_r" simply denotes that the executable is multithreaded 
(OMP_NUM_THREADS=2 /path/to/exe1_r) & 
(OMP_NUM_THREADS=2 /path/to/exe2_r) & 
(OMP_NUM_THREADS=2 /path/to/exe3_r) & 
(OMP_NUM_THREADS=2 /path/to/exe4_r) &

# -- in total, 8 threads are being executed on the 8 (assumed) cores
# contained by this compute node; i.e., there are 4 processes, each 
# with 2 threads running.
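The accounting above generalizes: the number of processes to launch is simply the core count divided by the threads per process. A small sketch of that arithmetic, with 8 as the assumed core count from this document:

```shell
CORES=8       # assumed cores per compute node
THREADS=2     # threads per multithreaded process
PROCS=$(( CORES / THREADS ))
echo "launch $PROCS processes of $THREADS threads each"
```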

A creative user can get even more complicated, but in general there is no need.

Conclusion

In situations where one wants to take advantage of all processors available on a single compute node, or on a set of compute nodes, to run a number of unrelated processes, using subshells to manage backgrounded processes, locally and (via ssh) remotely, makes this straightforward.

Using the examples above, one may create a customized solution using the proper techniques. For questions and comments regarding this topic and the examples above, please email sys-help@loni.org.

