Deprecated: please see new documentation site.



An interactive session is a set of compute nodes that lets you interact manually (e.g., via a shell) with your programs while taking advantage of multiple dedicated processors/nodes. This is useful for development, debugging, running long sequential jobs, and testing. The following is a quick guide on how to start such a session on the various LSU/LONI resources:

Note 1: these methods should work on all of the Linux clusters at LONI/LSU, but the host name (tezpur.hpc.lsu.edu is used in the examples below) will need to reflect the machine being used. The same applies to the ppn= (processors per node) keyword value (e.g., on QueenBee it would be ppn=8).

Note 2: the commands below conform to the bash shell syntax. Your mileage may differ if you use a different shell.

Note 3: the interactive method below requires opening 2 terminal windows.

Interactive Method

1. In the terminal 1 window, login to the head node of the desired x86 Linux cluster:

 ssh -XY username@tezpur.hpc.lsu.edu

2. Once logged onto the head node, the next step is to reserve a set of nodes for interactive use. This is done by issuing a qsub command similar to the following:

 $ qsub -I -A allocation_account -V -l walltime=HH:MM:SS,nodes=NUM_NODES:ppn=4
  • HH:MM:SS - the length of time you wish to use the nodes (resource availability applies as usual).
  • NUM_NODES - the number of nodes you wish to reserve.
  • ppn - must match the number of cores available per node (system dependent).
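For example, a two-hour request for 2 nodes on tezpur (4 cores each), charged to a hypothetical allocation named hpc_myalloc, would look like:

 $ qsub -I -A hpc_myalloc -V -l walltime=02:00:00,nodes=2:ppn=4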

You will likely have to wait a bit to get a node, and you will see a "waiting for job to start" message in the meantime. Once a prompt appears, the job has started.

3. After the job has started, the next step is to determine which nodes have been reserved for you. To do this, examine the node list file whose path the PBS system sets for you in the PBS_NODEFILE environment variable. One way to do this, and an example result, is:

 $ printenv PBS_NODEFILE
 /var/spool/torque/aux/xyz.tezpur2 

Here xyz is the job number assigned on tezpur.
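To see the actual list of nodes assigned to the job, you can also display the file's contents (Torque typically lists each node once per requested processor slot):

 $ cat $PBS_NODEFILE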

4. Your terminal 1 session is now connected to the rank 0, or primary, compute node. You should now determine its host name:

 $ hostname
 tezpurIJK

Where IJK is a 3 digit number.

5. To actually begin using the node, repeat step 1 in a second terminal, terminal 2. Once logged onto the head node, connect from there to the node determined in step 4. The two steps would look like:

 On your client:   $ ssh -XY username@tezpur.hpc.lsu.edu
 On the head node: $ ssh -XY tezpurIJK

You have two ways to approach the rest of this process, depending on which terminal window you want to enter commands in.

Using Terminal 2

6. In terminal 2, set the PBS_NODEFILE environment variable to match the value found in step 3:

 $ export PBS_NODEFILE=/var/spool/torque/aux/xyz.tezpur2

7. Now you are set to run any programs you wish, using terminal 2 for your interactive session. All X11 windows will be forwarded from the main compute node to your client PC for viewing.
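For example, an MPI program could be launched across the reserved cores using the node file; the exact launcher options depend on the MPI implementation installed on the cluster, and my_program is just a placeholder for your own executable:

 $ mpirun -machinefile $PBS_NODEFILE -np 8 ./my_program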

8. The "terminal 2" session can be terminated and re-established, as needed, so long as the PBS job is still running. Once the PBS job runs out of time, or the "terminal 1" session exits, the reserved nodes will be released, and the process must be repeated from step 1 to start another session.

Using Terminal 1

6. In terminal 2, determine the value of the DISPLAY environment variable, like so:

 $ printenv DISPLAY
 localhost:IJ.0

Here IJ is some set of digits.

7. Now, in terminal 1, set the DISPLAY environment variable to match:

 $ export DISPLAY=localhost:IJ.0

8. At this point, use terminal 1 for your interactive session commands; all X11 windows will be forwarded from the main compute node to the client PC.
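A quick way to verify that the forwarding works is to launch a simple X client from terminal 1, assuming one such as xclock is installed on the compute node; a clock window should appear on your client PC:

 $ xclock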

9. The "terminal 2" session can be terminated and re-established, as needed, so long as the PBS job is still running. Once the PBS job runs out of time, or the "terminal 1" session exits, the reserved nodes will be released, and the process must be repeated from step 1 to start another session.

The Batch Method

Sometimes an interactive (qsub -I) session is not sufficient. In that case, it is possible to latch on to a job submitted to the batch scheduler in the traditional way. This example shows how to reserve a set of nodes via the batch scheduler; interactive access to those nodes, with a properly set environment, is then accomplished by taking the following steps.

Note: this method only requires 1 terminal.

1. Login to the head node of the desired x86 Linux cluster:

 $ ssh -XY username@tezpur.hpc.lsu.edu

2. Once on the head node, create a job script, named something like interactive.pbs, containing the following. The job writes out its PBS environment and then simply sleeps in a loop to keep itself alive:

#!/bin/sh
#PBS -A allocation_account
echo "Changing to directory from which script was submitted."
cd $PBS_O_WORKDIR
# create bash/sh environment source file
H=`hostname`
# -- add host name as top line
echo "# main node: $H"  > ${PBS_JOBID}.env.sh
# -- dump raw env
env | grep PBS         >> ${PBS_JOBID}.env.sh
# -- cp raw to be used for csh/tcsh resource file
cp ${PBS_JOBID}.env.sh ${PBS_JOBID}.env.csh
# -- convert *.sh to sh/bash resource file
perl -pi -e 's/^PBS/export PBS/g' ${PBS_JOBID}.env.sh
# -- convert *.csh to csh/tcsh resource file
perl -pi -e 's/^PBS/setenv PBS/g' ${PBS_JOBID}.env.csh
perl -pi -e 's/=/ /g' ${PBS_JOBID}.env.csh
# -- entering into idle loop to keep job alive
while [ 1 ]; do
  sleep 10 # in seconds
  echo hi... > /dev/null
done

3. Submit the script saved in step #2:

 $ qsub -V -l walltime=00:30:00,nodes=1:ppn=4 interactive.pbs

4. You can check for when the job starts using qstat (see the example after this list); once it does, the following happens:

  • 2 files are created in the current directory that contain the required environmental variables:
    • <jobid>.env.sh
    • <jobid>.env.csh
  • the job is kept alive by the idle while loop
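For example, to watch for the job to start, you can list your own jobs (replace username with your login name) and wait for the state column to change from Q (queued) to R (running):

 $ qstat -u username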

5. Determine the main compute node being used by the job by inspecting the top line of either of the 2 environment files:

 $ head -n 1 <jobid>.env.sh
 # main node: tezpurIJK

Where IJK is some set of digits.

6. Login to the host identified in step 5, and be sure to note the directory from which the job was submitted (the environment files are written there):

 $ ssh -XY tezpurIJK

7. Source the environment file that matches your shell; for bash/sh:

 $ . /path/to/<jobid>.env.sh
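If your login shell is csh or tcsh, source the .csh version instead:

 $ source /path/to/<jobid>.env.csh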

8. Ensure that all the PBS_* environment variables are set. For example:

 $ env | grep PBS 
 PBS_JOBNAME=dumpenv.pbs
 PBS_ENVIRONMENT=PBS_BATCH
 PBS_O_WORKDIR=/home/estrabd/xterm
 PBS_TASKNUM=1
 PBS_O_HOME=/home/estrabd
 PBS_MOMPORT=15003
 PBS_O_QUEUE=workq
 PBS_O_LOGNAME=estrabd
 PBS_O_LANG=en_US.UTF-8
 PBS_JOBCOOKIE=B413DC38832A165BA0E8C5D2EC572F05
 PBS_NODENUM=0
 PBS_O_SHELL=/bin/bash
 PBS_JOBID=9771.tezpur2
 PBS_O_HOST=tezpur2
 PBS_VNODENUM=0
 PBS_QUEUE=workq
 PBS_O_MAIL=/var/spool/mail/estrabd
 PBS_NODEFILE=/var/spool/torque/aux//9771.tezpur2
 PBS_O_PATH=... # not shown due to length

9. This terminal can now be used for interactive commands; all X11 windows will be forwarded from the main compute node to the client PC.

Notes and Links

The methods outlined above are particularly useful with the debugging tutorial, Using TotalView on x86 Linux Clusters.


Users may direct questions to sys-help@loni.org.
