Slurm Interactive Jobs
Interactive jobs are an effective way to debug and troubleshoot workload steps. Unlike batch jobs (sbatch), which run unattended and report errors and output only through aggregated files, interactive jobs (salloc, srun) give you access to compute resources with a desired allocation and let you walk through job steps to identify problems or optimizations.
This is especially important for new or experimental workloads, where using sbatch typically involves an inefficient cycle of running all job steps, waiting for results, modifying input, and repeating until all problems are resolved.
Interactive jobs let you work through this cycle in a single session and provide more control and visibility into workload behavior.
Our recommended steps for troubleshooting or preparing new workloads are:
Request or acquire software and dependencies
Perform data transfer, staging, and lightweight testing on the login node to verify basic functionality.
Run an interactive job using salloc or srun, as described below
Execute any known environment modifications and/or commands required for your workload software components
Attempt to identify and correct errors or unexpected behaviors and note all additional steps taken
Repeat the previous step until the software generates the expected results
Create a job script that includes all steps used to correct errors and/or behaviors, in the appropriate order
Exit the interactive session, and confirm that the job script executes in batch
Make scientific discoveries
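The last steps of this list (capturing the working commands in a job script and confirming that it runs in batch) can be sketched roughly as follows. This is a minimal illustration, not a site-provided template: the file name job.sh, the job name, the time limit, the module name mysoftware, and the executable ./my_workload are all hypothetical placeholders, and the resource sizes simply mirror the -N1 -n16 example used on this page. The snippet only writes the script and checks its syntax; submission is left as a comment.

```shell
# Write a hypothetical job script capturing the commands that worked interactively.
cat > job.sh <<'EOF'
#!/bin/bash
# Hypothetical job name.
#SBATCH --job-name=my_workload
# Same resources as -N1 -n16.
#SBATCH --nodes=1
#SBATCH --ntasks=16
# Assumed time limit; adjust to your workload.
#SBATCH --time=01:00:00
# %j expands to the job ID.
#SBATCH --output=my_workload-%j.out

# Environment changes found during the interactive session, in order.
# (hypothetical module name)
module load mysoftware

# The workload itself (hypothetical executable); launch it the same way
# you did in the interactive session.
./my_workload
EOF

# Syntax check only; this does not run the workload.
bash -n job.sh && echo "job.sh: syntax OK"

# Submit once the syntax check passes (requires Slurm):
#   sbatch job.sh
```

Keeping the commands in the order they were discovered interactively makes it easier to compare batch behavior against the interactive session if the results differ.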
Interactive Jobs Using salloc
salloc is the preferred utility for interactive jobs.
To request an interactive job allocation using salloc, modify and/or append to the following generalized syntax with your desired resource allocation:
salloc -N1 -n16
Upon success, Slurm will allocate resources according to your parameters and create a shell session on the root compute node, from which you can begin validating your job steps …
salloc: Pending job allocation 588384
salloc: job 588384 queued and waiting for resources
salloc: job 588384 has been allocated resources
salloc: Granted job allocation 588384
[hpcuser@node123 ~]$
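Once the prompt appears, it can help to confirm that the shell really is inside the allocation before running job steps. The following is a minimal sketch, assuming the standard Slurm environment variables SLURM_JOB_ID, SLURM_JOB_NODELIST, and SLURM_NTASKS are exported into the session; the function name show_allocation is hypothetical.

```shell
# Print a short summary of the current Slurm allocation, if any.
show_allocation() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "Not inside a Slurm allocation"
        return 1
    fi
    echo "Job ID: ${SLURM_JOB_ID}"
    echo "Nodes:  ${SLURM_JOB_NODELIST:-unknown}"
    echo "Tasks:  ${SLURM_NTASKS:-unknown}"
}

# Do not abort the shell if we are on a login node.
show_allocation || true
```

Run on a login node, the function reports that no allocation is active, which is a quick way to catch commands being run in the wrong place.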
Interactive Jobs Using srun
srun is a legacy approach for interactive jobs and has recently been deprecated.
For convenience, interactive jobs using srun are still supported, but as of 03.31.23 an environment change is required to use srun for interactive jobs …
module load slurm/auhpc
srun -N1 -n16 [optional parameters] --pty /bin/bash
The first command changes your environment to use an intermediary script that determines the requested job type (e.g. interactive or direct execution) and appends any additional parameters required for that type.
To disable this behavior, you can unload the environment module and return to the standard srun utility by loading the default slurm module.
module load slurm/auhpc
which srun
/tools/scripts/auhpc/srun
module load slurm
which srun
/cm/shared/apps/slurm/current/bin/srun
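To check from a script which of the two is currently active, the resolved path can be compared against the wrapper location shown above. This is a minimal sketch: the function name srun_flavor is hypothetical, and the wrapper path is taken from the output above and may differ on other systems.

```shell
# Classify an srun path as the auhpc wrapper or the standard utility.
srun_flavor() {
    case "$1" in
        "")                        echo "not found" ;;
        /tools/scripts/auhpc/srun) echo "auhpc wrapper" ;;
        *)                         echo "standard srun" ;;
    esac
}

# command -v prints the path that would be executed, or nothing if srun
# is not on PATH.
srun_flavor "$(command -v srun || true)"
```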