SLURM

Since many users may be logged into the cluster headnode simultaneously, it is important not to run computationally intensive tasks on the headnode. Such tasks should be performed on compute nodes.

The headnode should be used to edit files and to submit jobs to a workload manager, which schedules jobs to run on compute nodes. At ISU we use the Slurm Workload Manager.

Jobs can be run in interactive or batch mode. In interactive mode you are logged (via a Slurm command) onto a compute node, and Slurm allocates the requested resources exclusively to your interactive job. You can request as little as one core or as much as multiple nodes, depending on what your job needs. Naturally, the more resources (cores, memory, CPU time, etc.) you request, the longer you may have to wait for those resources to become available. Since other jobs cannot access the resources allocated to your job, the session behaves like a personal computer/cluster. Your environment, such as loaded environment modules, is copied to the interactive session, and program/command output is printed on the screen. Interactive mode should be used mostly for debugging jobs (see the example session below). The downsides of interactive mode are:

1. The requested resources may not be available right away, and you will have to wait for them before you can run interactive commands.
2. If the connection to the cluster is lost, the interactive job will be killed by Slurm.
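As a minimal sketch, an interactive session on a single core can typically be requested with the srun command. The resource values below are only placeholders, and some clusters may require additional options (for example a partition name), so check the cluster User Guide for the exact settings:

    # Request 1 core and 4 GB of memory for 1 hour, then open a shell on the allocated compute node
    srun --nodes=1 --ntasks=1 --cpus-per-task=1 --mem=4G --time=01:00:00 --pty /bin/bash

    # When finished, exit the shell to release the allocated resources
    exit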

For production jobs the batch mode should be used. In batch mode, the resource requests and the list of commands to be executed on the compute nodes are placed in a job script, which is submitted to Slurm with the sbatch command. Program/command output is redirected to a file. To make it easier for users to create job scripts, we provide job script generators for each cluster; see the appropriate cluster User Guide listed on the left.
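As a rough sketch (the resource values, module name, and program below are placeholders; the job script generators produce the correct settings for each cluster), a batch job script might look like this:

    #!/bin/bash
    #SBATCH --job-name=my_job        # job name shown in the queue
    #SBATCH --nodes=1                # number of nodes
    #SBATCH --ntasks=1               # number of tasks (processes)
    #SBATCH --cpus-per-task=4        # cores per task
    #SBATCH --mem=8G                 # memory per node
    #SBATCH --time=02:00:00          # walltime limit (HH:MM:SS)
    #SBATCH --output=my_job.%j.out   # file for program output (%j expands to the job ID)

    # Commands to run on the compute node (placeholders)
    module load some_application
    some_application input.dat

The script is then submitted with:

    sbatch my_job.sh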

When an interactive or batch job is submitted to Slurm, the workload manager places the job in a queue. Each job is assigned a priority, which may change while the job waits in the queue. We use Slurm's fair-share scheduling on the clusters at ISU. Job priority depends on how much of the cluster's resources the user or the user's group has already consumed, the group's contribution to the cluster, and how long the job has been waiting for resources. Based on job priorities and the amount of resources requested versus available, Slurm decides which resources to allocate to which jobs.
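To see where your jobs stand in the queue, the standard Slurm commands squeue and sprio can be used (a sketch; the job ID below is a placeholder):

    # List your own pending and running jobs
    squeue -u $USER

    # Show the priority factors (fair-share, age, etc.) for a pending job
    sprio -j 1234567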

On the research clusters you can use the slurm-usage.py command to see your group's usage. To see the available options, run "slurm-usage.py -h".

For more details on managing jobs using Slurm refer to the specific cluster User Guide listed on the left.

 

Next: Linux Commands in more detail