Job Accounting
Summary
To ensure that all research groups get their fair share of the cluster, and to account for differences in the hardware being used, we use Slurm's built-in job accounting and fairshare system. On Nova each group is assigned a Share that represents the group's investment in Nova. A group's Fairshare score is calculated from its Share versus the amount of the cluster the group has actually used. This Fairshare score is then used to assign priority to the group's jobs relative to other users on the cluster. This keeps individual groups from monopolizing the resources, which would be unfair to groups that have not used their fair share for some time.
Usage Reports
Slurm's sacct command provides accounting data for all jobs and job steps. Refer to the command's man page for more information (man sacct).
The slurm-usage.py command generates CPU usage reports for the specified time frame. Run "slurm-usage.py -h" to see the available options and an example.
Monthly Cluster Usage Reports are placed in /work/<group_working_directory>/ClusterUsage.
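As a rough illustration of querying accounting data with sacct, the following Python sketch builds an sacct command line and parses its pipe-delimited output. The sacct options used (--allocations, --parsable2, --noheader, --starttime, --endtime, --format, --accounts) are standard, but the helper function names, the chosen fields, and the date and account values in the usage note are assumptions made for this example and are not part of Nova's tooling.

```python
import subprocess

# Fields requested from sacct; all are standard sacct format fields.
FIELDS = ["JobID", "User", "Account", "Elapsed", "AllocCPUS", "State"]


def build_sacct_command(start, end, account=None):
    """Return the sacct argument list for jobs between start and end (YYYY-MM-DD)."""
    cmd = [
        "sacct",
        "--allocations",   # one line per job, without individual job steps
        "--parsable2",     # pipe-delimited output with no trailing '|'
        "--noheader",      # omit the header line so every line is a job
        "--starttime", start,
        "--endtime", end,
        "--format", ",".join(FIELDS),
    ]
    if account is not None:
        # Hypothetical account (group) name supplied by the caller.
        cmd += ["--accounts", account]
    return cmd


def parse_sacct_output(text):
    """Turn pipe-delimited sacct output into a list of dicts keyed by field name."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append(dict(zip(FIELDS, line.split("|"))))
    return rows


def run_sacct(start, end, account=None):
    """Run sacct (requires a machine with Slurm installed) and return parsed rows."""
    result = subprocess.run(
        build_sacct_command(start, end, account),
        capture_output=True, text=True, check=True,
    )
    return parse_sacct_output(result.stdout)
```

On a cluster login node, a call such as run_sacct("2024-01-01", "2024-02-01") would return one dictionary per job in that window; the dates here are placeholders.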
Multi-Factor Job Priority Plugin
On Nova we use Slurm's Multi-factor Job Priority plugin. The plugin calculates a job's priority by taking into account multiple factors such as the job's age, size, and partition, as well as the FairShare factor. The following are the weights for these factors:
PriorityType=priority/multifactor
PriorityDecayHalfLife=30-0
PriorityWeightFairshare=100000
PriorityWeightAge=1000
PriorityWeightPartition=100000
PriorityWeightJobSize=10000
PriorityMaxAge=14-0
PriorityWeightQOS=1
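To make the role of these weights concrete, here is a minimal Python sketch of how the multifactor plugin combines them: each factor is a number between 0.0 and 1.0, and the priority is the weighted sum. This is a simplification under stated assumptions; it leaves out terms Slurm can also apply (such as the site factor, TRES weights, and the nice value), and the function names and example factor values are invented for illustration.

```python
# Weights copied from the configuration listed above.
WEIGHTS = {
    "fairshare": 100000,  # PriorityWeightFairshare
    "age": 1000,          # PriorityWeightAge
    "partition": 100000,  # PriorityWeightPartition
    "job_size": 10000,    # PriorityWeightJobSize
    "qos": 1,             # PriorityWeightQOS
}
PRIORITY_MAX_AGE_DAYS = 14  # PriorityMaxAge=14-0


def age_factor(days_waiting):
    """Age factor grows linearly with queue time and saturates at PriorityMaxAge."""
    return min(days_waiting / PRIORITY_MAX_AGE_DAYS, 1.0)


def job_priority(factors):
    """Weighted sum of the factors; a factor missing from the dict counts as 0.0."""
    for name, value in factors.items():
        if name not in WEIGHTS:
            raise KeyError(f"unknown factor: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"factor {name} must be within [0.0, 1.0]")
    return sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)


# Hypothetical job: half of the maximum fairshare factor, 7 days in the queue.
example = job_priority({
    "fairshare": 0.5,
    "age": age_factor(7),
    "partition": 1.0,
    "job_size": 0.1,
})
```

With these weights the fairshare and partition factors dominate: a job that has waited the full 14 days gains at most 1000 points from age, while a change of 0.01 in the fairshare factor is already worth 1000 points.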
The Slurm FairShare factor is mainly based on the ratio of the computing resources that a user's jobs have already consumed to the shares of computing resources that the user or group has been granted. The higher the value, the fewer shares were used compared to what was granted, and the higher the job is placed in the queue.
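The Python sketch below illustrates two ideas behind this factor. First, with PriorityDecayHalfLife=30-0, past usage counts half as much for every 30 days that have elapsed. Second, Slurm's classic fairshare formula maps the usage-to-shares ratio to a factor as 2^(-usage/shares). The classic formula is shown only as an assumption for illustration: under Slurm's Fair Tree algorithm the factor is derived from a ranking of accounts and users instead, and this page does not state which variant Nova is configured to use. The function names are invented for this example.

```python
HALF_LIFE_DAYS = 30  # PriorityDecayHalfLife=30-0


def decayed_usage(usage, days_ago):
    """Weight past usage so that it halves every HALF_LIFE_DAYS days."""
    return usage * 0.5 ** (days_ago / HALF_LIFE_DAYS)


def classic_fairshare_factor(normalized_usage, normalized_shares):
    """Classic (non-Fair-Tree) formula: 2 ** (-usage / shares).

    Both arguments are fractions of the whole cluster (0.0 to 1.0).
    Returns 1.0 for no usage, 0.5 when usage equals the granted share,
    and approaches 0.0 as usage grows beyond the share.
    """
    if normalized_shares <= 0:
        raise ValueError("normalized_shares must be positive")
    return 2.0 ** (-normalized_usage / normalized_shares)
```

For example, 1000 CPU-hours consumed 60 days ago would count as 250 CPU-hours today, and a group whose decayed usage equals its share would receive a factor of 0.5 under the classic formula.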
Job priority can be checked with the sprio command. The sshare command lists groups' shares.
The following slide deck provides more details about how job priority is calculated: Slurm Priority, Fairshare and Fair Tree.