Partition Configuration and Limits

The scheduler on Nova is the Slurm Workload Manager. To see the current partition configuration, issue:

scontrol show partitions


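To display the configuration of a single partition, give its name as an argument, e.g. for the debug partition listed below:

scontrol show partition debug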
Currently the following partitions are configured:

Partition Name     Max Time/Job (hours)   Max Nodes per Job   Number of Nodes in Partition (may change)
debug                       2                      1                        1
short_1node192              1                      1                       67
long_1node192             504                      1                       67
short_medium192             1                      8                       42
long_medium192            336                      8                       33
short_large192              1                     48*                      25
long_large192             336                     48*                      25
short_1node384              1                      1                       31
long_1node384             504                      1                       31
short_medium384             1                      8                       35
long_medium384            336                      8                       26
short_large384              1                     48*                      18
long_large384             336                     48*                      18
gpu192                     24                                               2
gpu384                     24                                               1
huge                       24                                               1

* Even though the maximum node count per job in large partitions is set to 48, the number of nodes in those partitions can be a limiting factor.
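
As an illustration only, a job requests one of these partitions through its batch script. The values below are taken from the table above for the long_medium192 partition, and my_program is a placeholder for your own executable; adjust the partition name, time limit, and node count to your job and the limits of its partition:

#!/bin/bash
# Partition, time limit and node count must stay within the limits in the table above
#SBATCH --job-name=example
#SBATCH --partition=long_medium192
#SBATCH --time=48:00:00
#SBATCH --nodes=2
srun ./my_program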


Besides the partition limits, each group is limited to a maximum of 8 times the resources purchased by the group (but no more than half of the cluster) across all of its running jobs. For example, if a group purchased one 384 GB memory node with 36 cores, at most 288 cores (8 × 36) and 3 TB of memory (8 × 384 GB) will be available to all of the group's jobs. To see those limits, issue:

sacctmgr show qos format=name,GrpTRES%30
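
To see how much of those limits your group's running jobs are currently consuming, one way (a sketch; replace your_account with your group's Slurm account name) is to list the group's jobs:

squeue --account=your_account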