Do not request more resources (CPUs, memory, GPUs) than your job needs. Over-requesting both consumes your allocation's core hours faster and causes resource-intensive jobs to wait longer in the queue. Use the information reported at the completion of your job (e.g. via the sacct command) to refine the resource requirements of future submissions.
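As a sketch of such a query (the job ID 123456 is a placeholder, and the field list is one reasonable choice, not the only one):

```shell
# Query Slurm's accounting database for a completed job.
# Replace 123456 with your actual job ID.
sacct -j 123456 --format=JobID,Elapsed,AllocCPUS,MaxRSS,TotalCPU,State
```

MaxRSS reports the peak memory actually used by the largest task; comparing it against what you requested shows how far you can trim --mem or --mem-per-cpu on the next submission.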
Specify a maximum time limit, and a minimum time limit as well if your workflow performs self-checkpointing. In this example, if you know that your job will save its intermediate results within the first 2 hours, these specifications will cause Slurm to schedule your job in the earliest available time window of 2 hours or longer, up to the 12-hour maximum:
#SBATCH --time=12:00:00
#SBATCH --time-min=02:00:00
Specify memory requirements explicitly, either as memory per node, or as memory per CPU:
#SBATCH --mem=
or
#SBATCH --mem-per-cpu=
Specify a range of acceptable node counts. This example tells the scheduler that the job can use anywhere from 200 to 300 nodes:

#SBATCH --nodes=200-300

(NOTE: Your job script must then launch the appropriate number of tasks, based on how many nodes are actually allocated to you.)
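One way to satisfy that note is to read the node count Slurm actually granted from the environment. A minimal job-script sketch, assuming 8 tasks per node (the per-node task count and application name are illustrative, not part of the original):

```shell
# Slurm sets SLURM_JOB_NUM_NODES to the number of nodes actually allocated.
# The default of 200 here only lets the fragment run outside a Slurm job.
NODES=${SLURM_JOB_NUM_NODES:-200}

# Scale the task count to the allocation (8 tasks per node is an assumption).
NTASKS=$(( NODES * 8 ))
echo "$NTASKS"

# Inside the job script you would then launch, e.g.:
#   srun -n "$NTASKS" ./my_app
```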
Specify the minimum number of CPUs per node that your application requires. With this example, your application will run on any available node with 16 or more cores available:

#SBATCH --mincpus=16
To flexibly request large-memory nodes, you could specify a node-count range, the maximum total number of tasks (the count you would run if you receive the maximum node count you request), and the total memory needed per node. For example, for an application that can run on anywhere from 20 to 24 nodes, needs 8 cores per node, and uses 2 GB per core, you could specify the following:
#SBATCH --nodes=20-24
#SBATCH --ntasks=192
#SBATCH --mem=16G
In the above, Slurm understands --ntasks to be the maximum task count across all nodes (8 tasks per node on the maximum of 24 nodes gives 192). Your application will therefore need to be able to run on 160, 168, 176, 184, or 192 cores, and your job script must launch the appropriate number of tasks based on how many nodes you are actually allocated.
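The arithmetic behind that example can be checked with a short sketch (the node range, per-node task count, and per-core memory are taken directly from the example above):

```python
# Flexible request of 20-24 nodes at 8 tasks per node, 2 GiB per core.
MIN_NODES, MAX_NODES = 20, 24
TASKS_PER_NODE = 8
MEM_PER_CORE_GB = 2

# Every total task count the job must be prepared to run with.
valid_task_counts = [n * TASKS_PER_NODE for n in range(MIN_NODES, MAX_NODES + 1)]
print(valid_task_counts)  # [160, 168, 176, 184, 192]

# Per-node memory implied by the example: 8 cores * 2 GiB = 16 GiB.
mem_per_node_gb = TASKS_PER_NODE * MEM_PER_CORE_GB
print(mem_per_node_gb)  # 16
```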