First Submission

If you are new to HPC Vega or to Slurm, this tutorial walks you through running your first job. Read the File Management section to learn where your jobs should write their output files.

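The simplest way to start a job is with the srun command. The example below runs hostname as 4 tasks (-n 4) on 2 nodes (-N 2). Each task prints the name of the node it ran on: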

[user@login0004 ~]$ srun -N 2 -n 4 hostname
cn0321
cn0321
cn0320
cn0320

The following example runs hostname on the partition named cpu with two nodes, ten tasks in total, two CPUs per task (20 CPUs in total), 1000 MB of memory per node and a time limit of 30 seconds:

srun --partition=cpu --nodes=2 --ntasks=10 --cpus-per-task=2 \
     --time=00:00:30 --mem=1000MB hostname
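The total number of CPUs is the number of tasks multiplied by the CPUs per task, so the command above requests 10 x 2 = 20 CPUs. To get ten tasks on each node instead (40 CPUs in total), Slurm's --ntasks-per-node=10 option can replace --ntasks. The bash sketch below only does this arithmetic; total_cpus is a hypothetical helper, not a Slurm command:

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): total CPUs = tasks x CPUs per task.
total_cpus() {
    local ntasks=$1 cpus_per_task=$2
    echo $(( ntasks * cpus_per_task ))
}

# --ntasks=10 --cpus-per-task=2, as in the srun example: prints 20
total_cpus 10 2

# --nodes=2 --ntasks-per-node=10 --cpus-per-task=2: 2*10 = 20 tasks, prints 40
total_cpus $(( 2 * 10 )) 2
```

On the cluster, the per-node variant would be written as srun --partition=cpu --nodes=2 --ntasks-per-node=10 --cpus-per-task=2 --time=00:00:30 --mem=1000MB hostname.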

The weakness of the srun command is that it blocks our command line until the job finishes. It is also awkward for more complex jobs with many settings. In such cases we prefer the sbatch command: we write the job settings and the commands to run into a bash script file (here my_job.sh) and submit that file.

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=cpu
#SBATCH --ntasks=4
#SBATCH --nodes=1
#SBATCH --mem-per-cpu=100MB
#SBATCH --output=my_job.out
#SBATCH --time=00:01:00

srun hostname

This is an example of a batch script. The first line, #!/bin/bash, tells the system that the file is a bash script. It is followed by the settings of our job, one per line, each with the prefix #SBATCH. --ntasks and --nodes set the number of tasks and the number of nodes for our job (here, four tasks on one node); if we have a reservation, we select it with --reservation. The remaining parameters are:

  • --job-name=my_job: the name of the job, displayed when we query jobs with the squeue command.

  • --partition=cpu: the partition within which we want to run our job.

  • --mem-per-cpu=100MB: the amount of memory required for each allocated CPU (processor core). With the default of one CPU per task, this is also the memory per task.

  • --output=my_job.out: the file to which the job's standard output (what it would otherwise print to the screen) is written. Unless --error is given, standard error goes to the same file.

  • --time=00:01:00: time limit of our job in hour:minute:second format.
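Inside a running job, Slurm also exposes the job's settings as environment variables such as SLURM_JOB_ID, SLURM_JOB_NUM_NODES and SLURM_NTASKS, which is a handy way to check what the job actually received. A minimal sketch, assuming the same settings as my_job.sh; the function name job_info is hypothetical, and the ":-unset" defaults only matter when the script is tried with plain bash outside Slurm:

```shell
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=cpu
#SBATCH --ntasks=4
#SBATCH --nodes=1
#SBATCH --mem-per-cpu=100MB
#SBATCH --output=my_job.out
#SBATCH --time=00:01:00

# Hypothetical helper: print a few variables that Slurm sets inside a job.
# Outside Slurm these variables are not set, so "unset" is printed instead.
job_info() {
    echo "job id: ${SLURM_JOB_ID:-unset}"
    echo "nodes:  ${SLURM_JOB_NUM_NODES:-unset}"
    echo "tasks:  ${SLURM_NTASKS:-unset}"
}

job_info
```

Submitted with sbatch, the job would print its real ID and, with these settings, 1 node and 4 tasks into my_job.out.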

The next step is to submit the job script with sbatch:

$ sbatch ./my_job.sh
Submitted batch job 287999
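The number printed by sbatch is the job ID, which other Slurm commands such as squeue take as an argument. In a script, it can be picked out of the confirmation line with bash parameter expansion; the sketch below uses the line shown above as sample input. On the cluster, sbatch's --parsable option, which prints only the job ID (plus the cluster name, if there is one), is the cleaner choice:

```shell
#!/bin/bash
# Sample input: the confirmation line printed by sbatch above.
msg="Submitted batch job 287999"

# Remove everything up to and including the last space, leaving the job ID.
jobid=${msg##* }
echo "$jobid"

# On the cluster, the ID can be captured directly (not run here):
#   jobid=$(sbatch --parsable ./my_job.sh)
#   squeue -j "$jobid"
```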

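When the job has finished, my_job.out contains one line per task. All four tasks ran on the single node requested with --nodes=1, so every line shows the same host:

$ cat ./my_job.out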
cn0321
cn0321
cn0321
cn0321
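Because each task writes one line, counting lines per host name shows how the tasks were distributed over the nodes. A small sketch with standard tools; it copies the sample output above into a temporary file so that a real my_job.out is not overwritten. On the cluster you would simply run sort my_job.out | uniq -c:

```shell
#!/bin/bash
# Sample copy of the my_job.out shown above, in a temporary file.
out=$(mktemp)
printf 'cn0321\ncn0321\ncn0321\ncn0321\n' > "$out"

# Count output lines (one per task) for each node name.
counts=$(sort "$out" | uniq -c)
echo "$counts"

rm -f "$out"
```

Here all four tasks are counted on cn0321. For the srun -N 2 -n 4 example at the top, the same count would show two tasks on each of cn0320 and cn0321.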