| Option | Description |
|---|---|
| #SBATCH -J jobname | Assign a name to the job. |
| #SBATCH -p big | Assign the partition (queue) for the job. |
| #SBATCH -w nodelist | Request particular nodes. The list can be comma-separated or a range, e.g. node[1-3]. |
| #SBATCH -N num | Request num nodes for the job. |
| #SBATCH -n num | Request num tasks (processors) for the job. |
| #SBATCH -t d-hh:mm:ss | Request a wall-time limit of d-hh:mm:ss for the job. |
| #SBATCH -D dirname | Set dirname as the working directory, either relative to the current directory or absolute. |
| #SBATCH -o file.%j.out | Write the job's stdout to file.%j.out, where %j expands to the job ID. |
| #SBATCH -e file.%j.out | Write the job's stderr to file.%j.out, where %j expands to the job ID. |
| #SBATCH --mail-type=BEGIN,END,FAIL | Notify by mail on job start, completion or failure. |
| #SBATCH --mail-user=your.email@somewhere.net | The recipient of the email notifications. |
An example startup script (run.sh) would be:
```bash
#!/bin/bash
#SBATCH -J test_job
#SBATCH -p big
#SBATCH -N 1
#SBATCH -n 32
#SBATCH -t 0-00:01:00
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=me@somewhere.net

mpirun python my_mpi_job.py
```
Submit the script to the queue with:

```bash
sbatch run.sh
```
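On submission, sbatch prints the ID assigned to the job; this is the number that %j expands to in the output-file patterns above. The job ID below is only an illustration:

```bash
$ sbatch run.sh
Submitted batch job 4242
```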
| Command | Description |
|---|---|
| sinfo | Provides information on the cluster state, grouped by partition. |
| sinfo -leN | Provides a more detailed, node-oriented view of the cluster state, by machine type. |
| squeue | Lists all jobs in the queue. |
| smap | Curses front end to jobs, partitions and configuration. |
| sview | Graphical front end to jobs, partitions and configuration. |
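On a busy cluster it is often more useful to list only your own jobs; squeue's standard -u filter does that. This is just a usage sketch:

```bash
# List only the jobs belonging to the current user
squeue -u $USER
```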
| Node | Partition | Cores | Characteristics |
|---|---|---|---|
| mainframe | --- | 64 | Reserved for administrative use |
| node1 | big | 72 | Intel E5-2695v4 |
| node2 | big | 72 | Intel E5-2695v4 |
| node3 | big | 72 | Intel E5-2695v4 |
| node4 | big | 72 | Intel E5-2695v4 |
| node5 | big | 72 | Intel E5-2695v4 |
| node6 | big | 72 | Intel E5-2695v4 |
| node7 | big | 72 | Intel E5-2695v4 |
| node8 | big | 72 | Intel E5-2695v4 |
| node9 | big | 64 | AMD Opteron 6376 |
| node10 | big | 48 | AMD Opteron 6344 |
| node11 | big | 48 | AMD Opteron 6344 |
| node12 | big | 48 | AMD Opteron 6344 |
| node13 | big | 48 | AMD Opteron 6344 |
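If a job should run only on a particular subset of the machines above, the -w option from the table of #SBATCH directives accepts such a list; the node range below is only an illustration:

```bash
#SBATCH -p big
#SBATCH -w node[1-3]   # restrict the job to node1, node2 and node3
```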
Q: Running a process on a dedicated node is far more efficient than spreading it across several nodes. Can I request a whole node to myself, even though I might not need all of the processors available on it?

A: Yes. Pass the --exclusive switch to sbatch/srun, or include #SBATCH --exclusive in the startup script.
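A minimal sketch of both ways to request exclusive access (run.sh is the example script from above):

```bash
# Either add the directive to the startup script ...
#SBATCH --exclusive

# ... or pass the switch at submission time:
sbatch --exclusive run.sh
```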
Q: My job runs for a long time and I would like occasional diagnostics to be printed out. It works, but the diagnostics only appear at the end of the run instead of as the job goes along.

A: This is caused by I/O buffering. Prefix your command with stdbuf -oL -eL for line buffering, or stdbuf -o0 -e0 for unbuffered output.
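For example, with a hypothetical long-running executable my_solver, the work line in the startup script would become:

```bash
# Line-buffered stdout/stderr: diagnostics appear as soon as a full line is printed
stdbuf -oL -eL ./my_solver

# Fully unbuffered variant (character at a time)
stdbuf -o0 -e0 ./my_solver
```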
Q: This is so awesome! What can I do to make it up to you?

A: I accept Swiss chocolate, Belgian beer and Californian full-bodied red wine donations. ;)