Forked from Hichem Ben Aoun / hpc_scripts
Up to date with the upstream repository.
P Heller authored
104b0c2c

Usage

This repo serves as a template for job scripts. Once the template has been copied for use in a specific project, the template file should be customized to that project. The job_template.sh file is structured in four sections:

  1. SBATCH configuration
  2. Configuration variables
  3. Conda setup
  4. Job execution

SBATCH Configuration Comments

The SBATCH configuration comments configure the sbatch command and have to be at the top of the job.sh script that is passed to sbatch. Keeping the configuration in the job script itself avoids long CLI commands when jobs are submitted. Options set by these comments can be overridden on the sbatch command line. For example, the --time parameter can be overridden with sbatch --time=00:30:00 job_template.sh. The requested time is then 30 minutes instead of the 3 hours specified in the job script.

```bash
#!/bin/bash
#SBATCH --time=03:00:00
#SBATCH --mem=256gb
#SBATCH --job-name=aiss_cv
#SBATCH --gres=gpu:2
```
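
To illustrate the override behaviour described above, here is a minimal sketch of a submission wrapper. The `submit` function is hypothetical and not part of this repo; it forwards its arguments to sbatch when Slurm is available and otherwise only prints the command it would run, so the override can be tried on a machine without Slurm.

```bash
# Hypothetical wrapper (not part of the template): forwards all arguments
# to sbatch, or prints the command when sbatch is not installed.
submit() {
    if command -v sbatch >/dev/null 2>&1; then
        sbatch "$@"
    else
        echo "sbatch not found; would run: sbatch $*"
    fi
}

# Options given on the command line take precedence over the
# #SBATCH comments in the script: 30 minutes instead of 3 hours.
submit --time=00:30:00 job_template.sh
```

Other options from the header, such as --mem or --job-name, can be overridden in the same call.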

Configuration Variables

The configuration variables form the second section of the job script. They allow easy customization of a job without going through the entire file. A few variables are already defined:

  1. ENV_NAME: name of the conda environment to use (see Conda Setup)
  2. SOME_VARIABLE: an example variable that can be used to easily customize the script
  3. AMOUNT_DEVICES: number of GPU devices to use (specific to GPU usage)

The AMOUNT_DEVICES variable is tied to the SBATCH option for the number of requested GPUs: --gres=gpu:2. If the number of requested GPUs is changed, the AMOUNT_DEVICES variable has to be changed accordingly. The DEVICE_IDS variable is generated from the AMOUNT_DEVICES variable and can then be used in commands that require a list of device IDs.
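
Because AMOUNT_DEVICES and --gres have to be kept in sync by hand, a small consistency check can catch a mismatch early. The following is only a sketch: `count_devices` is a hypothetical helper, and it assumes that Slurm exports CUDA_VISIBLE_DEVICES as a comma-separated list inside a GPU job, which depends on how the cluster is configured.

```bash
# Hypothetical helper: counts the entries in a comma-separated ID list.
count_devices() {
    if [ -z "$1" ]; then
        echo 0
        return
    fi
    # Number of entries = number of commas + 1.
    commas=$(printf '%s' "$1" | tr -cd ',' | wc -c)
    echo $((commas + 1))
}

AMOUNT_DEVICES=2

# Assumption: Slurm exports CUDA_VISIBLE_DEVICES inside a GPU job.
# If the variable is unset or empty, the check is skipped.
allocated=$(count_devices "${CUDA_VISIBLE_DEVICES:-}")
if [ "$allocated" -gt 0 ] && [ "$allocated" -ne "$AMOUNT_DEVICES" ]; then
    echo "warning: AMOUNT_DEVICES=$AMOUNT_DEVICES but $allocated GPU(s) allocated" >&2
fi
```

The check only warns and does not abort the job, so it can be dropped into the configuration section without changing the behaviour of the script.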

```bash
# ...
export ENV_NAME="my_env_name"
# ...
SOME_VARIABLE=can_be_used_here

# amount of GPU devices to use (specific to GPU usage)
AMOUNT_DEVICES=2
# Generate device IDs list
DEVICE_IDS=$(seq -s "," 0 $((AMOUNT_DEVICES-1)))
```

All variables are technically optional; they exist to keep the job script easy to survey. If a variable is unused or cumbersome to use in a specific environment, adapt the script accordingly.

Conda Setup

This job script helps with environment control: it loads the conda environment and, if necessary, creates it first. The environment name is specified in the ENV_NAME variable (see Configuration Variables).

If no conda environment is specified, the "base" environment will be loaded.

To configure a conda environment, either create an environment and install its dependencies beforehand (it will then only be loaded), or provide the name of a new environment in the ENV_NAME variable. In the latter case, the environment will be created and its dependencies installed; the installation instructions have to be specified in jobs/conda_config/handle_conda_activation.sh (see the comment # Instructions for installing dependencies here in handle_conda_activation.sh).
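
The load-or-create behaviour described above could look roughly like the sketch below. This is not the content of handle_conda_activation.sh, which may work differently; `env_in_list` and `ensure_env` are hypothetical names, and the sketch assumes that conda is on the PATH and that `conda env list` prints the environment name in the first column.

```bash
# Hypothetical helpers; the actual logic lives in
# handle_conda_activation.sh and may differ.

# Reads `conda env list` output on stdin and succeeds if the first
# column of any line equals the environment name given as $1.
env_in_list() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Creates the environment only when it is missing, then activates it.
# Falls back to "base" when no name is given.
ensure_env() {
    env_name="${1:-base}"
    # Make `conda activate` available in a non-interactive shell.
    . "$(conda info --base)/etc/profile.d/conda.sh"
    if ! conda env list | env_in_list "$env_name"; then
        conda create -y -n "$env_name"
        # Instructions for installing dependencies would go here.
    fi
    conda activate "$env_name"
}
```

In a job script this would be called as `ensure_env "$ENV_NAME"` before the job execution section.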

Job Execution

This is the last section of the job script. It is meant to be used for the actual job execution. The SOME_VARIABLE variable can be used here as an example. The DEVICE_IDS variable can be used to specify the device IDs to use for the job.

```bash
# would be executed as:
# python some_script.py --some_parameter can_be_used_here \
# --device_ids 0,1

python some_script.py --some_parameter "$SOME_VARIABLE" \
    --device_ids "$DEVICE_IDS"
```