cd: Navigate Directories
cd [directory] – Changes the current working directory to the specified directory.
cd .. – Moves up one directory level.
cd /path/to/directory – Goes to a specific path.
cd ~ or cd – Goes to the home directory.
cd / – Goes to the root directory.

ls: List Directory Contents
ls – Lists files and directories in the current working directory.
ls -l – Displays detailed information about each file (permissions, owner, size, etc.).
ls -a – Shows all files, including hidden files (files starting with a dot).
ls -h – Displays sizes in human-readable format (e.g., KB, MB).
ls -lah – Combines the above options.

mkdir: Make Directory
mkdir [directory_name] – Creates a new directory with the specified name.
mkdir -p /path/to/dir – Creates nested directories as needed.

rmdir and rm: Remove Directories and Files
rm [file] – Removes a file.
rmdir [directory] – Removes an empty directory.
rm -r [directory] – Removes a directory and its contents recursively.
rm -f [file] – Force-removes a file (no undo, be careful!).

mv and cp: Move, Rename and Copy
cp [source] [destination] – Copies a file or directory to the destination.
mv [source] [destination] – Moves a file or directory to the destination.
mv can also be used to rename a file or directory if the source and destination directories match.

echo [text] – Displays text or outputs text to a file.
echo $ENV_VAR – Displays the value of an environment variable.
touch [filename] – Creates an empty file or updates the timestamp of an existing file.
> – Redirects output to a file.
>> – Appends output to a file.

watch: Monitor Command Output
watch [command] – Repeatedly runs a command at intervals and displays the result.
watch -n [seconds] [command] – Changes the interval between runs.

man [command] – Opens the manual page for a command.
help [command] – Provides a short description of built-in shell commands.

|: Pipes
| – Pipes the output of one command as input to another.

(Correct at time of writing)
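As a small illustration of pipes and redirection together (the filename is illustrative):

# list all files, keep only lines mentioning "slurm", and save the result to a file
ls -lah | grep slurm > slurm_files.txt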
Log into the bastion node (not necessary from within the SWC network)
This node is fine for light work, but not for intensive analyses
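A minimal login sketch (the bastion address is a placeholder; hpc-gw1 is the gateway node seen in the prompts below):

# from outside the SWC network, first hop via the bastion (address is a placeholder)
ssh <username>@<bastion-address>
# then connect to the HPC gateway node
ssh hpc-gw1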
More details
See our guide at howto.neuroinformatics.dev
/nfs/nhome/live/<USERNAME> or /nfs/ghome/live/<USERNAME> – Home directory (~/)
/nfs/winstor/<group> – Old SWC research data storage
/nfs/gatsbystor – GCNU data storage
/ceph/<group> – Current research data storage
/ceph/scratch – Not backed up, for short-term storage
/ceph/apps – HPC applications

Note: You may only be able to “see” a drive if you navigate to it
Navigate to the scratch space
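For example (a personal folder under scratch is used later in this document):

cd /ceph/scratch
mkdir -p $USER   # make a personal folder if it does not already exist
cd $USER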
All nodes have the same software installed
Preinstalled packages are available for use via the module system, including SLEAP
List available modules
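Typical module commands for this step:

module avail          # list all available modules
module avail SLEAP    # restrict the listing to modules matching "SLEAP"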
View a summary of the available resources
atyson@hpc-gw1:~$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
cpu* up 10-00:00:0 1 mix# gpu-380-25
cpu* up 10-00:00:0 31 mix enc1-node[1-14],enc2-node[1-13],enc3-node[6-8],gpu-380-24
cpu* up 10-00:00:0 4 alloc enc3-node[1-2,4-5]
gpu up 10-00:00:0 1 mix# gpu-380-15
gpu up 10-00:00:0 1 down~ gpu-380-16
gpu up 10-00:00:0 12 mix gpu-350-[01-05], gpu-380-[11,13-14,17-18],gpu-sr670-[20,22]
a100 up 30-00:00:0 2 mix gpu-sr670-[21,23]
lmem up 10-00:00:0 1 idle~ gpu-380-12
medium up 12:00:00 1 mix# gpu-380-15
medium up 12:00:00 1 down~ gpu-380-16
medium up 12:00:00 7 mix enc3-node[6-8],gpu-380-[11,14,17-18]
medium up 12:00:00 4 alloc enc3-node[1-2,4-5]
fast up 3:00:00 2 idle~ enc1-node16,gpu-erlich01
fast up 3:00:00 4 mix gpu-380-[11,14,17-18]
View currently running jobs (from everyone)
atyson@hpc-gw1:~$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4036257 cpu bash imansd R 13-01:10:01 1 enc1-node2
4050946 cpu zsh apezzott R 1-01:02:30 1 enc2-node11
3921466 cpu bash imansd R 51-03:05:29 1 gpu-380-13
4037613 gpu bash pierreg R 12-05:55:06 1 gpu-sr670-20
4051306 gpu ddpm-vae jheald R 15:49 1 gpu-350-01
4051294 gpu jupyter samoh R 1:40:59 1 gpu-sr670-22
4047787 gpu bash antonins R 4-18:59:43 1 gpu-sr670-21
4051063_7 gpu LRsem apezzott R 1-00:08:32 1 gpu-350-05
4051063_8 gpu LRsem apezzott R 1-00:08:32 1 gpu-380-10
4051305 gpu bash kjensen R 18:33 1 gpu-sr670-20
4051297 gpu bash slenzi R 1:15:39 1 gpu-350-01
More details
See our guide at howto.neuroinformatics.dev
Start an interactive job (bash -i) in the fast partition (-p fast) in pseudoterminal mode (--pty) with one CPU core (-n 1).
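One way to do this with srun:

srun -p fast -n 1 --pty bash -i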
Always start a job (interactive or batch) before doing anything intensive to spare the gateway node.
Clone a test script
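For example (the repository URL is a placeholder for the course materials):

git clone <repository-url>
cd <repository-name>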
Check out batch script:
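The batch script itself is not reproduced here; a minimal sketch, modelled on the array script below but running multiply.sh once with illustrative arguments, might look like this:

#!/bin/bash
#SBATCH -p fast # partition (queue)
#SBATCH -N 1 # number of nodes
#SBATCH --mem 1G # memory pool for all cores
#SBATCH -n 1 # number of cores
#SBATCH -t 0-0:1 # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out # STDOUT
#SBATCH -e slurm.%N.%j.err # STDERR

# the arguments (5 and 10) are illustrative
echo "Multiplying 5 by 10"
./multiply.sh 5 10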
Run batch job:
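Submission would then look something like this (the script name is illustrative):

sbatch batch_example.sh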
Check out array script:
#!/bin/bash
#SBATCH -p fast # partition (queue)
#SBATCH -N 1 # number of nodes
#SBATCH --mem 1G # memory pool for all cores
#SBATCH -n 1 # number of cores
#SBATCH -t 0-0:1 # time (D-HH:MM)
#SBATCH -o slurm_array_%A-%a.out
#SBATCH -e slurm_array_%A-%a.err
#SBATCH --array=0-9%4
# Array job runs 10 separate jobs, but not more than four at a time.
# This is flexible and the array ID ($SLURM_ARRAY_TASK_ID) can be used in any way.
echo "Multiplying $SLURM_ARRAY_TASK_ID by 10"
./multiply.sh $SLURM_ARRAY_TASK_ID 10
Run array job:
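Again with an illustrative script name:

sbatch array_example.sh
squeue --me   # check the array tasks as they start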
Start an interactive job with one GPU:
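One way to do this with srun (requesting one GPU of any kind on the gpu partition):

srun -p gpu --gres=gpu:1 --pty bash -i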
Cancel a job
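For example:

scancel <job_id>   # cancel a specific job (ID from squeue)
scancel -u $USER   # cancel all of your own jobs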
DeepLabCut: transfer learning
SLEAP: smaller networks
/ceph/scratch/neuroinformatics-dropoff/SLEAP_HPC_test_data/course-hpc-2023
Copy the unzipped training package to your scratch space and inspect its contents:
cp -r /ceph/scratch/neuroinformatics-dropoff/SLEAP_HPC_test_data/course-hpc-2023/labels.v001.slp.training_job /ceph/scratch/$USER/
cd /ceph/scratch/$USER/labels.v001.slp.training_job
ls -1
Training
Suitable for debugging (immediate feedback)
Start an interactive job with one GPU
Execute commands one-by-one, e.g.:
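A plausible sequence, mirroring the batch script below (start the interactive GPU job first, as above):

module load SLEAP
cd /ceph/scratch/$USER/labels.v001.slp.training_job
./train-script.sh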
sleap_train_slurm.sh
#!/bin/bash
#SBATCH -J slp_train # job name
#SBATCH -p gpu # partition (queue)
#SBATCH -N 1 # number of nodes
#SBATCH --mem 16G # memory pool for all cores
#SBATCH -n 4 # number of cores
#SBATCH -t 0-06:00 # time (D-HH:MM)
#SBATCH --gres gpu:1 # request 1 GPU (of any kind)
#SBATCH -o slurm.%x.%N.%j.out # STDOUT
#SBATCH -e slurm.%x.%N.%j.err # STDERR
#SBATCH --mail-type=ALL
#SBATCH --mail-user=user@domain.com
# Load the SLEAP module
module load SLEAP
# Define the directory of the exported training job package
SLP_JOB_NAME=labels.v001.slp.training_job
SLP_JOB_DIR=/ceph/scratch/$USER/$SLP_JOB_NAME
# Go to the job directory
cd $SLP_JOB_DIR
# Run the training script generated by SLEAP
./train-script.sh
View the status of your queued/running jobs with squeue --me
View the status of running/completed jobs with sacct:
sacct
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
4232289 slp_train gpu swc-ac 4 RUNNING 0:0
4232289.bat+ batch swc-ac 4 RUNNING 0:0
Run sacct with some more helpful arguments, e.g. to view jobs from the last 24 hours, including time elapsed and peak memory usage in KB (MaxRSS):
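One possible incantation (the exact flags used in the course are not shown; these are standard sacct options):

sacct --starttime $(date -d '-24 hours' +%Y-%m-%dT%H:%M) \
      --format=JobID,JobName,Partition,State,Elapsed,MaxRSS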
While you wait for the training job to finish, you can copy and inspect the trained models from a previous run:
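A sketch of that copy (the source path of the previous run's models is an assumption based on the test-data folder used above; adjust as needed):

# source path is an assumption; point it at wherever the previous run's models live
cp -r /ceph/scratch/neuroinformatics-dropoff/SLEAP_HPC_test_data/course-hpc-2023/models \
      /ceph/scratch/$USER/labels.v001.slp.training_job/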
sleap_infer_slurm.sh
#!/bin/bash
#SBATCH -J slp_infer # job name
#SBATCH -p gpu # partition
#SBATCH -N 1 # number of nodes
#SBATCH --mem 32G # memory pool for all cores
#SBATCH -n 8 # number of cores
#SBATCH -t 0-01:00 # time (D-HH:MM)
#SBATCH --gres gpu:1 # request 1 GPU
#SBATCH -o slurm.%x.%N.%j.out # write STDOUT
#SBATCH -e slurm.%x.%N.%j.err # write STDERR
#SBATCH --mail-type=ALL
#SBATCH --mail-user=user@domain.com
# Load the SLEAP module
module load SLEAP
# Define directories for exported SLEAP job package and videos
SLP_JOB_NAME=labels.v001.slp.training_job
SLP_JOB_DIR=/ceph/scratch/$USER/$SLP_JOB_NAME
VIDEO_DIR=/ceph/scratch/neuroinformatics-dropoff/SLEAP_HPC_test_data/course-hpc-2023/videos
VIDEO1_PREFIX=sub-01_ses-01_task-EPM_time-165049
# Go to the job directory
cd $SLP_JOB_DIR
# Make a directory to store the predictions
mkdir -p predictions
# Run the inference command
sleap-track $VIDEO_DIR/${VIDEO1_PREFIX}_video.mp4 \
-m $SLP_JOB_DIR/models/231130_160757.centroid/training_config.json \
-m $SLP_JOB_DIR/models/231130_160757.centered_instance/training_config.json \
-o $SLP_JOB_DIR/predictions/${VIDEO1_PREFIX}_predictions.slp \
--gpu auto \
--no-empty-frames
sleap_infer_array_slurm.sh
#!/bin/bash
#SBATCH -J slp_infer # job name
#SBATCH -p gpu # partition
#SBATCH -N 1 # number of nodes
#SBATCH --mem 32G # memory pool for all cores
#SBATCH -n 8 # number of cores
#SBATCH -t 0-01:00 # time (D-HH:MM)
#SBATCH --gres gpu:1 # request 1 GPU
#SBATCH -o slurm.%x.%N.%j.out # write STDOUT
#SBATCH -e slurm.%x.%N.%j.err # write STDERR
#SBATCH --mail-type=ALL
#SBATCH --mail-user=user@domain.com
#SBATCH --array=0-1
# Load the SLEAP module
module load SLEAP
# Define directories for exported SLEAP job package and videos
SLP_JOB_NAME=labels.v001.slp.training_job
SLP_JOB_DIR=/ceph/scratch/$USER/$SLP_JOB_NAME
VIDEO_DIR=/ceph/scratch/neuroinformatics-dropoff/SLEAP_HPC_test_data/course-hpc-2023/videos
VIDEO1_PREFIX=sub-01_ses-01_task-EPM_time-165049
VIDEO2_PREFIX=sub-02_ses-01_task-EPM_time-185651
VIDEOS_PREFIXES=($VIDEO1_PREFIX $VIDEO2_PREFIX)
CURRENT_VIDEO_PREFIX=${VIDEOS_PREFIXES[$SLURM_ARRAY_TASK_ID]}
echo "Current video prefix: $CURRENT_VIDEO_PREFIX"
# Go to the job directory
cd $SLP_JOB_DIR
# Make a directory to store the predictions
mkdir -p predictions
# Run the inference command
sleap-track $VIDEO_DIR/${CURRENT_VIDEO_PREFIX}_video.mp4 \
-m $SLP_JOB_DIR/models/231130_160757.centroid/training_config.json \
-m $SLP_JOB_DIR/models/231130_160757.centered_instance/training_config.json \
-o $SLP_JOB_DIR/predictions/${CURRENT_VIDEO_PREFIX}_array_predictions.slp \
--gpu auto \
--no-empty-frames
SWC | 2024-10-04