GPU Workloads

Introduction

Easley GPU nodes are equipped with NVIDIA Tesla T4 devices. Two generally accessible GPU partitions exist, gpu2 and gpu4, which differ in the number of GPU devices present per node. To list them:

sinfo -p gpu2,gpu4 -O gres,partitionname,nodes

GPU Modules & Locations

The CUDA Toolkit module, together with a related programming framework module, forms the base of your GPU workload environment.

A number of code examples and scripts are also available in the /tools/gpu subdirectory …

module show cuda11.0/toolkit
ls /tools/gpu/cuda_simple

Basic CUDA C++ Example

Copy the sample source files to a location in your home directory…

cd ~
mkdir hpc_gpu
cd hpc_gpu
cp -R /tools/gpu/tutorials/* .
ls
cd hello

Now let’s take a quick look at the source code, compile, and run it…

module load cuda11.0/toolkit
cat hello.cu
nvcc -o hello hello.cu
srun -N1 -n1 --partition=gpu4 --gres=gpu:tesla:1 ./hello
cat hello_threads.cu
nvcc -o threads hello_threads.cu
srun -N1 -n1 --partition=gpu4 --gres=gpu:tesla:1 ./threads

And to be sure, let’s use the CUDA profiler to see exactly how the programs are using the GPU …

srun -N1 -n1 -p gpu4 --gres=gpu:tesla:1 nvprof $HOME/hpc_gpu/hello/hello
srun -N1 -n1 -p gpu4 --gres=gpu:tesla:1 nvprof $HOME/hpc_gpu/hello/threads

Using CUDA with PyTorch

PyTorch is one of many popular deep learning frameworks used by data scientists. To set up and run CUDA operations, PyTorch provides the torch.cuda package. This package adds support for CUDA tensor types, which implement the same functions as CPU tensors but use GPUs for computation.
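To illustrate the point about CUDA tensors offering the same operations as CPU tensors, here is a minimal sketch that can be run once PyTorch is installed as described in the next section. The function names pick_device and matmul_matches_cpu are hypothetical helpers introduced only for this illustration, and the tolerances are an assumption rather than a recommended setting.

```python
# Sketch: run the same tensor operation on the CPU and, when available, on the GPU.
try:
    import torch
except ImportError:  # for example, the virtual environment is not activated
    torch = None


def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


def matmul_matches_cpu(size=128):
    """Multiply two random matrices on the CPU and on the selected device.

    Returns True when both results agree within a tolerance, False when they
    do not, and None when PyTorch cannot be imported.
    """
    if torch is None:
        return None
    device = pick_device()
    a = torch.rand(size, size)
    b = torch.rand(size, size)
    cpu_result = a @ b
    # .to(device) copies the tensors to the GPU; the operator itself is unchanged.
    dev_result = a.to(device) @ b.to(device)
    # Copy the result back to the CPU before comparing (assumed tolerances).
    return torch.allclose(cpu_result, dev_result.cpu(), rtol=1e-3, atol=1e-3)


print("device:", pick_device())
print("results agree:", matmul_matches_cpu())
```

On a login node without a GPU the sketch falls back to the CPU, so the comparison is only meaningful inside a job that was allocated a GPU.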

Installing PyTorch

Create a virtual environment before installing PyTorch. For more information about virtual environments, see:

https://hpc.auburn.edu/hpc/docs/hpcdocs/build/html/easley/python.html#python-virtual-environments

cd ~/hpc_gpu
module load python
python3 -m virtualenv pytorch_lab
source pytorch_lab/bin/activate
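One way to confirm that activation took effect is to ask the interpreter itself. The following is a small sketch using only the Python standard library; the function name in_virtual_environment is a hypothetical helper, not part of any module on the cluster.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a virtual environment."""
    # venv and recent virtualenv releases point sys.base_prefix at the parent
    # interpreter; older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


print("virtual environment active:", in_virtual_environment())
print("interpreter prefix:", sys.prefix)
```

If the environment is active, the printed prefix should point inside the pytorch_lab directory rather than at the system Python installation.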

At this point, the virtual environment has been created and then activated with the source command. With the environment active, install PyTorch as follows:

cd pytorch_lab
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
pip list
deactivate

In this first exercise, we will use the torch.cuda package to check that a CUDA device is available and to gather information about it.

torch.cuda

srun -N1 -n1 --partition=gpu2 --gres=gpu:tesla:1 --pty /bin/bash
module load python
module load cuda11.0/toolkit
cd ~/hpc_gpu/pytorch_lab
source bin/activate
python3
import torch

To check whether your system supports CUDA, call is_available(). It returns a bool: True if PyTorch can use a CUDA device, False otherwise.

torch.cuda.is_available()
True

The current_device() function returns the index of the currently selected CUDA device:

torch.cuda.current_device()
0

Using the index returned above, you can retrieve the name of the device:

torch.cuda.get_device_name(0)
'Tesla T4'

For more detail, pass the same device index to get_device_properties():

torch.cuda.get_device_properties(0)
_CudaDeviceProperties(name='Tesla T4', major=7, minor=5, total_memory=15109MB, multi_processor_count=40)
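The individual queries above can also be gathered in one small script, which is convenient inside a batch job where no interactive interpreter is available. This is a sketch only: cuda_report is a hypothetical helper name, and the dictionary keys are chosen for illustration. It also records torch.version.cuda, the CUDA version the installed PyTorch build targets, which may differ from the version of the loaded toolkit module.

```python
try:
    import torch
except ImportError:  # for example, the virtual environment is not activated
    torch = None


def cuda_report():
    """Collect the torch.cuda values queried above into a dictionary."""
    report = {"torch_importable": torch is not None, "cuda_available": False}
    if torch is None:
        return report
    report["torch_version"] = torch.__version__
    # CUDA version the installed build targets (None for CPU-only builds).
    report["built_for_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        index = torch.cuda.current_device()
        props = torch.cuda.get_device_properties(index)
        report["device_index"] = index
        report["device_name"] = torch.cuda.get_device_name(index)
        report["compute_capability"] = f"{props.major}.{props.minor}"
        # total_memory is reported in bytes; convert to whole MiB.
        report["total_memory_mb"] = props.total_memory // (1024 * 1024)
        report["multiprocessors"] = props.multi_processor_count
    return report


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```

When the script runs outside a GPU allocation, only the first few keys appear and cuda_available is False.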

Let's finish the interactive session. First, exit the Python interpreter:

exit()

Next, copy the exercise script pytorch.py from the pytorch tutorial directory into the current directory, run it, and then exit the interactive job:

cp ../pytorch/pytorch.py .
./pytorch.py
exit

Once you are back on the login node, deactivate the virtual environment:

deactivate

Batch Job Submission

Open a new bash script named pytorch_lab.sh in an editor:

nano pytorch_lab.sh

Add the following contents:

#!/bin/bash

#SBATCH --partition=gpu4
#SBATCH --time=5:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:tesla:1
#SBATCH --job-name=pytorch_lab

module load python
module load cuda11.0/toolkit/11.0.3

source ~/hpc_gpu/pytorch_lab/bin/activate

python3 pytorch.py > results.out

Save the file, then submit the job from the directory that contains pytorch.py:

sbatch pytorch_lab.sh
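When the same job needs to be submitted with different partitions or GPU counts, the batch script can be generated rather than edited by hand. The sketch below reproduces the directives shown above from a few parameters. The function render_batch_script and its parameter names are hypothetical, and the module names and paths are copied from this tutorial rather than checked against the cluster.

```python
def render_batch_script(partition="gpu4", minutes=5, gpus=1,
                        job_name="pytorch_lab", script="pytorch.py",
                        output="results.out"):
    """Return the text of a Slurm batch script like the one in this section."""
    lines = [
        "#!/bin/bash",
        "",
        f"#SBATCH --partition={partition}",
        # Slurm reads a value of the form M:SS as minutes:seconds.
        f"#SBATCH --time={minutes}:00",
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks-per-node=1",
        f"#SBATCH --gres=gpu:tesla:{gpus}",
        f"#SBATCH --job-name={job_name}",
        "",
        "module load python",
        "module load cuda11.0/toolkit/11.0.3",
        "",
        "source ~/hpc_gpu/pytorch_lab/bin/activate",
        "",
        f"python3 {script} > {output}",
    ]
    return "\n".join(lines) + "\n"


print(render_batch_script(partition="gpu2", gpus=2))
```

The returned text can be written to a file such as pytorch_lab.sh and submitted with sbatch in the usual way.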