GPU Workloads¶
Presentation Slides
Introduction¶
Easley GPU nodes are equipped with NVIDIA Tesla T4 devices. Two generally accessible GPU partitions exist, gpu2 and gpu4, which differ in the number of GPU devices present per node.
sinfo -p gpu2,gpu4 -O gres,partitionname,nodes
GPU Modules & Locations¶
The CUDA Toolkit module and a related programming framework module form the base of your GPU workload environment.
A number of code examples and scripts are also available in the /tools/gpu subdirectory …
module show cuda11.0/toolkit
ls /tools/gpu/cuda_simple
Basic CUDA C++ Example¶
Copy the sample source files to a location in your home directory…
cd ~
mkdir hpc_gpu
cd hpc_gpu
cp -R /tools/gpu/tutorials/* .
ls
cd hello
Now let’s take a quick look at the source code, compile, and run it…
module load cuda11.0/toolkit
cat hello.cu
nvcc -o hello hello.cu
srun -N1 -n1 --partition=gpu4 --gres=gpu:tesla:1 ./hello
cat hello_threads.cu
nvcc -o threads hello_threads.cu
srun -N1 -n1 --partition=gpu4 --gres=gpu:tesla:1 ./threads
And to be sure, let’s use the CUDA profiler to see exactly how the programs are using the GPU …
srun -N1 -n1 -p gpu4 --gres=gpu:tesla:1 nvprof $HOME/hpc_gpu/hello/hello
srun -N1 -n1 -p gpu4 --gres=gpu:tesla:1 nvprof $HOME/hpc_gpu/hello/threads
Using CUDA with PyTorch¶
PyTorch is one of the popular deep learning frameworks used by data scientists. To set up and run CUDA operations, PyTorch provides the torch.cuda package. This package adds support for CUDA tensor types, which implement the same functions as CPU tensors but use GPUs for computation.
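To illustrate the point that CUDA tensors offer the same operations as CPU tensors, here is a minimal sketch that runs one computation on the CPU and, when a GPU is visible, repeats it on the GPU. The function names doubled_sum and compare_devices are hypothetical and exist only for this example. The guarded import is an assumption made so that the script returns an empty result instead of a traceback when PyTorch is not installed, for instance when the virtual environment is not active.

```python
try:
    import torch
except ImportError:
    torch = None  # PyTorch is missing, e.g. the virtual environment is not active


def doubled_sum(values, device_name):
    """Build a tensor on the named device, double it, and return the sum as a float."""
    t = torch.tensor(values, dtype=torch.float32, device=device_name)
    return (t * 2).sum().item()


def compare_devices(values):
    """Run the same computation on the CPU and, if one is visible, on the GPU."""
    if torch is None:
        return {}
    results = {"cpu": doubled_sum(values, "cpu")}
    if torch.cuda.is_available():
        results["cuda"] = doubled_sum(values, "cuda")
    return results


# Prints one entry per device that was actually used
print(compare_devices([1.0, 2.0, 3.0]))
```

On a login node without a GPU only the "cpu" entry should appear. Inside a job that requested a GPU, a "cuda" entry with the same value would be expected as well.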
Installing PyTorch¶
Create a virtual environment before installing PyTorch. For more information about virtual environments, see:
https://hpc.auburn.edu/hpc/docs/hpcdocs/build/html/easley/python.html#python-virtual-environments
cd ~/hpc_gpu
module load python
python3 -m virtualenv pytorch_lab
source pytorch_lab/bin/activate
At this point, the virtual environment has been created and activated with the source command. With the environment active, install PyTorch as follows:
cd pytorch_lab
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
pip list
deactivate
In this first exercise we will use the torch.cuda package to check the availability of the CUDA device and to gather information about it.
torch.cuda¶
srun -N1 -n1 --partition=gpu2 --gres=gpu:tesla:1 --pty /bin/bash
module load python
module load cuda11.0/toolkit
cd ~/hpc_gpu/pytorch_lab
source bin/activate
python3
import torch
To check whether your system supports CUDA, use the following command. is_available() returns a bool: True if your system supports CUDA, False otherwise.
torch.cuda.is_available()
True
The current_device() function returns the index of the currently selected CUDA device:
torch.cuda.current_device()
0
Using the index returned above, you can retrieve the name of the device:
torch.cuda.get_device_name(0)
'Tesla T4'
To get further details about the device, pass the same index to get_device_properties():
torch.cuda.get_device_properties(0)
_CudaDeviceProperties(name='Tesla T4', major=7, minor=5, total_memory=15109MB, multi_processor_count=40)
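The interactive queries above can also be collected in a short script, which is convenient when the same check has to run inside a batch job. The following is a minimal sketch, not part of the tutorial files: the function name describe_cuda_devices and the dictionary keys are hypothetical. It assumes that an empty list is an acceptable result when PyTorch is not installed or no CUDA device is visible.

```python
try:
    import torch
except ImportError:
    torch = None  # PyTorch is missing, e.g. the virtual environment is not active


def describe_cuda_devices():
    """Return one dict per CUDA device that PyTorch can see (empty list if none)."""
    if torch is None or not torch.cuda.is_available():
        return []
    devices = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        devices.append({
            "index": index,
            "name": props.name,
            "compute_capability": f"{props.major}.{props.minor}",
            # total_memory is reported in bytes; convert to MiB
            "total_memory_mib": props.total_memory // (1024 * 1024),
            "multiprocessors": props.multi_processor_count,
        })
    return devices


for device in describe_cuda_devices():
    print(device)
```

Looping over device_count() rather than querying index 0 alone means the script should also report every device when a job requests more than one GPU.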
Let's finish up the interactive job. First, exit the Python interpreter:
exit()
Copy the exercise script pytorch.py from the ../pytorch directory into the current directory, run it, and then exit the interactive job:
cp ../pytorch/pytorch.py .
./pytorch.py
exit
Once you are back on the login node, deactivate the virtual environment:
deactivate
Batch Job Submission¶
Create a bash script named pytorch_lab.sh with the following contents, then submit it with sbatch:
nano pytorch_lab.sh
#!/bin/bash
#SBATCH --partition=gpu4
#SBATCH --time=5:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:tesla:1
#SBATCH --job-name=pytorch_lab
module load python
module load cuda11.0/toolkit/11.0.3
source ~/hpc_gpu/pytorch_lab/bin/activate
python3 pytorch.py > results.out
sbatch pytorch_lab.sh
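For readers who want something small to experiment with in a batch job, here is a hypothetical stand-in script. It is not the pytorch.py shipped with the tutorial files, whose contents are not shown here. The sketch times one matrix multiplication on the GPU when one is visible and on the CPU otherwise. The function name timed_matmul and the matrix size are illustrative assumptions.

```python
import time

try:
    import torch
except ImportError:
    torch = None  # PyTorch is missing, e.g. the virtual environment is not active


def timed_matmul(size=1024):
    """Multiply two random size x size matrices; return (device name, seconds)."""
    if torch is None:
        return None
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    start = time.perf_counter()
    product = a @ b
    if device.type == "cuda":
        # GPU kernels run asynchronously; wait for completion before stopping the clock
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return str(device), elapsed


result = timed_matmul()
if result is None:
    print("PyTorch is not installed; activate the virtual environment first")
else:
    print(f"device={result[0]} matmul_seconds={result[1]:.4f}")
```

If the script were saved under a hypothetical name such as matmul_check.py, that name could be used on the python3 line of the batch script, and the printed line would then land in results.out. The first GPU call typically includes one-time CUDA start-up cost, so a single timing is only a rough indication.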