Easley Cluster Training: Fundamentals¶
Prerequisites
Valid Easley HPC account
Laptop connected to the AU wireless network
Secure shell client, with connection settings for Easley:
Linux and Mac OS users use your favorite terminal app
PuTTY (putty.org) is recommended for Window users
References
Lab Overview
Getting Started & Basic Linux Commands
Research Software
Job Submission
Job Scheduling
Cluster Best Practices Tips & Tricks
Slides
Getting Started & Basic Linux Commands¶
Reminder: command syntax is:
<command> <required> [optional]
The dollar sign ($) is a symbol for the command prompt, which is a way for Linux to tell you it is ready to accept a command. You do not need to type the first $ you see for each command. Wherever you see $, you should enter the command that immediately follows it. Windows users can use a terminal emulation program like SecureCRT or PuTTY. To connect,you will create a new server profile for easley.auburn.edu, using your normal AU username and password for the login credentials.
If you are using a terminal program, you can simply type:
ssh <auburn userid>@easley.auburn.edu
Issue the following commands and take note of what you see on the screen, the output…
pwd
To find out more about any command, including the one that you just ran (pwd) you can refer to the Linux manual pages using the man command.
man pwd
As seen there, pwd stands for “present working directory.” When you run it, you should see something like /home/username on the screen, because that is the folderdirectory in which you are currently working.
There are some tricks to navigating the man pages…
man ls
Notice there is more information than will fit on one screen. Use the arrow keys and shortcuts to move around in the manual page. Now let’s move around in the Linux file system…
cd
pwd
cd /tmp
pwd
cd ~
pwd
cd stands for change directory (folder). If you issue cd without anything else, it does nothing. If you give it a path (like /tmp), it will take you to a different directory in the file system. The last command uses the special character “~” (tilde) which Linux translates to your home directory. You should see the path to your home directory on the screen after issuing the last command in this section. So, now you have practiced moving around the file system and finding where you are.
Enter the following commands..
ls
mkdir training
ls -al
cd training
pwd
echo "hello world" > hello.txt
ls -al
cat hello.txt
In this section, you created and accessed a new directory, created a file with redirection (>) and viewed its contents.
echo "hello again" > hello2.txt
ls -al
cat hello2.txt
echo "hello world" >> hello.txt
cat hello.txt
echo "hello world" > hello.txt
cat hello.txt
ls -al > files.txt
cat files.txt
Here, we created yet another file with different content using output redirection (>). Then, we appended to the file using the append operation (>>). Next, we overwrote our changes by going back to the single > operator which demonstrates that output redirection can be destructive! Finally, we redirected the output of our “ls -al” command to a file. This demonstrates that we can create files from programs and commands we run. Let’s take a look at who we are and what groups we are members of on the system.
id
Here you can see your username and groups along with the numeric IDs that Linux assigns to each.
Now let’s look at our files…
ls -al
rm hello 2.txt
ls -al
groups
chgrp research hello.txt
ls -al
chmod 750 hello.txt
ls -al
ls -al hello.txt
This group of commands used the rm command to delete a file. Then we used chmod and chgrp to change the file ownerships and permissions. Look back at the output of these commands and pay close attention to the changes when “ls -al” is run. What changes do you see? What do they mean?
cd ..
pwd
ls -al
cd training
pwd
ls -al .
ls -al ..
cd .
pwd
Here we perform a quick demonstration of the “.” and “..” special operators.
Special File and Directory Characters Double dots (..) tells Linux to look back into the directory “above” our current location. A single dot (.) tells Linux we are talking about the current directory. Files that begin with “.” are hidden files in Linux
As you can see, “cd ..” takes us back up to our home directory, while cd “.” doesn’t take us anywhere. You can also see . and .. in your directory listing when you run “ls -al” Now you are ready to do some shell scripting…
nano myscript.sh
You should see a drastic change to your screen! This command has launched the “nano” file editor. Here we can enter the contents of a file, much like a word processor or notepad application in a desktop environment like Windows. Let’s create a small program that we can run using nano. Enter the following lines in your nano window
GNU Nano 2.0.6
^G Get Help |
^O WriteOut |
^R Read File |
^Y Prev Page |
^K CutText |
^C Cur Pos |
^X exit |
^J Justify |
^W Where Is |
^V Next Page |
^U UnCutTx |
^T ToSpell |
Then enter CTRL-X. At the bottom of the screen nano will ask you if you want to save the file. Type Y. Then, nano will ask you what you want to name the file. Nano will suggest the file name we provided when we ran the command. Just hit enter since we already told nano what we wanted to call the file when we entered the nano command above.
ls -al
chmod 750 myscript.sh
ls -al
./myscript.sh
Now we have created a bash script! But we can’t actually run the script until we grant file “execute” privileges.
chmod 750 myscript.sh
Then, we run the script with “./myscript.sh”. The “./” tells Linux that the file we want to run is in the current directory. Does your script work? Is the output what you expected?
Extra Practice¶
Take a look at the following shell script. What do you think this script will do? Try creating the shell script using nano (be sure use a different file name this time) and see if you get the expected result.
GNU Nano 2.0.6
^G Get Help |
^O WriteOut |
^R Read File |
^Y Prev Page |
^K CutText |
^C Cur Pos |
^X exit |
^J Justify |
^W Where Is |
^V Next Page |
^U UnCutTx |
^T ToSpell |
Research Software¶
Create a prime directory within your home directory and copy the prime files there:
pwd
mkdir prime
cp /tools/docs/tutorials/mpi/newprime/* ~/prime/
cd ~/prime
ls -l
chown <username>:<username> *
Take a look at the files you have just copied. For this lab, we are most interested in prime.c and prime.sh. You can see inside those files using more or nano. The prime.c file is program source code written in the C language. In order to turn source code into a set of instructions that the computer understands, it must me compiled. There are many different types of languages and compilers. In this example we will be using the OpenMPI C compiler to build our source code. 1
Setting the Environment/Compiling Code To compile a MPI program so that it will run in parallel, you must use an MPI-enabled compiler.
mpicc prime.c -o prime
To make sure you have Open MPI available to use:
module list
module avail openmpi
module load openmpi/gcc
module list
Now try to compile the source code again:
mpicc prime.c -o prime
If you don’t see any errors or warnings, you have just compiled a parallelized program to calculate prime numbers. The next step is running the prime program.
Note
When testing Code remember, it’s not a good practice to run any code extensively on the login node. However it is certainly acceptable to quickly test an executable just to see if it will run at all. Once an application is running in the foreground, you can enter [CTRL-C] to stop it. If you are unsure about what you might be running, you can use the “ps” or “top” commands to see which processes are associated with your account. These commands will return Process IDs (or PIDs). If necessary, you can use the “kill” command along with the PID to stop a process. See “man kill” “man ps”
Now briefly run the executable that you just created by running it directly on the login node:
mpirun prime
You should see a three-column list of numbers. (If you see an error message, something has gone wrong, so let an instructor know.) Use CTRL-C to stop the program. Now that we know our compiled program works as expected, we need to confirm that it will also run through the scheduler. The scheduler makes sure that resources in a cluster are shared efficiently.
Job Scheduling¶
On the Easley cluster, Slurm serves as the scheduler and workload manager. The scheduler takes user requests in the form of jobs and allocates resources to these jobs based on availability and cluster policy.
Slurm groups nodes into partitions based on type and provides them with special functionality. Users specify a partition in their job submission to access different types of capacity.
The general partition is available to all Easley users. Partitions available to all users are community partitions. Nodes in the general partition consist of standard nodes as defined in the Easley User Guide. This is by far the largest partition. If no partition is specified in the job sub, the general partition will be used. Jobs submitted to the general partition may be preempted.
There is also at least one partition for each PI lab group. These partitions are dedicated partitions. Every user in the Easley cluster is sponsored by a PI that purchased capacity in the cluster. When the PI approves a user’s account request, that user is automatically put in that PI’s lab group. Membership in a lab group allows access to the group’s corresponding partition and gives them priority access. Use the ‘sinfo’ command to view available partitions.
sinfo -s
Specifying a dedicated partition in job submission has advantages and disadvantages. The disadvantage is that it restricts the job to only that capacity purchased by the PI. This is a small subset of the capacity in the general partition. As a result, your job may take longer to start. The advantage is that it gives priority access. Your job can preempt other jobs running on that capacity from outside your lab group and, once it does start, your job will not be preempted.
Job Submission¶
Let’s use srun to run your program in interactive mode. This will run your program as a job on one of the compute nodes, but will allow us to issue commands to the node interactively.
srun -p general -N1 -n2 --pty /bin/bash
module load openmpi/gcc
mpirun ~/prime/prime
exit
Finally, submit your job as a batch job to the cluster requesting two processors ( -n2 ) from a single node ( -N1 ) in the hpcadmin_std partition.
sbatch -p general -N1 -n2 ~/prime/prime.sh
What happened? Did the job run successfully? Here’s the prime.sh script:
mpirun prime > Prime.out
Anything missing?
Here are some other important sub parameters by example (man sbatch for more details):
sbatch -p general -N1 -t4:00:00 --mail-type=begin,end,fail --mail-user=nosuer@auburn.edu prime.sh
The ‘-t’ parameter is the walltime. As you run your program, you should get a good idea how long it will run. This is the walltime. Providing this in the job sub allows the scheduling to make better decisions and run more efficiently. The default walltime if none is specified is two days.
How to monitor your job:
squeue -u <userid>
scontrol show job <jobid>
How to cancel a job:
scancel job <jobid> # cancel job by jobid
scancel job -u username # cancel all jobs by username
Next Steps¶
You can add sbatch options to your shell script. This saves time and lots of typing when you have to specify lots of options For example, you can run prime using a Slurm script: prime.sh…
#!/bin/bash
email=`whoami`@auburn.edu
workdir=/home/`whoami`/prime
cd $workdir
#SBATCH -J "Prime"
#SBATCH --mail-type=ALL
#SBATCH --mail-user=$email
#SBATCH --nodes=1
#SBATCH --ntasks=48
#SBATCH --partition=general
#SBATCH -d $workdir
module load openmpi/gcc
mpirun prime
Notice the #SBATCH directives in this script are equivalent to some of the sbatch options that you have specified on the command line. Find software for your particular research domain and begin experimenting. We have many popular software packages available for use. You can see them with …
module avail
If a software package that you would like to use is not already available, visit https://aub.ie/hpcsw to request a new package. The best way to learn your way around is with hands-on experience. If you run into any problems or have questions, email hpcadmin@auburn.edu.
Best Practices Summary¶
Running a new program for the first time:
First, run briefly on login node just to make sure that your code will run. If the program runs without an immediate indication or an error or problem, use CTRL-C to exit.
Then, run using srun in interactive mode to make sure that it will run on a compute node.
Finally, run in batch mode using sbatch.
Caution
Do not run jobs on the login node except as a test. This means short jobs using small amounts of memory to ensure that your code will run. Processes that violate this will be killed.
1. Don’t submit a job, assume it’s running correctly and walk away or leave for weekend. Make sure the job is running and, if not, understand why not.
Specify walltimes in your job submission.
Allows Scheduler to maximize utilization which means your jobs run sooner.
Users should receive an email after a job completes that contains the actual walltime.
Submit short-running jobs with fewer resources in order to reduce likelihood of preemption when not using your group’s partition.
Clean up when your jobs are finished.
Easley does not provide archival or long-term storage.
If files no longer need to be available for work on the system, copy them off and delete them so that the space can be used for active projects.
Pay attention to your disk usage. Once the hard limit is reached in disk space or # of files, your program will stop executing.
Do not share passwords or accounts. If you want others to access your files, then set them to read only.
How to Get Help¶
Because Easley is regarded as a research (rather than production) system, HPC support is normally available only during regular business hours. When reporting problems, please provide as much relevant information as possible. This should include the following, as appropriate:
date and time when the problem occurred
job number
text of the command(s) which you issued
exact and complete text of any error messages
any other information helpful in identifying or resolving the problem
If further assistance is needed, please email to schedule an appointment with one of the HPC admins at hpcadmin@auburn.edu.