Introduction
This usage policy applies to the AUB/IT-hosted HPC and is meant to ensure that the HPC's resources are allocated efficiently, fairly, and transparently among HPC users.
This policy was originally established to govern the usage of cluster Arza and is now pending revision to reflect the recent upgrades and the migration to cluster Octopus.
The policy is based on core-hour allocations per user/group/project, as well as a set of defined queues to manage and prioritize users' job submissions.
The cluster comprises components funded by IT (the Generic Cluster), which are available to the AUB community at large. Some AUB entities may require dedicated resources for a particular group or project; in that case, they fund the addition of the required components to the cluster (Dedicated Resources).
The following guidelines apply to the operation of the Generic Cluster.
Other policies and queues will need to be developed for the Dedicated Resources in coordination with those resources' funder/sponsor.
General Guidelines
- By logging into the HPC system, all users indicate awareness and acceptance of this HPC policy.
- The HPC is available to faculty, students, and any external researcher sponsored by a faculty member.
- Jobs within the same queue are scheduled FCFS (First Come, First Served).
- Jobs in higher-priority queues tend to be scheduled first.
- Jobs submitted outside the LSF queues will be killed.
- The default queue is Q2 (24-hours); it is used whenever no queue name is specified at submission (a submission sketch follows this list).
- Once a job is in a queue, it cannot be moved to a different queue.
- Runtime limit: a job that exceeds the runtime limit of its queue is killed automatically by LSF.
- There is no limit on the number of jobs submitted by a user.
- The maximum number of cores a user can use at the same time is 128; once a user reaches 128 cores, new jobs remain pending until earlier ones complete.
- The maximum number of core hours a user/group/project can use each month is 50,000; when this limit is reached, jobs are killed manually by the HPC admin or by the scheduler. The count includes suspended, exited, and completed jobs (an accounting sketch follows the queue table below).
- The available queues (Q0 through Q7) are described in the "Type of Queues" table below.
- The disk quota for each user is 200 GB.
- The contents of user directories are not backed up, meaning that damaged or accidentally deleted files cannot be restored.
- Adding new queues or modifying existing ones is one of the basic ways the LSF configuration is adjusted to suit changed conditions or usage patterns.
- Accounts of users who have not submitted a job for a year will be terminated, and the data will be archived for one year beyond the termination. A new request must be filed to re-activate the account.
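For reference, a minimal job-submission sketch, assuming the standard LSF commands (bsub, bjobs, bqueues) and the queue names listed in the "Type of Queues" table below; the core count, runtime, and executable name are illustrative only:

```bash
# Submit a 32-core job to the 48-hours queue; without -q, the job
# goes to the default queue Q2 (24-hours). -W sets the runtime limit
# (hours:minutes); exceeding the queue's limit gets the job killed.
bsub -q 48-hours -n 32 -W 48:00 -J my_sim -o my_sim.%J.out ./run_simulation

bjobs -u $USER   # list your pending and running jobs
bqueues          # show queue priorities, limits, and job counts
```

Note that the exact string expected by -q (e.g. "Q3" versus "48-hours") depends on how the queues are named in the cluster's LSF configuration; bqueues lists the valid names.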
Type of Queues
| Queue | Queue Name | Description | Allowed Users | Max # of Cores | Max # of Slots per User | Max Runtime of a Single Job (hours) | Priority | Comments |
|-------|------------|-------------|---------------|----------------|-------------------------|-------------------------------------|----------|----------|
| Q0 | 6-hours | Public queue | HPC users with available core hours | 128 | | 6 | 10 | |
| Q1 | 12-hours | Public queue | HPC users with available core hours | 64 | | 12 | 20 | |
| Q2 | 24-hours | Public queue (default) | HPC users with available core hours | 32 | | 24 | 30 | |
| Q3 | 48-hours | Public queue | HPC users with available core hours | 128 | 64 | 48 | 40 | |
| Q4 | 7-days | Public queue | HPC users with available core hours | 128 | 32 | 168 | 20 | |
| Q5 | Serial | Public queue | HPC users with available core hours | 64 | 16 | 72 | 30 | |
| Q6 | Exception | Not a public queue; requires steering committee approval; users can use all the resources | Specific users, after committee approval | 256 | | 72 | 50 | Other queues are paused to allow this job to run; the maximum wait is 32 hours, the time needed for all running jobs to complete |
| Q7 | Scratch | Public queue that accesses extra capacity on the dedicated cluster | All HPC users | 32 | | 24 | 10 | Preempted by other queues; this queue runs without guarantee on specific nodes |
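Because the monthly 50,000 core-hour allocation counts suspended, exited, and completed jobs, it is worth reviewing past consumption before queuing large runs. A minimal sketch, assuming LSF's standard accounting tool bacct is enabled on the cluster:

```bash
# Summarize accounting (including total CPU time) for your completed jobs;
# bacct reads LSF's accounting logs, so coverage depends on site retention.
bacct -u $USER
```

This gives only a local estimate; the HPC admin's accounting remains authoritative for the monthly core-hour limit.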