HPC Usage Policy (under revision)


Introduction

This usage policy applies to the HPC cluster hosted by AUB IT. It is meant to ensure that the HPC's resources are allocated efficiently, fairly, and transparently among HPC users.

This policy was originally established to govern the usage of cluster Arza and is now pending revision to reflect the recent upgrades and the migration to cluster Octopus.

The policy is based on core-hour allocations per user/group/project, together with a set of defined queues that manage and prioritize users' job submissions.

The cluster consists of components funded by IT (the Generic Cluster), which are available to the AUB community at large. Some AUB entities may require dedicated resources for a particular group or project, in which case they fund the addition of the required components to the cluster (Dedicated Resources).

The following guidelines apply to the operation of the Generic Cluster.

Other policies/queues will need to be developed for the dedicated resources in coordination with those resources’ funder/sponsor. 

General Guidelines

  1. By logging in to the HPC system, users indicate awareness and acceptance of this HPC policy.

  1. The HPC is available to faculty, students, and any external researcher sponsored by a faculty member.

  1. Jobs in the same queue are scheduled FCFS (First Come, First Served).

  1. Jobs in higher-priority queues tend to be scheduled first.

  1. Jobs that run outside the LSF queues will be killed.

  1. If no queue name is specified at submission, the job goes to the default queue, Q2 (24-hours).

  1. Once a job is in a queue, it cannot be moved to a different queue. 

  1. Runtime limit: a job that exceeds the runtime limit of its queue is killed automatically by LSF.

  1. There is no limit on the number of jobs a user can submit.

  1. The maximum number of cores a user can use at the same time is 128. Once a user reaches 128 cores, their new jobs remain pending until earlier ones complete.

  1. The maximum number of core hours that a user/group/project can use each month is 50,000. When this limit is reached, jobs will be killed, either manually by the HPC admin or by the scheduler. The count includes suspended, exited, and completed jobs.

  1. There are eight queues (Q0 to Q7), listed under Type of Queues below.

  1. The disk quota for each user is 200 GB.

  1. The contents of users' directories are not backed up, so damaged or accidentally deleted files cannot be restored.

  1. Queues may be added or modified to adapt LSF to changing conditions or usage patterns.

  1. Accounts of users who have not submitted a job for one year will be terminated, and their data will be archived for one year after termination. A new request must be filed to reactivate the account.
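The concurrent-core limit and the monthly core-hour quota above can be illustrated with a short Python sketch. It is not the scheduler's actual accounting: it assumes core hours are allocated cores multiplied by elapsed wall-clock hours and that only running jobs count toward the 128-core limit, and the `Job` record and helper functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Limits stated in the policy.
MAX_CONCURRENT_CORES = 128        # per user, at the same time
MONTHLY_CORE_HOUR_QUOTA = 50_000  # per user/group/project, per month


@dataclass
class Job:
    """Hypothetical job record; field names are illustrative only."""
    cores: int            # cores allocated to the job
    elapsed_hours: float  # wall-clock hours consumed this month
    state: str            # LSF-style state, e.g. "RUN", "PEND", "USUSP", "EXIT", "DONE"


def core_hours_used(jobs):
    """Sum cores x elapsed hours over every job that has consumed time.

    Running, suspended, exited, and completed jobs all count toward the
    monthly quota; pending jobs have not consumed anything yet.
    """
    return sum(j.cores * j.elapsed_hours for j in jobs if j.state != "PEND")


def over_quota(jobs):
    """True once the monthly core-hour quota has been reached."""
    return core_hours_used(jobs) >= MONTHLY_CORE_HOUR_QUOTA


def must_pend(jobs, requested_cores):
    """True if a new job would push the user past 128 concurrent cores.

    Assumes only running jobs count toward the concurrent-core limit.
    """
    running = sum(j.cores for j in jobs if j.state == "RUN")
    return running + requested_cores > MAX_CONCURRENT_CORES


jobs = [
    Job(cores=64, elapsed_hours=100.0, state="DONE"),
    Job(cores=32, elapsed_hours=10.0, state="EXIT"),
    Job(cores=64, elapsed_hours=5.0, state="RUN"),
]
print(core_hours_used(jobs))  # 7040.0
print(must_pend(jobs, 64))    # False: 64 + 64 = 128 is still within the limit
print(must_pend(jobs, 65))    # True: 129 cores would exceed it
```

Counting exited jobs alongside completed ones matters: a job that fails after running for hours still consumes its share of the monthly quota.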

 

Type of Queues

Queue | Queue Name | Description  | Max # of Cores | Max # of Slots/User | Max Hours a Single Job Can Run | Priority
Q0    | 6-hours    | Public queue | 128            | -                   | -                              | 10
Q1    | 12-hours   | Public queue | 64             | -                   | 12                             | 20
Q2    | 24-hours   | Public queue | 32             | -                   | 24                             | 30
Q3    | 48-hours   | Public queue | 128            | 64                  | 48                             | 40
Q4    | 7-days     | Public queue | 128            | 32                  | 168                            | 20
Q5    | Serial     | Public queue | 64             | 16                  | 72                             | 30
Q6    | Exception  | Not public   | 256            | -                   | 72                             | 50
Q7    | Scratch    | Public queue | 32             | -                   | 24                             | 10

A dash (-) means no value is given.

Allowed users:
Q0 to Q5: HPC users with available core hours.
Q6: Specific users, after they get the steering committee's approval.
Q7: All HPC users.

Comments:
Q6 (Exception): Not a public queue; it requires steering committee approval, and approved users can use all the resources. Other queues will be paused to allow this job to run. The maximum wait time is 32 hours, the time needed for all running jobs to complete.
Q7 (Scratch): Public queue that accesses extra capacity on the dedicated cluster. Jobs in this queue can be preempted by other queues and run without a guarantee of specific nodes.
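To check a planned job against the queue limits before submitting it, the table can be encoded as a lookup. The Python sketch below is a hypothetical helper, not a tool provided on the cluster: `QUEUES` copies the name, core, run-time, and priority values from the Type of Queues table (Q0's run-time limit is left as `None` because the table gives no value for it), it assumes Max # of Cores applies to a single job's request, and `check_request` falls back to the default queue Q2 when no queue is named.

```python
# Queue limits copied from the Type of Queues table.
# max_hours is None for Q0 because the table gives no run-time value for it.
QUEUES = {
    "Q0": {"name": "6-hours", "max_cores": 128, "max_hours": None, "priority": 10},
    "Q1": {"name": "12-hours", "max_cores": 64, "max_hours": 12, "priority": 20},
    "Q2": {"name": "24-hours", "max_cores": 32, "max_hours": 24, "priority": 30},
    "Q3": {"name": "48-hours", "max_cores": 128, "max_hours": 48, "priority": 40},
    "Q4": {"name": "7-days", "max_cores": 128, "max_hours": 168, "priority": 20},
    "Q5": {"name": "Serial", "max_cores": 64, "max_hours": 72, "priority": 30},
    "Q6": {"name": "Exception", "max_cores": 256, "max_hours": 72, "priority": 50},
    "Q7": {"name": "Scratch", "max_cores": 32, "max_hours": 24, "priority": 10},
}

DEFAULT_QUEUE = "Q2"  # used when no queue name is given at submission


def check_request(cores, hours, queue=None):
    """List the reasons a planned job does not fit a queue (empty list = fits).

    A job that runs longer than its queue's max_hours is killed by LSF,
    so the requested hours are checked against that limit.
    """
    qid = queue if queue is not None else DEFAULT_QUEUE
    q = QUEUES[qid]
    issues = []
    if cores > q["max_cores"]:
        issues.append(f"{qid} ({q['name']}): {cores} cores exceeds the {q['max_cores']}-core limit")
    if q["max_hours"] is not None and hours > q["max_hours"]:
        issues.append(f"{qid} ({q['name']}): {hours} h exceeds the {q['max_hours']} h run-time limit")
    if qid == "Q6":
        issues.append("Q6 (Exception) requires steering committee approval")
    return issues


print(check_request(16, 20))        # [] : fits the default queue Q2
print(len(check_request(64, 30)))   # 2 : too many cores and too many hours for Q2
print(check_request(64, 30, "Q3"))  # [] : fits Q3 (48-hours)
```

A 64-core, 30-hour job therefore does not fit the default queue Q2 on either count, but it fits Q3. The per-user slot limits (Max # of Slots/User) are not modeled in this sketch.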

 

Details
Article ID: 66389
Created: Fri 2/22/19 12:14 PM
Modified: Fri 7/3/20 11:09 AM