HPC Usage Policy (under revision)


Introduction

This usage policy applies to the AUB/IT-hosted HPC and is meant to ensure that the HPC's resources are allocated efficiently, fairly, and transparently among HPC users.

This policy was originally established to govern the usage of cluster Arza and is now pending revision to reflect the recent upgrades and the migration to cluster Octopus.

The policy is based on core-hour allocations per user/group/project, as well as a set of defined queues that manage and prioritize users' job submissions.

The cluster comprises components funded by IT (the Generic Cluster), which are available to the AUB community at large. Some AUB entities may require dedicated resources for a particular group or project, in which case they fund the addition of the required components to the cluster (Dedicated Resources).

The following guidelines apply to the operation of the Generic Cluster.

Other policies and queues will need to be developed for the Dedicated Resources in coordination with those resources' funders/sponsors.

General Guidelines

  1. By logging into the HPC system, all users indicate awareness and acceptance of this HPC policy.

  2. The HPC is available to faculty, students, and any external researcher sponsored by a faculty member.

  3. Jobs within the same queue are scheduled First Come, First Served (FCFS).

  4. Jobs in higher-priority queues tend to be scheduled first.

  5. Jobs submitted outside the LSF queues will be killed.

  6. The default queue is Q2 (24-hours) if you do not specify a queue name during submission.

  7. Once a job is in a queue, it cannot be moved to a different queue.

  8. Runtime limit: if a job exceeds the runtime limit of its queue, it will be killed automatically by LSF.

  9. There is no limit on the number of jobs submitted by a user.

  10. The maximum number of cores that a user can use at the same time is 128; once a user reaches 128 cores, new jobs remain pending until previous ones complete.

  11. The maximum number of core hours that a user/group/project can use each month is 50,000; when this limit is reached, jobs will be killed manually by the HPC admin or by the scheduler. This count includes suspended, exited, and completed jobs.

  12. There are eight different queues (Q0 to Q7), described in the table below.

  13. The disk quota for each user is 200 GB.

  14. The contents of users' directories are not backed up, meaning that damaged or accidentally deleted files cannot be restored.

  15. Adding new queues or modifying existing ones is one of the basic ways we adjust LSF to suit changing conditions or usage patterns.

  16. Accounts of users who have not submitted a job for a year will be terminated, and their data will be archived for one year beyond the termination. A new request must be submitted to re-activate the account.
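Queue selection happens at submission time (with Q2 as the fallback when no queue is named), so a job script typically pins its queue, slot count, and runtime limit to one row of the queue table below. A minimal sketch of such an LSF job script, assuming the standard `#BSUB` directive syntax; the job name, output file, and executable are hypothetical examples, not site-provided files:

```shell
#!/bin/bash
#BSUB -J my_simulation    # job name (hypothetical)
#BSUB -q Q3               # target queue; omitting -q falls back to the default Q2 (24-hours)
#BSUB -n 64               # slots requested; Q3 allows at most 64 slots per user
#BSUB -W 48:00            # runtime limit in HH:MM; Q3 jobs are killed by LSF after 48 hours
#BSUB -o my_sim.%J.out    # stdout file; %J expands to the LSF job ID

./my_simulation           # hypothetical executable
```

The script would be submitted by piping it to the scheduler, e.g. `bsub < job.sh`; since a queued job cannot be moved to another queue, the queue choice has to be right before submission.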

 

Type of Queues

| Queue | Name | Description | Allowed Users | Max # of Cores | Max # of Slots/User | Max Runtime (hours) | Priority | Comments |
|-------|------|-------------|---------------|----------------|---------------------|---------------------|----------|----------|
| Q0 | 6-hours | Public queue | HPC users with available core hours | 128 | | 6 | 10 | |
| Q1 | 12-hours | Public queue | HPC users with available core hours | 64 | | 12 | 20 | |
| Q2 | 24-hours | Public queue | HPC users with available core hours | 32 | | 24 | 30 | |
| Q3 | 48-hours | Public queue | HPC users with available core hours | 128 | 64 | 48 | 40 | |
| Q4 | 7-days | Public queue | HPC users with available core hours | 128 | 32 | 168 | 20 | |
| Q5 | Serial | Public queue | HPC users with available core hours | 64 | 16 | 72 | 30 | |
| Q6 | Exception | Not a public queue; requires steering committee approval; users can use all the resources | Specific users, after committee approval | 256 | | 72 | 50 | Other queues are paused to allow this job to run; max wait time is 32 h (the time needed for all running jobs to complete) |
| Q7 | Scratch | Public queue with access to extra capacity on the dedicated cluster | All HPC users | 32 | | 24 | 10 | Preempted by other queues; runs without guarantee of specific nodes |
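The 50,000 monthly core-hour cap from the guidelines is just cores multiplied by wall-clock hours, summed over a user's jobs. A quick sketch of the arithmetic with illustrative numbers (a 128-core job running the full Q4 limit of 168 hours); the variable names are ours, not part of any LSF tooling:

```shell
#!/bin/sh
# Core hours consumed by one job: cores x wall-clock hours (illustrative values).
cores=128                       # also the per-user concurrent core cap
hours=168                       # Q4 (7-days) runtime limit
core_hours=$((cores * hours))
echo "core hours: $core_hours"  # 128 * 168 = 21504

# What is left of the 50,000 monthly allowance after this job:
monthly_cap=50000
remaining=$((monthly_cap - core_hours))
echo "remaining: $remaining"    # 50000 - 21504 = 28496
```

At this rate a user could run just over two such jobs in a month before hitting the cap, after which jobs are killed by the HPC admin or the scheduler.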

 

Details

Article ID: 66389
Created: Fri 2/22/19 12:14 PM
Modified: Fri 7/3/20 11:09 AM
