Computing infrastructure and policies
Storage
| Path | Performance | Usage | Quota (Space/Files) | Backup | Auto-cleanup |
|---|---|---|---|---|---|
| | High | | | | |
| $HOME | Low | | 100GB/1000K | Daily | no |
| $SCRATCH | High | | no | no | 90 days |
| $SLURM_TMPDIR | Highest | | 4TB/- | no | at job end |
| projects | Fair | | 200GB/1000K | Daily | no |
| $ARCHIVE | Low | | 500GB | no | no |
Note
The $HOME file system is backed up once a day. To restore a file, send a
request to Apuana’s IT support with the path to the file or directory to
restore and the date of the version you need.
Warning
There is currently no backup system for any other file system of the Mila cluster. Use storage local to personal computers, Google Drive or similar solutions to back up important data.
$HOME
$HOME is appropriate for code and libraries, which are small and read once,
as well as for experimental results that will be needed at a later time (e.g.
the weights of a network referenced in a paper).
Quotas are enabled on $HOME for both disk capacity (blocks) and number of
files (inodes). The limits for blocks and inodes are respectively 100GiB and 1
million per user. The command to check the quota usage from a login node is:
beegfs-ctl --cfgFile=/etc/beegfs/home.d/beegfs-client.conf --getquota --uid $USER
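As a rough cross-check of the quota report, the sketch below uses standard tools to summarize a directory's size and file count. The function name `usage_summary` is hypothetical, and counting regular files only approximates what the file system counts as inodes, so the figures reported by `beegfs-ctl` remain the reference.

```shell
# Hypothetical helper: print the total size and number of regular files
# under a directory (defaults to $HOME).
usage_summary() {
    dir="${1:-$HOME}"
    # Total size in human-readable form (first field of du output).
    size=$(du -sh "$dir" 2>/dev/null | cut -f1)
    # Count regular files only; this approximates the inode usage.
    files=$(find "$dir" -type f 2>/dev/null | wc -l | tr -d ' ')
    echo "$dir: $size, $files files"
}
```

Calling `usage_summary "$HOME"` prints a single line with the size and file count. On a large home directory this can take a while, since every file is visited.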
$SCRATCH
$SCRATCH can be used to store processed datasets, work-in-progress datasets
or temporary job results. Its block size is optimized for small files, which
minimizes the performance hit of working on extracted datasets.
Note
Auto-cleanup: this file system is cleaned on a weekly basis; files not used for more than 90 days are deleted.
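To see which files are approaching the cleanup threshold, something like the sketch below may help. It assumes that "not used" refers to the last access time (atime), which this page does not confirm. The function name `list_stale_files` and the 80-day default margin are illustrative choices.

```shell
# Hypothetical helper: list regular files whose last access is older than
# N days. Assumption: the cleanup policy is based on access time (atime).
list_stale_files() {
    dir="${1:-$SCRATCH}"
    days="${2:-80}"   # arbitrary margin below the 90-day limit
    find "$dir" -type f -atime +"$days" -print
}
```

For example, `list_stale_files "$SCRATCH" 80` lists candidate files so that anything still needed can be copied elsewhere before it is removed.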
$SLURM_TMPDIR
$SLURM_TMPDIR points to the local disk of the node on which a job is
running. Use it to copy data onto the node at the beginning of the job and to
write intermediate checkpoints. This folder is cleared after each job, so copy
anything you need to keep to another file system before the job ends.
projects
projects can be used for collaborative projects. It aims to ease the
sharing of data between users working on a long-term project.
Quotas are enabled on projects for both disk capacity (blocks) and number
of files (inodes). The limits for blocks and inodes are respectively 200GiB and
1 million per user and per group.
$ARCHIVE
The purpose of $ARCHIVE is to store data other than datasets that has to be
kept long-term (e.g. generated samples, logs, data relevant for paper submission).
$ARCHIVE is only available on the login nodes. Because this file system
is tuned for large files, it is recommended to pack each directory into a
single archive file before storing it there. For example, to archive the
results of an experiment in
$SCRATCH/my_experiment_results/, run the commands below from a login node:
cd $SCRATCH
tar cJf $ARCHIVE/my_experiment_results.tar.xz --xattrs my_experiment_results
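Before deleting the original directory, it may be worth checking that the archive is readable, and later restoring it somewhere else. The sketch below assumes the same `tar` with xz support that created the archive. The helper names `verify_archive` and `restore_archive` are hypothetical.

```shell
# Hypothetical helper: read through the whole archive without extracting;
# a non-zero exit status means the archive could not be read.
verify_archive() {
    tar tJf "$1" > /dev/null
}

# Hypothetical helper: extract an archive into a destination directory,
# asking tar to restore extended attributes as well.
restore_archive() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" && tar xJf "$archive" --xattrs -C "$dest"
}
```

For instance, `verify_archive "$ARCHIVE/my_experiment_results.tar.xz"` can be run right after creating the archive, and `restore_archive "$ARCHIVE/my_experiment_results.tar.xz" "$SCRATCH/restored"` would unpack it under a placeholder directory on $SCRATCH.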
Disk capacity quotas are enabled on $ARCHIVE. The soft limit per user is
500GB, the hard limit is 550GB. The grace time is 7 days. This means that one
can use more than 500GB for 7 days before the file system enforces quota.
However, it is not possible to use more than 550GB.
The command to check the quota usage from a login node is df:
df -h $ARCHIVE
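The soft and hard limits described above can be illustrated with a small classifier. This is only a sketch of the rule as stated on this page, not something the file system provides. The function name `quota_state` is hypothetical, and the input is a whole number of GB that you supply yourself.

```shell
# Hypothetical helper: classify a usage figure (whole GB) against the
# soft and hard limits stated above (defaults: 500 and 550).
quota_state() {
    used_gb="$1"
    soft_gb="${2:-500}"
    hard_gb="${3:-550}"
    if [ "$used_gb" -le "$soft_gb" ]; then
        echo "within soft limit"
    elif [ "$used_gb" -le "$hard_gb" ]; then
        echo "over soft limit: 7-day grace period applies"
    else
        echo "exceeds hard limit: not possible"
    fi
}
```

For example, `quota_state 520` reports that the grace period applies, which matches the case where usage sits between 500GB and 550GB.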
Note
There is NO backup of this file system.