Cluster Storage

Because many users have accounts on a cluster, and some computations use very large data sets, a cluster needs much more storage space than a personal computer. A cluster usually provides several types of storage.

Like desktop and laptop computers, individual cluster nodes often have local disk drives. Because this storage is local to a node, processes running on that node can usually access it faster than storage located elsewhere. On our HPC clusters these local drives are used for system software and for scratch space that is available to users' processes. As the name suggests, files in the scratch space are meant to be deleted regularly. The local drives on compute nodes are usually not accessible to processes running on other nodes.
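A common way to use scratch space is to write intermediate files there while a job runs, copy only the results worth keeping to permanent storage, and let everything else be deleted. The Python sketch below illustrates that pattern under stated assumptions: the scratch location is read from a hypothetical environment variable SCRATCH_DIR (the real location on each cluster is described in its User Guide), falling back to the system temporary directory, and run_in_scratch is an illustrative helper, not a tool provided on the clusters.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def scratch_root() -> str:
    # ASSUMPTION: SCRATCH_DIR is a hypothetical variable naming node-local
    # scratch space; check your cluster's User Guide for the actual path.
    return os.environ.get("SCRATCH_DIR", tempfile.gettempdir())


def run_in_scratch(work: Callable[[Path], Path], dest_dir: str) -> Path:
    """Run `work` inside a private scratch directory and keep only its result.

    `work` receives the scratch directory and returns the path of the file
    to keep. That file is copied to `dest_dir`; the scratch directory and
    all intermediate files in it are deleted when the block exits.
    """
    with tempfile.TemporaryDirectory(dir=scratch_root()) as tmp:
        result = work(Path(tmp))
        dest = Path(dest_dir) / result.name
        shutil.copy2(result, dest)  # copy the result to permanent storage
    return dest
```

For example, a job function that writes intermediate data and a final result.txt into the directory it is given could be run as run_in_scratch(job, "/work/<your_group_working_directory>"), leaving only result.txt in the group working directory.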

Most of the data on a cluster is kept in separate storage units that contain multiple hard drives. These units are called file servers: computers whose primary purpose is to provide a location to store data. Regular users do not log in to file servers. On HPC clusters these file servers are connected to the same InfiniBand switch that connects all nodes, providing relatively fast access to data from every cluster node.

Every user on a cluster has a home directory. If you type "pwd" right after logging in to a cluster with ssh, you should see /home/<username>, where <username> is your ISU NetID. Physically, home directories can be located on drives in the login node, on a service node, or on a separate file server. In any case, home directories are mounted on all nodes and thus can be accessed from any cluster node.

Since there are multiple users on a cluster, home directories have quotas to limit how much one user's actions can affect other users. On the HPC clusters listed on this site, you cannot keep more than 5GB of data in your home directory. The home directory should be used mainly for configuration and login files.
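To estimate how close you are to the 5GB home directory limit, you can add up the sizes of the files under your home directory. The Python sketch below uses only the standard library. It assumes "5GB" means 5 × 10^9 bytes (the page does not say whether binary or decimal units are used), and it counts apparent file sizes, which may differ somewhat from what the quota system actually charges. The names HOME_QUOTA_BYTES, directory_size and within_quota are illustrative.

```python
import os

# ASSUMPTION: "5GB" interpreted as 5 * 10**9 bytes; use 5 * 2**30 instead
# if the quota is counted in binary units.
HOME_QUOTA_BYTES = 5 * 10**9


def directory_size(path: str) -> int:
    """Return the total apparent size in bytes of regular files under `path`.

    Symbolic links are skipped, and os.walk does not descend into
    symlinked directories by default.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def within_quota(used_bytes: int, quota_bytes: int = HOME_QUOTA_BYTES) -> bool:
    """True if usage does not exceed the quota."""
    return used_bytes <= quota_bytes
```

On a login node, within_quota(directory_size(os.path.expanduser("~"))) gives a rough answer; the usage figures reported by the cluster itself remain authoritative.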

On Research HPC clusters, users keep important data in group working directories, which are regularly backed up. These directories are located on file servers and have quotas based on how much storage the group purchased. Quotas and usage are reported at login and can also be found in the file /work/<your_group_working_directory>/group_storage_usage, which is updated every hour. For optimal performance, we recommend keeping the file system less than 70% full. Like home directories, group working directories are accessible on all cluster nodes.
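The 70% recommendation is a simple ratio of usage to quota. The Python sketch below checks it, assuming you have read the usage and quota figures yourself from the login message or from group_storage_usage; the file's layout is not described here, so the sketch does not try to parse it. The function names are hypothetical, and both arguments must be in the same unit.

```python
# Recommended upper bound on how full a group working directory should be.
RECOMMENDED_MAX_FRACTION = 0.70


def fraction_used(used: float, quota: float) -> float:
    """Fraction of the group quota in use; `used` and `quota` share a unit."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    return used / quota


def needs_cleanup(used: float, quota: float,
                  threshold: float = RECOMMENDED_MAX_FRACTION) -> bool:
    """True if usage has reached or passed the recommended threshold."""
    return fraction_used(used, quota) >= threshold
```

For instance, a hypothetical group using 8 TB of a 10 TB quota is at 80%, so needs_cleanup(8, 10) returns True, while needs_cleanup(5, 10) returns False.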

Finally, some clusters have fast temporary space named /ptmp that is mounted on all nodes. This storage is not backed up and is purged regularly.

Details about specific clusters can be found in the corresponding cluster User Guide.

 
