Disk storage

The proper teaming of computing infrastructure with the right selection of storage solutions can assure the best quality of services provided to scientific users. The scale of problems in this area grows with the complexity and efficiency of the supercomputers used. Currently, data storage systems attached to Cyfronet supercomputers store billions of files up to terabytes. The broad thematic scope of research on the resources provided by Cyfronet is reflected in the variety of configurations of the Centre's key supercomputers and thus also in the structure of dedicated storage resources.

Supercomputers use, among others, efficient temporary space, the so-called scratch. The critical element here is the speed of operation, which is why it is based on a high-speed distributed file system architecture – Lustre. The advantage of Lustre is the ability to scale the capacity and efficiency of the disk space. By combining the capacity of multiple servers, I/O bandwidth is aggregated and scales with additional servers. Moreover, bandwidth and/or capacity can be easily increased by dynamically adding more servers without interrupting users' computations. Currently, all supercomputers in Cyfronet use the scratch space implemented by Lustre. In the case of Prometheus, this space has the capacity of 5 PB and the speed of 120 GB/s. Ares has the space with the total of 4 PB and the speed of 80 GB/s. In both of these computers, the scratch space is realized with the help of mechanical disks. In the case of Athena and Helios, user data is stored on solid-state drives. The use of this type of solution significantly increases the efficiency of the system. The capacity of this type of space for Athena is 1.5 PB and achieves the bandwidth of 400 GB/s. In the case of Helios, the scratch space has the capacity of 1.5 PB and the speed of 1.8 TB/s.

Most of Cyfronet's disk memory resources are dedicated to the needs of users of domain services developed in the PLGrid program. The PLGrid infrastructure offers a dedicated workspace for groups using domain services – the functionality necessary to enable collaboration between scientists working in geographically dispersed locations. This functionality is implemented using the Lustre file system. The maximum capacity of the /pr1 resource in the Prometheus supercomputer is 5 PB, and the total speed of reading and writing operations reaches 30 GB/s. In the case of Ares and Athena, the /pr2 resource has the capacity of 5 PB and the speed of 30 GB/s. In the case of Helios, the /pr3 and /pr4 resources have the capacity of 16 PB and the speed of 200 GB/s.
The object-oriented data storage system is an additional resource for storing users and projects' data resources in Cyfronet. It is based on the CEPH software. The data in this system is available through the S3 protocol based on the REST API and is stored in globally unique containers (buckets) in which users store their data in the form of objects.

A particular case of mass storage are resources for large projects and international collaborations in which Cyfronet participates, such as WLCG (Worldwide LHC Computing Grid), analyzing data from the LHC detector at CERN, or CTA (Cherenkov Telescope Array), studying gamma radiation using a network of radio telescopes. These projects require substantial disk resources, often available using unusual protocols such as SRM, xroot, or GridFTP. Cyfronet provides this type of disk space using several instances of dedicated DPM software (Disk Pool Manager) and using dedicated networks such as LHCone. The total capacity of DPM systems in Cyfronet exceeds 2 PB.

Currently, the total available disk capacity used by ACC Cyfronet AGH is approximately 150 PB.