The High-Performance Computing System is the fundamental support for the entire EarthLab facility. This dedicated system will deliver 15 PFLOPS of peak performance and 80 PB of storage capacity, placing it at an internationally advanced level.
The system will provide a high-performance, scalable, dedicated hardware platform for numerical simulation of the Earth system and for high-precision regional environmental simulation. It will support efficient simulation of, and data exchange among, the various models, as well as remote access services through high-speed networks, and it will provide the necessary hardware support for the Super Simulation Support and Management System and for the Database, Data Assimilation and Visualization System. The system is built around general-purpose x86 processors, making it broadly compatible with existing Earth system models. A reconfigurable multi-partition cluster architecture enables efficient multi-model coupling, and a multi-dimensional torus high-performance network built from three-dimensional structured units improves the efficiency of model simulation. Liquid cooling technology will increase the computing power obtained per unit of energy consumed, reducing the system PUE (power usage effectiveness).
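The PUE claim above can be made concrete with a small calculation. PUE is defined as total facility power divided by IT equipment power, so lowering cooling overhead lowers PUE toward the ideal of 1.0. The figures below are purely illustrative assumptions, not EarthLab measurements:

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT power; 1.0 is ideal."""
    return total_facility_kw / it_equipment_kw

# Illustrative (assumed) numbers: for the same 1,000 kW IT load, liquid
# cooling cuts the cooling-and-overhead share from 600 kW to 150 kW.
air_cooled = pue(1600.0, 1000.0)     # 1.6
liquid_cooled = pue(1150.0, 1000.0)  # 1.15
print(air_cooled, liquid_cooled)
```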
Model Simulation Subsystem The Model Simulation Subsystem delivers 15 PFLOPS of computational performance and is mainly responsible for the computational tasks of the Earth system model, providing abundant computing resources and substantial computing power. It offers a variety of computing resources, such as general-purpose x86 processors and dedicated multi-core accelerators, as hardware support for the application model software. The subsystem comprises five components: the Earth system simulator; the regional high-precision simulator; simulation support; data assimilation; and shared services. The Earth system simulator in turn comprises four simulators: atmospheric; ocean; land; and flux coupling.
The Model Simulation Subsystem uses an advanced water-cooling design. Cold plates and other high-performance heat-transfer components remove the high heat generated by the processors, accelerators, and other components, greatly improving cooling efficiency and system reliability.
Network Switching Subsystem The Network Switching Subsystem interconnects the subsystems within the system. It provides high-speed data communication and high-speed access to stored data during computation, carries the system management channel, implements the network security protection system, and delivers high-speed, reliable external network services.
The Network Switching Subsystem uses a 6D-TORUS architecture with a maximum communication bandwidth of 100 Gbps and a minimum communication latency of 1 μs. The computing network is organized in three layers. In the first layer, a full switching network groups 8 computing nodes into a super-node; in the second layer, 12 super-nodes connected in a 2 × 2 × 3 three-dimensional cube structure form a silicon-node; and in the third layer, the silicon-nodes are connected in three directions (x, y, z) in a three-dimensional cube structure. Together they form the entire system, namely the silicon cube (SI-CUBE).
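The three-layer hierarchy above implies 8 × 2 × 2 × 3 = 96 compute nodes per silicon-node. A minimal sketch of the counting, assuming some grid size for the third-layer silicon-node cube (the text does not give one):

```python
# Counts for the SI-CUBE hierarchy described above. NODES_PER_SUPER_NODE and
# SUPER_NODE_GRID come from the text; the silicon-node grid passed to
# total_nodes() is an assumption for illustration.
NODES_PER_SUPER_NODE = 8     # layer 1: full-switch group of 8 compute nodes
SUPER_NODE_GRID = (2, 2, 3)  # layer 2: 12 super-nodes per silicon-node

def nodes_per_silicon_node() -> int:
    x, y, z = SUPER_NODE_GRID
    return NODES_PER_SUPER_NODE * x * y * z

def total_nodes(si_grid: tuple) -> int:
    # layer 3: silicon-nodes arranged in a 3-D cube of shape si_grid
    sx, sy, sz = si_grid
    return nodes_per_silicon_node() * sx * sy * sz

print(nodes_per_silicon_node())  # 96 compute nodes per silicon-node
print(total_nodes((4, 4, 4)))    # hypothetical 4x4x4 arrangement -> 6144 nodes
```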
Data Storage Subsystem The Data Storage Subsystem includes 30 PB of online storage and 50 PB of offline storage. It is mainly responsible for storing the Earth system model input data, simulation results, observation data, assimilation data, visualization data, and other related data. The online storage consists of a fast storage pool and an ordinary storage pool, providing a total capacity of 30 PB, an aggregate bandwidth of up to 1 TB/s, and space utilization of up to 80%. The fast pool uses high-speed SSDs, while the ordinary pool uses large-capacity SATA hard drives. Data produced during computation can be saved to the fast pool first and migrated to the ordinary pool after a period of time. The offline storage, with a capacity of 50 PB, uses tape library equipment. At the front end, an intelligent archiving system uses a disk array as a data storage buffer pool to which users can migrate their old data; at the back end, large-capacity, low-cost tape library devices constitute a storage pool that can be shared among different types of servers.
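The fast-to-ordinary migration described above is a classic age-based tiering policy. The following is a minimal sketch under assumed names and an assumed 30-day threshold; it is not the actual EarthLab policy engine:

```python
# Age-based tiering sketch: files land in the fast (SSD) pool and are moved
# to the ordinary (SATA) pool once older than a threshold. All identifiers
# and the threshold are illustrative assumptions.
from dataclasses import dataclass

SSD_POOL, SATA_POOL = "fast", "ordinary"
MIGRATE_AFTER_DAYS = 30  # assumed threshold

@dataclass
class StoredFile:
    name: str
    age_days: int
    pool: str = SSD_POOL

def migrate(files):
    """Move files older than the threshold out of the fast pool."""
    for f in files:
        if f.pool == SSD_POOL and f.age_days > MIGRATE_AFTER_DAYS:
            f.pool = SATA_POOL
    return files

files = migrate([StoredFile("run_001.nc", 45), StoredFile("run_002.nc", 2)])
print([(f.name, f.pool) for f in files])
# -> [('run_001.nc', 'ordinary'), ('run_002.nc', 'fast')]
```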
Support and Management Subsystem This subsystem comprises two parts: hardware and software. The hardware part consists of various service nodes for internal and external services, such as management servers, login servers, authentication servers, compiling servers, and Web servers. The software part provides system-level software and necessary tools for the computer system, including system monitoring and management software, job scheduling software, compiler development tools, a parallel communication environment, efficient mathematics libraries, high-speed I/O libraries, databases, and other basic software components for model development and for system operation and maintenance management.