ZHEJIANG LAB
"ZJ Alkaid" Selected for "Computing Service Pilot Program"
Date: 2023-04-07

On March 29, the China Academy of Information and Communications Technology ("CAICT") officially released in Beijing the list of winners of its first "Computing Service Pilot Program". The list includes the ZJ Alkaid Intelligent Computing Operating System ("ZJ Alkaid"), independently developed by Zhejiang Lab.

Computational power, the core productive force of the digital economy era, is becoming as fundamental to all walks of life as water and electricity. The first "Computing Service Pilot Program", jointly initiated by CAICT and People Data under People's Daily Online, is intended to summarize excellent practices and industry experience and to improve the quality of computing services so that it matches the scale of computing infrastructure.

"ZJ Alkaid" is the core computational power scheduling system and basic platform of ZJ Lab's Major Scientific Facility for Intelligent Computing Data Reactor. It can aggregate large amounts of heterogeneous computational power, support upper-level applications, and make efficient use of that computational power.

"Heterogeneity of computational power and software incompatibility are common in practice. Even when an organization uses the same computational power provider at different stages, software and hardware upgrades can still cause incompatibility. This raises the barrier to using computing infrastructure and leaves computational power used ineffectively," said YANG Fei, Senior Research Specialist at ZJ Lab's Research Center for Intelligent Computing Software.

In response to this pain point, the development team focused during the construction of ZJ Alkaid on the aggregation, management, intelligent scheduling, and autonomous control of the full software stack for multi-cluster heterogeneous computational power. As a result, ZJ Alkaid can provide the algorithms, functions or frameworks that correspond to different computing tasks, avoiding repeated development and incompatibility. In addition, the system's capabilities in aggregating computational power and scheduling data allow data to be transmitted efficiently to the relevant computing clusters and nodes.

Reportedly, after aggregation ZJ Alkaid supports the scheduling of over 10 types of heterogeneous computing resources, with over 200P of scheduling capacity and over 50PB of storage capacity. The system's computing throughput, memory access efficiency and speedup ratio for distributed computing each reach more than 90% of the native system's. It supports a variety of open-source computing frameworks, commercial computing software, self-developed computing engines, computing tasks and application scenarios.

At present, ZJ Alkaid's computational power services have been applied in scientific research fields such as materials, genes, pharmaceuticals, astronomy and breeding, and corresponding computational power service platforms have been built. Its algorithm library integrates more than 60 typical intelligent computing algorithms and more than 30 intelligent computing data sets (databases).

"We completed the development and release of ZJ Alkaid V1.0 in August 2022, and it already supports typical applications of the data reactor. Under the current agile, iterative development model, the system will become more efficient and intelligent. We expect ZJ Alkaid to provide computing services in more and more intelligent data centers," said PAN Aimin, Senior Research Expert and Chief Architect of Intelligent Computing Data Reactor at ZJ Lab.