Breaking Language Barriers, Accelerating Scientific Research: The Debut of ZJ Lab's 021 Science Foundation Model
Date: 2026-01-21

On December 18, 2025, Zhejiang Lab (ZJ Lab) held the 021 Science Foundation Model Innovation Cooperation Conference to showcase the development progress of the 021 Science Foundation Model and its suite of domain-specific scientific models. The event aimed to foster open collaboration and synergistic innovation, uniting partners to accelerate discoveries and revolutionize research paradigms. It attracted leaders from relevant national ministries and commissions and from provincial and municipal departments, as well as representatives from innovation institutions such as national laboratories, state key laboratories, high-level universities, and innovation-driven enterprises. Participants engaged in in-depth discussions on topics such as "AI + Science".

Development of Science Foundation Models Constitutes a Strategic Innovation

In 2025, open-source large language models (LLMs), exemplified by DeepSeek and Qwen, achieved performance parity with the world's leading closed-source foundation models. This milestone placed China and the United States in the same technological "ocean". As a new research and development (R&D) institution dedicated to intelligent computing, ZJ Lab has been contemplating: Could LLMs represent the ultimate form of foundation models? How can artificial intelligence (AI) more effectively empower scientific research?

"The dimensions expressed by language are far below those required by science," noted XUE Guirong, the chief engineer of ZJ Lab's comprehensive planning regime for scientific models, in his keynote presentation. He emphasized that scientific data—encompassing multiple dimensions such as time, space, and energy—constitutes a high-dimensional representation of the evolution laws of complex physical systems. For example, over 75% of information in earth science is stored in non-textual data such as sound waves and magnetic fields; astronomy relies on images and spectra to analyze the structure and evolution of the universe; the mysteries of life science are embedded in the human genome that contains 3 billion base pairs of DNA.

Meanwhile, AlphaFold has undergone three architectural iterations, progressively achieving precise prediction of complex biomolecular structures from amino acid sequences. Its prediction efficiency has surpassed the limits of human research capabilities, completing predictions for over 200 million protein structures in mere weeks. However, AlphaFold3's utility for scientists remains confined to the niche domain of biochemical molecules, falling far short of encompassing the broad scope of life science.

To address scientific challenges, researchers urgently need to break through the boundaries of linguistic space, develop science foundation models, construct a higher-dimensional space that integrates science and language, establish deep connections between disciplines, and revolutionize research paradigms. As WANG Jian, Academician of the Chinese Academy of Engineering (CAE) and Director of ZJ Lab, stated, "Foundation models are considered the crown of AI, and science foundation models represent the crown jewels of AI." The development of science foundation models is an innovative endeavor of strategic, fundamental, and pioneering significance, poised to serve as a driving force for scientific and technological (S&T) innovation.

From Zero to One: A Challenging Task

"Foundation models set the upper bound for a model's performance," said XUE Guirong. "Just as a 1-liter bottle cannot hold 3 liters of water, it is difficult to achieve breakthroughs within the framework of others' general-purpose models." Therefore, ZJ Lab opted not to build upon existing general-purpose foundation models, but instead embarked on a journey "from zero to one" to construct its own model.

The R&D team's first challenge was to overcome the "scientific data paradox". It is understood that AlphaFold spent six years exploring methods for the unified tokenization of diverse biochemical molecules. However, training a foundation model with data from multiple disciplines—such as mathematics, physics, chemistry, astronomy, life science, earth science, and materials science—will exponentially increase the workload and complexity.

Building a scientific model is far more challenging than building an LLM or a code model. Developing a science foundation model that integrates diverse types of data across disciplines is an unprecedented challenge with no prior experience to draw upon. Nevertheless, ZJ Lab, leveraging its systematic innovation in computing power, data, and models, has gradually clarified its technical roadmap through exploratory practice in scientific model development in fields such as earth science, astronomy, and genetics.

Leveraging the high structural correspondence between Mixture of Experts (MoE) architectures and scientific knowledge, the R&D team constructed a fusion architecture that combines a OneTokenizer (for the unified representation of scientific data) with an MoE backbone. The aim is to encode both scientific data and textual corpora into a unified high-dimensional space, enabling the model to recognize and process scientific data and to reason about complex scientific problems.
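In broad strokes, the idea can be illustrated with a minimal PyTorch sketch. This is an assumption about how such a design might look, not ZJ Lab's actual implementation: modality-specific encoders (a stand-in for OneTokenizer) project heterogeneous inputs into one shared embedding space, and a top-k gated MoE layer routes each token to a small set of specialist experts. All names, dimensions, and the routing scheme below are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 512  # width of the shared embedding space (illustrative)

# Stand-in for the "OneTokenizer" idea: each modality has its own encoder,
# but every encoder lands in the same D-dimensional space.
text_embed = nn.Embedding(32000, D)      # text token ids  -> shared space
spectrum_embed = nn.Linear(1024, D)      # binned spectrum -> shared space

class MoELayer(nn.Module):
    """Minimal top-k gated Mixture-of-Experts feed-forward layer."""
    def __init__(self, d_model=D, n_experts=8, k=2, d_ff=2048):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # router: scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts))

    def forward(self, x):                               # x: (tokens, d_model)
        topk, idx = self.gate(x).topk(self.k, dim=-1)   # k best experts per token
        w = F.softmax(topk, dim=-1)                     # mixing weights over those k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                sel = idx[:, slot] == e                 # tokens routed to expert e
                if sel.any():
                    out[sel] += w[sel, slot, None] * expert(x[sel])
        return out

# Tokens from different modalities share one sequence and one MoE backbone.
tokens = torch.cat([text_embed(torch.randint(0, 32000, (16,))),
                    spectrum_embed(torch.randn(4, 1024))])
print(MoELayer()(tokens).shape)                         # torch.Size([20, 512])
```

In production MoE systems the router is trained jointly with the experts, usually with an auxiliary load-balancing loss so tokens spread evenly across experts; the explicit loop above trades that efficiency for readability.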

Ultimately, after nearly 10,000 experiments, the team established a training framework encompassing pre-training, post-training, and reinforcement learning phases, and successfully trained the 021 Science Foundation Model at a scale of 236 billion parameters. The model rests on three foundational pillars: interdisciplinary knowledge, cross-domain reasoning, and cross-lingual understanding (supporting 204 languages). It demonstrates exceptional scientific reasoning capabilities, enabling deep analysis, deduction, and validation of diverse scientific problems.

Open Collaboration: Jointly Ushering in the Era of AI in Science

Currently, the 021 model has become a "research partner" across diverse fields such as earth science, astronomy, life science, and materials science, breaking down disciplinary boundaries and sparking innovation.

Since its global release in April 2025, the geoscience model GeoGPT has undergone continuous iterative upgrades. Its advanced version, GeoGPT-VL, supports four typical tasks: image description and summarization, image information extraction, geospatial reasoning, and geoscientific analysis and reasoning, enabling a leap from image interpretation to expert-level reasoning.

OneAstronomy, a foundation model for astronomy, maps diverse data types (spectra, light curves, and images) into a unified representation space, facilitating cross-modal fusion and reasoning and reshaping data-processing paradigms. This innovation unlocks autonomous telescope observation and helps realize the vision of "observation as discovery".

Genos, a human-centric genomic foundation model, demonstrates powerful interdisciplinary reasoning by predicting the symptoms of genetic mutations, identifying novel pathogenic loci, and efficiently assisting in the diagnosis and mechanism discovery of genetic diseases.

OnePorous, a porous-alloy model, enables the reverse generation of new porous structures from performance requirements. The underlying technologies have been applied to the manufacturing (3D printing) of satellite bus structures, achieving significant mass reduction and shorter production cycles.
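To make the "unified representation space" idea concrete, the sketch below shows one common way such spaces are trained: CLIP-style contrastive alignment, in which two encodings of the same object (say, a spectrum and an image of one star) are pulled together while mismatched pairs are pushed apart. This is an illustrative assumption about the general technique, not OneAstronomy's published method; the encoder names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical encoders; names and dimensions are illustrative only.
spec_enc = nn.Linear(1024, 256)   # binned spectrum        -> shared space
img_enc = nn.Linear(4096, 256)    # flattened image patch  -> shared space

def clip_style_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE loss: row i of `a` should match row i of `b`."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.T / temperature            # (N, N) cosine-similarity matrix
    targets = torch.arange(a.size(0))         # the i-th pair is the positive match
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# A batch of 8 objects, each observed as both a spectrum and an image.
spectra = spec_enc(torch.randn(8, 1024))
images = img_enc(torch.randn(8, 4096))
loss = clip_style_loss(spectra, images)       # minimizing this aligns the two modalities
loss.backward()                               # gradients flow into both encoders
```

Once aligned, either modality can query the other through nearest-neighbor search in the shared space, which is the kind of capability that cross-modal fusion and reasoning build upon.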

These advances stem not only from technological innovation but also from mechanism innovation. Through its "Seed Programs", ZJ Lab and its partners have cultivated over 900 urgently needed AI professionals for model training via hands-on instruction and collaborative research, creating a new wellspring of strength for model innovation. For instance, ZJ Lab and BGI-Research jointly trained the 10-billion-parameter Genos model from scratch. Built on the 021 Science Foundation Model, Genos achieved 98.3% accuracy in identifying pathogenic mutations.

Through the "Scientists Studio" initiative, ZJ Lab has fostered deep collaboration with global scientists to advance the integration of AI and science. Partnering with geoscientists from the US, UK, and beyond, the lab has achieved full-chain innovation based on GeoGPT, from open access (OA) to data extraction and paleontological classification. This breakthrough lets fossil data speak for itself, systematically unveiling the laws of paleontological evolution.

As emphasized by TONG Guili, Secretary of the Party Committee of ZJ Lab, "This conference serves not only as a matchmaking event to enhance technical exchanges and deepen innovation cooperation, but also as a platform to build industry consensus and jointly explore development pathways. We look forward to working with all stakeholders to advance collaborative innovation in science foundation models, drive paradigm shifts in scientific research, and achieve deep integration of technological and industrial innovation." Looking ahead, ZJ Lab will further practice open science by creating S&T public goods, and work hand-in-hand with partners to usher in the era of AI in science.