Recently, joint research on astronomical foundation models by the National Astronomical Observatories of the Chinese Academy of Sciences (NAOC) and Zhejiang Lab (ZJ Lab) has yielded a new result. The AI model they developed, SpecCLIP, has been published in The Astrophysical Journal.
Stellar spectra are often described by scientists as the "fingerprints" of the cosmos. They contain unique identity information about stars, including a star's temperature, chemical composition and surface gravity. By analyzing these "chemical signatures", astronomers can trace the Milky Way's evolutionary history from its beginning to the present, much like archaeologists reconstruct the past.
However, a significant challenge persists in practical research: different survey projects, such as China's LAMOST (Guoshoujing Telescope) and Europe's Gaia satellite, acquire spectral data through varying methods, resolutions, and wavelength ranges. These datasets are like stories told in different dialects, making it difficult to combine them directly for large-scale analysis.
SpecCLIP was built to overcome this data barrier. The research team brought ideas similar to those behind large language models into astronomy, mapping stellar spectra from different telescopes into a unified "feature space", much like translating diverse languages into a universal grammar. Through contrastive learning, SpecCLIP automatically learns the intrinsic connections between two types of spectra. This enables efficient data alignment and transformation across instruments and survey projects, opening a new technical avenue for large-scale collaborative research.
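To make the idea of contrastive alignment concrete, here is a minimal sketch of a CLIP-style symmetric contrastive loss in Python with NumPy. It is an illustration only, not SpecCLIP's actual implementation: the function names, the temperature value, and the random "embeddings" standing in for encoder outputs from two surveys are all assumptions made for this example.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_style_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric contrastive loss: row i of emb_a and row i of emb_b are
    assumed to describe the same star as seen by two different instruments.
    The temperature of 0.07 is an illustrative choice, not a published value."""
    a = l2_normalize(emb_a)
    b = l2_normalize(emb_b)
    logits = a @ b.T / temperature          # pairwise similarity matrix
    idx = np.arange(len(a))                 # matching pairs sit on the diagonal
    loss_a_to_b = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_b_to_a = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_a_to_b + loss_b_to_a)

# Hypothetical data: 8 stars, 16-dimensional embeddings from two encoders.
rng = np.random.default_rng(0)
stars = rng.normal(size=(8, 16))
emb_survey_a = stars + 0.01 * rng.normal(size=stars.shape)
emb_survey_b = stars + 0.01 * rng.normal(size=stars.shape)

aligned = clip_style_loss(emb_survey_a, emb_survey_b)
shuffled = clip_style_loss(emb_survey_a, emb_survey_b[::-1])  # wrong pairings
print(f"aligned pairs: {aligned:.4f}, shuffled pairs: {shuffled:.4f}")
```

The loss is small when the two views of each star land close together in the shared space and large when the pairings are scrambled, which is the signal a contrastive model would use during training.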

Fig. 1: SpecCLIP has been deployed on the ZJ Lab platform (https://digital-galaxy.zero2x.org.cn/) and the National Astronomical Data Center's AI foundation model platform (https://nadc.china-vo.org/ai/), creating a specialized interface for researchers to analyze stellar spectra. Users can upload LAMOST-like spectral data, perform parameter measurements through a ChatGPT-like conversational interface, and leverage AI to retrieve, organize, and assist in analyzing related documents.
More importantly, SpecCLIP is not a specialist AI model designed for a single task, but a framework approaching a foundation model. It can predict stellar atmospheric parameters and elemental abundances in a single pass, perform spectral-similarity searches, and even help identify peculiar celestial objects. These capabilities are particularly valuable in Galactic archaeology, where they hold the promise of sifting efficiently through massive datasets to find extremely rare, metal-poor ancient stars, which would provide key clues to the early formation and merger history of the Milky Way.
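A spectral-similarity search of the kind mentioned above can be sketched as a nearest-neighbor lookup in the shared feature space. The Python example below is a hedged illustration using cosine similarity; the function name `top_k_similar`, the embedding size, and the random catalog are assumptions for this sketch and do not reflect SpecCLIP's actual interface.

```python
import numpy as np

def top_k_similar(query, catalog, k=3):
    """Return indices and cosine similarities of the k catalog embeddings
    closest to the query embedding, best match first."""
    q = query / np.linalg.norm(query)
    c = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity to every catalog row
    order = np.argsort(-scores)[:k]     # sort descending, keep the top k
    return order, scores[order]

# Hypothetical catalog: 100 stars with 32-dimensional embeddings.
rng = np.random.default_rng(1)
catalog = rng.normal(size=(100, 32))

# Hypothetical query: a slightly noisy re-observation of catalog star 42.
query = catalog[42] + 0.05 * rng.normal(size=32)

indices, scores = top_k_similar(query, catalog)
print(indices, scores)
```

Inverting the same ranking gives a rough way to flag unusual objects: a spectrum whose best match in a large catalog still has low similarity may be worth a closer look.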
Thanks to its powerful unified data representation, SpecCLIP has been applied in multiple cutting-edge scientific exploration missions. For the "Earth 2.0 (ET)" mission to detect Earth-like planets, for example, it can accurately characterize planet-hosting stars, thereby improving the efficiency of screening for potentially habitable planets.
The ongoing deluge of observational data from LAMOST, Gaia and next-generation sky surveys is driving a transition in astronomy from the era of single-task models to the era of foundation models. Researchers highlight that SpecCLIP demonstrates the vast potential of AI in astronomical spectroscopy. The model is poised to bridge different observation systems, thereby propelling research on stellar physics as well as the Milky Way's structure, formation and evolution into a new phase.
Source: Science and Technology Daily and the Official Website of the University of Chinese Academy of Sciences





