As technology advances, scientific data is being mined in higher dimensions and at a larger scale, and scientists urgently need new methods to navigate the data flood. Foundation models, which overcome the efficiency and accuracy challenges that traditional computing faces under complex conditions and with massive datasets, have become a new tool in scientific research. During the 2024 China National Computer Congress (CNCC2024), experts and scholars from universities, research institutes, and enterprises gathered at the second Forum on Foundation Models and Scientific Computing. They gave detailed accounts of the key applications and major advances of foundation models across natural science fields such as geoscience, biology, and physics, and of the challenges scientific computing currently faces in data processing and computational power.
Professor CHEN Lei of Hong Kong University of Science and Technology (Guangzhou) pointed out in his report "Data Science for Large Models" that the success of foundation models today depends on proper data management. If the data powering foundation models lacks validation and interpretation, the applicability of these models in many fields will be significantly limited. Therefore, addressing the three key issues of data preparation, training optimization, and model interpretation in foundation model data science will be the future research focus of this field.
CHEN Hongyang, Chairman of the Forum and researcher at Zhejiang Lab, presented a report titled "GeoGPT: A Large Language Model System for Geoscientists". GeoGPT is an open-source, non-profit exploratory project dedicated to global geoscience research, launched in response to the international big science program "Deep-time Digital Earth". It is a global open science endeavor involving research institutions, universities, industry, and numerous other organizations. With its capabilities in literature review, information extraction, geological map interpretation and generation, knowledge graph construction, and scientific hypothesis generation, GeoGPT is driving innovation and transformation in geoscience research models.
Professor CHEN Huajun from Zhejiang University summarized the development trends in artificial intelligence from the perspectives of knowledge graphs and large language models. He discussed methods and approaches for using knowledge graphs and language models to represent scientific knowledge and process scientific language. He also provided an in-depth exploration of the latest research advancements, including the chemical element-oriented knowledge graph, molecular graph learning with integrated knowledge enhancement and functional prompts, and protein prompt-based learning models.
"Quantum many-body systems have enormous degrees of freedom and massive data scales, which often prevent us from conducting efficient computational analysis," pointed out ZHANG Yi, an assistant professor at Peking University, in his report titled "Machine Learning in Quantum Materials, Models, and Algorithms". To address this, ZHANG introduced his team's computational strategy for quantum many-body systems using machine learning, including the application of machine learning to analyze quantum states based on complex high-throughput experimental or computational data, such as exotic evolving charge orders and quantum spin liquids. He also noted that directly applying foundation models to quantum models is expected to foster new developments in big data scientific analysis.
"AI for science has achieved great success, but what are its shortcomings?" "Is the model more important, or is computational power?" "What are the practical dilemmas in managing foundation model data?" ... During the roundtable forum, experts and attendees shared their views and engaged in discussions on these issues. "This year's Nobel Prize has demonstrated the initial success of AI for science, but large language models often overlook important details. For example, AlphaFold primarily focuses on the main chain, while giving less attention to the branch chain, which is crucial in pharmacology." "Both models and computational power are important, but the key is to strike the right balance between the two. Additionally, we need to make innovations in mechanisms to help universities address the lack of intelligent computational power." "Data security and privacy protection are crucial for effective data management in foundation models. We hope to achieve the open sharing of public welfare and industrial data under the guidance of the government, break down data barriers, and resolve the problem of data fragmentation." At the forum, attendees unanimously agreed that artificial intelligence technology, represented by foundation models, has profoundly transformed the paradigm of scientific computing. While it faces major challenges in computational power, models, and data, it also represents a new opportunity to take the lead in science and technology. All sectors of enterprises, universities, and research institutes should continue to explore and cooperate in this field, contributing wisdom and strength to both scientific research and industrial applications.