ZHEJIANG LAB
SHAO Shuai: FSL Leverages Big Data Through Collaboration among Foundation Models
Date: 2024-08-22

For SHAO Shuai, a postdoctoral fellow born in the 1990s who works at the Research Center for Frontier Fundamental Studies of Zhejiang Lab (ZJ Lab), trips to the zoo are a treasured childhood memory. His parents used them to teach him how to recognize animals by their characteristics. For instance, they told him that the golden monkey was a species of monkey with golden hair, and then asked him to pick it out from a group of monkeys. "I'd never seen a golden monkey, but I recognized it quickly because I knew gold and monkeys."

When facing the unknown, humans can naturally identify a new concept from only a few samples or a short description of its characteristics. This is much harder for artificial intelligence, which relies on vast amounts of data.

"At present, the success of deep learning models depends heavily on large quantities of high-quality data and on manual data annotation. However, in many real-world scenarios, only a few samples are available for learning, such as medical diagnostic data on rare diseases, samples of endangered plants and animals in studies of flora and fauna, ancient scripts from lost civilizations, and remote sensing images of rare landforms." How to enable a model trained on only "a small number of low-quality labeled samples" to match, or come close to, the results achieved with "a large number of high-quality labeled samples" is a key issue in the field of few-shot learning (FSL).

SHAO Shuai, who joined ZJ Lab in 2022, is developing new solutions for few-shot learning research. Recently, he published his latest research result, entitled "Collaborative Consortium of Foundation Models for Open-World Few-Shot Learning", at AAAI, a top international artificial intelligence conference. The work aims to harness the strengths and potential of existing foundation models to accurately classify test data in downstream tasks where only a few samples are available.

SHAO Shuai believes that foundation models have already been extensively validated and tuned, so using them directly with frozen parameters can save a great deal of research time and computing resources. He likens trained foundation models to "a codified encyclopedia", and likens the process of harnessing foundation models to address the FSL problem to "referring to the encyclopedia for information about FSL". "Whether you search slowly from the first page, start from the table of contents, or use full-text search, the computing efficiency varies greatly with the search method." For this reason, SHAO Shuai proposed the Collaborative Consortium of Foundation Models (CO3) in his research, which leverages CLIP, GPT-3, DINO and DALL-E to collectively enable intelligent computing.

Specifically, SHAO Shuai designed four blocks to integrate and connect the models. "First, the few-shot sample data are input into the Label Correction Block (LC-Block), where a prototype structure is designed to denoise the data. Next, the corrected data are fed into the Data Augmentation Block (DA-Block) to obtain richer training data and thereby improve the generalization performance of the models. The cleaned original data and the augmented data are then input into the Feature Extraction Block (FE-Block) to obtain text and image features. Finally, these features are fused and tuned with the specially engineered Text-guided Fusion Adapter (TeFu-Adapter) to further mitigate the impact of noisy labels on the models and enhance the robustness of the approach."
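The idea behind the label-correction step can be illustrated with a small sketch. This is not the CO3 LC-Block itself, whose details the article does not give. It is a generic prototype-based relabeling scheme under stated assumptions: each sample is already represented by a feature vector (in practice these would come from a frozen encoder, whereas toy 2-D vectors are used here), a class prototype is the mean of the vectors currently carrying that label, and each sample is reassigned to the class whose prototype is most similar by cosine similarity. The function names `mean_vector`, `cosine` and `correct_labels` are hypothetical.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def correct_labels(features, labels):
    """Relabel every sample with the class of its most similar prototype.

    Assumption: prototypes are plain means of the (possibly noisy) labeled
    features. This is an illustrative stand-in, not the published method.
    """
    by_class = defaultdict(list)
    for feature, label in zip(features, labels):
        by_class[label].append(feature)
    prototypes = {label: mean_vector(vs) for label, vs in by_class.items()}
    return [
        max(prototypes, key=lambda label: cosine(feature, prototypes[label]))
        for feature in features
    ]


# Toy data: the last sample looks like a "cat" but is labeled "dog".
features = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 0.9], [0.9, 0.1]]
noisy_labels = ["cat", "cat", "dog", "dog", "dog"]
cleaned_labels = correct_labels(features, noisy_labels)
print(cleaned_labels)
```

In this toy example the mislabeled fifth sample sits much closer to the "cat" prototype than to the "dog" prototype, so it is relabeled, while the other four samples keep their labels. With very few samples per class, a single noisy label can shift a prototype noticeably, which is one reason a real system would likely need a more careful design than this sketch.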

SHAO Shuai explained that CO3 has been tested extensively on several datasets. The results provide strong evidence of its feasibility and show that its representation ability holds up even when data quality is lower and fewer labeled samples are available. For example, across noise rates from 0.0 to 1.0, with just one labeled sample available per category, CO3 consistently reaches an accuracy above 62%, while balancing performance and cost efficiency.
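To make the notion of a noise rate concrete, the sketch below shows one common way to corrupt labels in a controlled experiment. The article does not describe the noise protocol used for CO3, so this is an assumption: symmetric noise, in which each label is replaced, with probability equal to the noise rate, by a different class chosen uniformly at random. The function name `inject_label_noise` and its parameters are hypothetical.

```python
import random


def inject_label_noise(labels, classes, noise_rate, seed=0):
    """Return a copy of `labels` with symmetric label noise applied.

    Assumption: with probability `noise_rate`, a label is swapped for a
    different class drawn uniformly from `classes` (at least two classes).
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be between 0.0 and 1.0")
    rng = random.Random(seed)  # fixed seed so the corruption is repeatable
    noisy = []
    for label in labels:
        if rng.random() < noise_rate:
            alternatives = [c for c in classes if c != label]
            noisy.append(rng.choice(alternatives))
        else:
            noisy.append(label)
    return noisy


classes = ["cat", "dog", "bird"]
clean = ["cat", "dog", "bird", "cat", "dog", "bird"]
unchanged = inject_label_noise(clean, classes, noise_rate=0.0)
all_flipped = inject_label_noise(clean, classes, noise_rate=1.0)
print(unchanged)
print(all_flipped)
```

At a noise rate of 0.0 the labels are returned unchanged, and at 1.0 every label is replaced by a wrong class. Intermediate rates corrupt roughly that fraction of the labels, which is how a robustness curve over the noise rate could be produced under this assumed protocol.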

"How to bridge the gap between FSL techniques and real-world applications is the challenge I aim to tackle next," said SHAO Shuai. With the support of ZJ Lab's intelligent computing infrastructure, he will further improve the generalization of FSL algorithms and explore application scenarios driven by real-world needs, so as to push the boundaries of intelligent computing.