Recently, Zhejiang Lab (ZJ Lab) and BGI-Research jointly released Genos-10B, the latest version of their human-centric genomic foundation model. The model scales to 10 billion parameters and, by combining a Mixture-of-Experts (MoE) architecture with Grouped-Query Attention (GQA), models ultra-long contexts of one million base pairs (1 Mb) at single-base resolution. This offers a new technical approach to opening the "functional black box" of noncoding regions, which constitute over 90% of the human genome.
Additionally, building on ZJ Lab's 021 Science Foundation Model and Genos, the team has developed 021-Genos, a multimodal fusion model that breaks down the boundary between gene sequences and domain knowledge. It uses multimodal reasoning to significantly improve diagnostic accuracy for genetic diseases.

Industry Context: The "Understanding Gap" in Massive Sequencing Data
Since the completion of the draft human genome, sequencing costs have fallen exponentially, and acquiring petabytes of whole-genome sequencing (WGS) data has become commonplace. However, significant bottlenecks remain in computing on and interpreting these data.
Interpretation Bottleneck: Over 90% of the Human Genome Lacks a Clearly Defined Biological Function
Over 98% of the human genome does not code directly for proteins. For lack of effective interpretation tools and methods, the regulatory logic and long-range interactions in these regions have long been regarded as genomic "dark matter". Traditional genomic AI models are mostly limited to short sequences (<10k bp) or to specific species. Genos shows that large language models (LLMs) are now being adapted to capture the ultra-long, high-dimensional characteristics of the genome.
Core Technology Breakdown: Building Human-Centric Foundation Models
For AI developers interested in architectural innovation, Genos introduces several advances in data engineering and algorithm design.
Data Foundation Innovation:
Unlike previous models, which relied primarily on a single reference genome and low-quality draft genomes, Genos is trained on a rigorously curated corpus: high-quality data based on a complete set of 636 "telomere-to-telomere" (T2T) human genomes, incorporating high-precision assembly data from the Human Pangenome Reference Consortium (HPRC) and the Human Genome Structural Variation Consortium (HGSVC).
By incorporating long-read data that cover diverse ethnic groups worldwide as well as Chinese cohorts, Genos captures complex genetic diversity across populations and reduces data bias at the source.
Architectural Innovation:
MoE improves efficiency in large-scale training and inference. To handle the very high dimensionality and complexity of genomic data, Genos, which is built on an optimized Transformer architecture, uses a sparsely activated MoE framework with expert load balancing and dynamic routing. Because only a subset of experts is activated for each token, this design significantly improves inference energy efficiency while retaining the knowledge capacity of 10 billion parameters.
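The routing idea described above can be illustrated with a minimal sketch. It is not the Genos implementation: the top-k value, the toy experts, and the auxiliary loss (one common formulation, popularized by Switch Transformer) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Pick the k highest-probability experts and renormalize their weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]

def moe_forward(token, router_logits, experts, k=2):
    """Weighted sum of the selected experts' outputs; other experts never run."""
    output = [0.0] * len(token)
    for index, weight in route_top_k(router_logits, k):
        expert_out = experts[index](token)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output

def load_balance_loss(batch_router_logits, k):
    """Auxiliary loss n_experts * sum_i(f_i * p_i); about 1.0 when routing is uniform."""
    n_experts = len(batch_router_logits[0])
    n_tokens = len(batch_router_logits)
    dispatch = [0.0] * n_experts   # f_i: share of routing slots given to expert i
    mean_prob = [0.0] * n_experts  # p_i: mean router probability of expert i
    for logits in batch_router_logits:
        for i, p in enumerate(softmax(logits)):
            mean_prob[i] += p / n_tokens
        for i, _ in route_top_k(logits, k):
            dispatch[i] += 1.0 / (n_tokens * k)
    return n_experts * sum(f * p for f, p in zip(dispatch, mean_prob))

# Hypothetical toy experts: expert i scales the token by (i + 1).
toy_experts = [lambda t, s=i + 1: [s * x for x in t] for i in range(4)]
mixed = moe_forward([1.0, 2.0], [0.0, 2.0, 1.0, -1.0], toy_experts, k=2)
```

With four experts and k=2, only two expert functions are evaluated per token, which is where the compute saving comes from. The auxiliary loss grows when the router sends most tokens to the same expert, so minimizing it pushes the load toward an even spread.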
Complementary attention mechanisms. Gene regulation in the human genome can span distances on the order of a million base pairs, and sequences this long make attention computation challenging. To address this, Genos integrates GQA, in which 16 attention heads share 8 sets of key-value pairs. This design balances computational efficiency with expressive capacity and enables fast key-value caching for long sequences. Flash Attention provides the underlying computational acceleration. Together, they support modeling of million-token contexts, allowing Genos to capture long-range regulatory interactions at the chromosome level.
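As a rough illustration of the head-sharing scheme described above, the sketch below maps 16 query heads onto 8 key-value heads (two query heads per group) and estimates the resulting key-value cache size. It is a plain-Python toy, not the Genos code; the head dimension and layer count used in the cache estimate are placeholder assumptions, since the article does not state them.

```python
import math

def kv_head_for(query_head, n_query_heads=16, n_kv_heads=8):
    """Map a query head to the key-value head its group shares."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("query heads must divide evenly into KV groups")
    return query_head // (n_query_heads // n_kv_heads)

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def grouped_query_attention(queries, kv_keys, kv_values):
    """queries[h] is one vector per query head; kv_keys[g] and kv_values[g]
    hold the cached vectors (one per position) for KV head g."""
    n_q, n_kv = len(queries), len(kv_keys)
    outputs = []
    for head, query in enumerate(queries):
        group = kv_head_for(head, n_q, n_kv)
        outputs.append(attend(query, kv_keys[group], kv_values[group]))
    return outputs

def kv_cache_bytes(seq_len, n_kv_heads, head_dim, n_layers, bytes_per_value=2):
    """Keys plus values for every cached position, layer, and KV head."""
    return 2 * seq_len * n_kv_heads * head_dim * n_layers * bytes_per_value

# Placeholder shapes (assumptions): head_dim=128, n_layers=12, 16-bit values.
full_cache = kv_cache_bytes(1_000_000, 16, 128, 12)
gqa_cache = kv_cache_bytes(1_000_000, 8, 128, 12)
```

Under these assumed shapes, sharing 8 key-value heads among 16 query heads halves the cache relative to giving every query head its own keys and values, which matters most at million-token context lengths.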
Ultra-long context training optimization. Rotary Position Embedding (RoPE) encodes positional information directly in the attention computation by rotating query and key vectors, which avoids the fixed sequence-length limit imposed by explicit position embeddings. This design is combined with a 5D parallelism strategy (tensor parallelism, pipeline parallelism, context parallelism, data parallelism, and expert parallelism) to enable efficient training on ultra-long contexts.
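The key property of RoPE can be checked with a short sketch: after rotation, the dot product between a query and a key depends only on their relative offset, not on their absolute positions. The code below uses the interleaved-pair variant with the commonly used base of 10000; both choices are assumptions, as the article does not specify the Genos configuration.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of vec by angles that grow with position."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even vector dimension")
    rotated = []
    for i in range(0, dim, 2):
        # Pair i/2 uses frequency base^(-i/dim): early pairs rotate fastest.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        rotated += [x * cos_t - y * sin_t, x * sin_t + y * cos_t]
    return rotated

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

query = [0.3, -1.2, 0.7, 0.5]
key = [1.1, 0.4, -0.6, 0.9]

# Same offset (2) at very different absolute positions.
near = dot(rope(query, 5), rope(key, 3))
far = dot(rope(query, 1002), rope(key, 1000))
```

Here `near` and `far` agree up to floating-point error, because each pair is rotated by an angle proportional to position and the rotations cancel down to the offset. No table of per-position embeddings is needed, so nothing in the encoding itself caps the sequence length.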
Experimental Results: From Element Identification to Clinical Reasoning
Across benchmarks, Genos shows strong performance:
Single-Base Precision Prediction:
Genos-10B achieved 88.72% accuracy in identifying and evaluating functional genetic elements.
Mutation Effect Prediction:
Genos simulates the effects of mutations on RNA expression. Experimental results show that its predictions correlate highly with real RNA-seq data, bridging the gap between sequencing data and multi-omics prediction.
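One generic way to frame mutation effect prediction is in-silico mutagenesis: score the reference sequence, score the same sequence with the variant applied, and take the difference. The sketch below shows only that framing. The predictor is a deliberately trivial stand-in (GC fraction), not Genos, and all function names are hypothetical.

```python
def apply_snv(sequence, position, ref, alt):
    """Return sequence with a single-nucleotide variant at a 0-based position."""
    if sequence[position] != ref:
        raise ValueError("reference base does not match the sequence")
    return sequence[:position] + alt + sequence[position + 1:]

def variant_effect(sequence, position, ref, alt, predict):
    """Predicted change in signal: predict(alternate) minus predict(reference)."""
    mutated = apply_snv(sequence, position, ref, alt)
    return predict(mutated) - predict(sequence)

def toy_gc_fraction(sequence):
    """Stand-in predictor (assumption): fraction of G or C bases."""
    return sum(base in "GC" for base in sequence) / len(sequence)

# A -> G at position 0 raises the toy score from 4/8 to 5/8.
delta = variant_effect("ACGTACGT", 0, "A", "G", toy_gc_fraction)
```

In practice `predict` would be replaced by a call to a trained model that maps a sequence window to an expression-related output, and the resulting deltas would be compared against measured RNA-seq changes.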
Clinical Omics Diagnosis:
In rare disease diagnosis, Genos demonstrates the ability to reason by integrating gene sequences with clinical textual phenotypes, achieving diagnostic accuracy close to that of senior clinical geneticists.
Project Implementation: Adaptation to Domestic Computing Infrastructure and Open-Source Ecosystem
Adhering to "Tech for All", the Genos team has implemented full-stack optimizations tailored for real-world R&D scenarios.
Full-Scale Open Sourcing:
Genos has been open-sourced in two versions, with 1.2 billion (1.2B) and 10 billion (10B) parameters, to cover application scenarios ranging from PCs to computing clusters.
Deep Adaptation to Domestic Hardware:
The Genos model deployed on domestic computing infrastructure has been optimized through integration with the vLLM inference framework. This significantly lowers the barrier to deployment in heterogeneous computing environments.
Cloud-End Collaboration:
Genos has been deployed on BGI's DSC Cloud platform, which offers RESTful API services that let developers directly call DNA sequence embedding extraction and mutation effect prediction.
Ways to Get Open-Source Genos
GitHub:
https://github.com/zhejianglab/Genos
ModelScope:
https://modelscope.cn/collections/zhejianglab/Genos
Hugging Face:
https://huggingface.co/collections/ZhejiangLab/genos
zero2x:
https://www.zero2x.org/genos
You are welcome to connect and engage with our R&D team in open-source communities such as GitHub!
Online Experience Platform
https://cloud.stomics.tech/#/inferance-web?type=model
ModelScope Perspective: An Essential Step Towards "Programming Biology"
AI for Science has moved from conceptual discussion to competition in specialized niches. The release of Genos is not just a contest of parameter counts; it also marks a shift in computing paradigm, expanding the modeling scope from local sequences to the whole genome. With Genos, ZJ Lab and BGI-Research have taken the initiative in this biological AI competition.
For AI developers, Genos offers a reference for engineering practice in handling extremely long sequences and complex dynamic routing. For biological researchers, it serves as a high-resolution microscope for peering into genomic "dark matter". As the Genos ecosystem opens up, we expect to see a wave of technological breakthroughs in elucidating the mechanisms of life at genome-wide scale.
Paper
Genos Team. Genos: A Human-Centric Genomic Foundation Model. GigaScience (2025). DOI: https://doi.org/10.1093/gigascience/giaf132. Available from https://github.com/zhejianglab/Genos





