Lingshu-Cell: A Generative Cellular World Model for Transcriptome Modeling Toward Virtual Cells


DAMO Academy, Alibaba Group
Equal Contribution, Project Leader

We’re excited to share Lingshu-Cell, a cellular world model released by Alibaba DAMO Academy. This work moves beyond static representation learning toward generative modeling of cellular state distributions and perturbation responses, taking a step toward virtual cells.

Highlights of Lingshu-Cell

  • Lingshu-Cell introduces a generative cellular world model for single-cell transcriptomics based on a masked discrete diffusion framework.
  • Lingshu-Cell performs transcriptome-wide modeling over approximately 18,000 genes directly in a discrete token space compatible with the sparse, non-sequential nature of scRNA-seq data, without prior gene selection.
  • Across diverse tissues and species, Lingshu-Cell faithfully captures cellular state distributions, marker-gene expression patterns, and cell-subtype proportions.
  • Lingshu-Cell also delivers leading performance in predicting cellular responses under both genetic and cytokine perturbations.

Figure 1 | Overview of the Lingshu-Cell framework. a, Lingshu-Cell employs a masked discrete diffusion model to learn and generate single-cell transcriptomic data. In the forward process, gene expression values are progressively masked; in the reverse process, the model iteratively predicts masked values to generate realistic scRNA-seq expression profiles. b, Comparison of generative paradigms. Unlike autoregressive (AR) models that rely on a fixed sequential order and denoising diffusion probabilistic models (DDPMs) that corrupt all positions with continuous noise, Lingshu-Cell randomly masks and predicts gene expression values in an order-independent manner, inherently compatible with the orderless structure of gene expression data. c, Application scenarios, including unconditional generation across diverse human tissues and species, and conditional generation for genetic perturbation and cytokine perturbation response prediction.
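The forward and reverse processes in panel a can be illustrated with a minimal sketch. It is not the Lingshu-Cell implementation: the `MASK` sentinel, the `toy_predictor` stand-in for the trained network, the 8-bin token vocabulary, and the confidence-ordered unmasking schedule are all assumptions made for illustration. The source states only that values are progressively masked and then iteratively predicted in an order-independent way.

```python
import random

MASK = -1  # hypothetical sentinel marking a masked gene position


def forward_mask(tokens, mask_ratio, rng):
    """Forward process: hide a random subset of positions, independent of gene order."""
    n_mask = round(mask_ratio * len(tokens))
    hidden = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in hidden else t for i, t in enumerate(tokens)]


def toy_predictor(tokens, rng, n_bins):
    """Stand-in for the trained network: one (token, confidence) proposal per position.

    A real model would condition on the visible tokens; this one draws at random.
    """
    return [(rng.randrange(n_bins), rng.random()) for _ in tokens]


def reverse_generate(n_genes, n_steps, rng, n_bins=8):
    """Reverse process: start fully masked and reveal tokens over n_steps rounds."""
    tokens = [MASK] * n_genes
    for step in range(n_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        proposals = toy_predictor(tokens, rng, n_bins)
        # Reveal an equal share of the remaining positions each round,
        # most confident proposals first; the final round reveals all that is left.
        n_reveal = -(-len(masked) // (n_steps - step))  # ceiling division
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:n_reveal]:
            tokens[i] = proposals[i][0]
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    cell = [0, 3, 0, 0, 5, 1, 0, 2]  # toy discretized expression tokens
    print(forward_mask(cell, 0.5, rng))
    print(reverse_generate(n_genes=8, n_steps=4, rng=rng))
```

Because positions are chosen at random in the forward pass and by confidence in the reverse pass, nothing in the sketch depends on the order in which genes are listed, which is the property panel b contrasts with autoregressive generation.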

Abstract

Modeling cellular states and predicting their responses to perturbations are central challenges in computational biology and the development of virtual cells. Existing foundation models for single-cell transcriptomics provide powerful static representations, but they do not explicitly model the distribution of cellular states for generative simulation. Here, we introduce Lingshu-Cell, a masked discrete diffusion model that learns transcriptomic state distributions and supports conditional simulation under perturbation. By operating directly in a discrete token space that is compatible with the sparse, non-sequential nature of single-cell transcriptomic data, Lingshu-Cell captures complex transcriptome-wide expression dependencies across approximately 18,000 genes without relying on prior gene selection, such as filtering by high variability or ranking by expression level. Across diverse tissues and species, Lingshu-Cell accurately reproduces transcriptomic distributions, marker-gene expression patterns and cell-subtype proportions, demonstrating its ability to capture complex cellular heterogeneity. Moreover, by jointly embedding cell type or donor identity with perturbation, Lingshu-Cell can predict whole-transcriptome expression changes for novel combinations of identity and perturbation. It achieves leading performance on the Virtual Cell Challenge H1 genetic perturbation benchmark and in predicting cytokine-induced responses in human PBMCs. Together, these results establish Lingshu-Cell as a flexible cellular world model for in silico simulation of cell states and perturbation responses, laying the foundation for a new paradigm in biological discovery and perturbation screening.

Results

Cellular State Modeling

Figure 2 | Unconditional generation of cell states across diverse species and tissues by Lingshu-Cell. a, UMAP visualization of real and generated cells (10,000 each, randomly sampled) from the PARSE-PBMC dataset, colored by cell type annotation and normalized expression (log1p) of canonical marker genes for each cell type. b, Comparison of cell type proportions between real and generated data. c, Quantitative benchmark comparing Lingshu-Cell, scDiffusion and scVI across five metrics (Pearson correlation, Spearman correlation, MMD, 1-WD and iLISI) on the PARSE-PBMC dataset. d, Unconditional generation results across human tissues, including neocortex, heart, lung and colon, with UMAP plots showing real (top) and generated (bottom) cells colored by cell type. e, Unconditional generation results across multiple species, including mouse, rhesus macaque, zebrafish and fly.
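Two of the metrics in panel c can be sketched in a few lines. This page does not say how the benchmark computes them, so the conventions below are assumptions: Pearson correlation is taken between per-gene mean expression profiles of real and generated cells, and the 1-Wasserstein distance (1-WD) is computed for one gene at a time on equal-sized samples. The function names are illustrative.

```python
import math


def mean_profile(cells):
    """Per-gene mean over a list of cells (each cell is a list of expression values)."""
    n = len(cells)
    return [sum(column) / n for column in zip(*cells)]


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-sized 1-D samples.

    For equal sample sizes this is the mean absolute difference of sorted values.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)


if __name__ == "__main__":
    real = [[0.0, 1.2, 3.1], [0.0, 0.8, 2.9], [0.5, 1.0, 3.3]]
    generated = [[0.1, 1.1, 3.0], [0.0, 0.9, 3.2], [0.3, 1.0, 3.1]]
    print(pearson(mean_profile(real), mean_profile(generated)))
    print(wasserstein_1d([c[2] for c in real], [c[2] for c in generated]))
```

Higher correlation and lower 1-WD both indicate that generated cells sit closer to the real distribution.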

Extended Data Figure 1 | Scaling unconditional generation to larger cell populations. a, UMAP visualization of real and generated cells (200,000 each) from the PARSE 10M PBMC dataset, colored by cell type annotation (left) and normalized expression (log1p) of canonical marker genes. b, Comparison of major cell type proportions between real and generated data. c, UMAP visualization at higher resolution, showing that generated cells accurately recapitulate cell subtype structure of real data. d, Comparison of cell subtype proportions between real and generated data, demonstrating robust consistency at higher resolution.
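The proportion comparisons in panels b and d amount to counting annotated labels in each population and comparing the resulting frequencies. A minimal sketch follows; summarizing the gap with total variation distance is an assumption made here for illustration, not a metric reported on this page, and the labels are toy values.

```python
from collections import Counter


def proportions(labels):
    """Fraction of cells carrying each cell-type label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two label distributions (0 means identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    real_labels = ["T", "T", "B", "NK"]
    generated_labels = ["T", "B", "B", "NK"]
    print(total_variation(proportions(real_labels), proportions(generated_labels)))
```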

Genetic Perturbation Prediction

Figure 3 | Accurate prediction of single-cell transcriptomic responses to genetic perturbations in cell lines by Lingshu-Cell. a, Schematic of CRISPR-based genetic perturbation and the resulting transcriptomic changes. b, Conditional generation framework for perturbation prediction. Cell type and perturbation target are provided as conditioning inputs, and a masked diffusion model iteratively predicts gene expression values to generate perturbation-specific expression profiles. c, Three design components of Lingshu-Cell: classifier-free guidance (CFG), sequence compression, and biological prior injection (see Methods). d, Ablation study of the CFG weight. Bar plots show prediction performance across eight metrics (DES, PDS, MAE, Spearman #DEG, Spearman LFC, AUPRC, Pearson-Δ, and average score) on the H1 test set (n = 100 perturbation targets). e, Ablation study of sequence compression, comparing uncompressed input with patch sizes of 8 and 32. f, Ablation study of biological prior injection, comparing prediction performance with and without prior injection. In d–f, metrics highlighted in red denote improved performance relative to the corresponding baseline in each ablation setting.
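Two of the design components in panel c lend themselves to a short sketch. The code below shows one common formulation of classifier-free guidance on logits and a plain grouping of gene tokens into fixed-size patches. Both are assumptions for illustration: this page does not give the exact guidance formula or describe how Lingshu-Cell embeds a patch, and the function names and zero padding are hypothetical.

```python
import math


def cfg_logits(cond, uncond, weight):
    """Classifier-free guidance: push logits away from the unconditional prediction.

    weight = 0 recovers the unconditional logits, weight = 1 the conditional ones,
    and weight > 1 amplifies the effect of the condition.
    """
    return [u + weight * (c - u) for c, u in zip(cond, uncond)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_patches(tokens, patch_size, pad=0):
    """Group consecutive gene tokens into patches, padding the final patch if needed."""
    padded = tokens + [pad] * (-len(tokens) % patch_size)
    return [padded[i:i + patch_size] for i in range(0, len(padded), patch_size)]


if __name__ == "__main__":
    guided = cfg_logits(cond=[2.0, 0.0], uncond=[1.0, 1.0], weight=2.0)
    print(softmax(guided))
    genes = list(range(18000))  # stand-in for roughly 18,000 gene tokens
    print(len(to_patches(genes, 8)), len(to_patches(genes, 32)))
```

Under this grouping, 18,000 tokens become 2,250 patches at patch size 8 and 563 patches at patch size 32 (the last one padded), which is the kind of sequence-length reduction the ablation in panel e compares against uncompressed input.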

Table 2 | Genetic perturbation prediction on the VCC benchmark. Teams are ordered by final ranking. Avg Rank: average rank across the top 25 teams. Best per column in bold. See the arXiv submission for the full top-25 ranking.

Team             Avg Rank ↓  DES ↑  PDS ↑  MAE ↓  Sp. #DEG ↑  Sp. LFC ↑  AUPRC ↑  Pearson-Δ ↑
Lingshu-Cell     8.7         0.216  0.748  0.052  0.394       0.331      0.272    0.306
cleopatra        9.1         0.228  0.747  0.086  0.473       0.396      0.266    0.203
xBio             10.7        0.305  0.811  0.770  0.564       0.087      0.252    0.217
Cellock Holmes   10.9        0.356  0.679  0.239  0           0.238      0.576    0.125
Shippers         11.0        0.354  0.699  0.231  0           0.227      0.576    0.123
Mean Predictors  11.4        0.305  0.741  6.723  0.294       0.213      0.582    0.217
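One plausible reading of the Avg Rank column is that each team is ranked separately on every metric, respecting the direction of the arrow, and the ranks are then averaged. The sketch below implements that reading. The tie rule (tied teams share the best rank) is an assumption, and because the published ranks are taken over all top 25 teams and seven metrics, running it on a subset of rows will not reproduce the Avg Rank values in the table.

```python
def metric_ranks(scores, higher_is_better):
    """Rank teams on one metric; tied teams share the best rank (an assumption)."""
    ranks = {}
    for team, value in scores.items():
        if higher_is_better:
            n_better = sum(1 for v in scores.values() if v > value)
        else:
            n_better = sum(1 for v in scores.values() if v < value)
        ranks[team] = 1 + n_better
    return ranks


def average_rank(table, directions):
    """Average each team's per-metric rank.

    table maps metric name -> {team: value};
    directions maps metric name -> True if higher is better.
    """
    per_metric = {m: metric_ranks(table[m], directions[m]) for m in table}
    teams = next(iter(table.values())).keys()
    return {t: sum(per_metric[m][t] for m in table) / len(table) for t in teams}


if __name__ == "__main__":
    # Three rows and three metrics copied from the table, for illustration only.
    table = {
        "DES": {"Lingshu-Cell": 0.216, "cleopatra": 0.228, "xBio": 0.305},
        "PDS": {"Lingshu-Cell": 0.748, "cleopatra": 0.747, "xBio": 0.811},
        "MAE": {"Lingshu-Cell": 0.052, "cleopatra": 0.086, "xBio": 0.770},
    }
    directions = {"DES": True, "PDS": True, "MAE": False}
    print(average_rank(table, directions))
```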

Cytokine Perturbation Prediction

Figure 4 | Accurate prediction of single-cell transcriptomic responses to cytokine perturbations in PBMCs by Lingshu-Cell. a, Schematic of cytokine-induced transcriptomic perturbation. b, Conditional generation framework for cytokine perturbation prediction. Donor identity and cytokine condition are provided as conditioning inputs, and a masked diffusion process iteratively predicts gene expression values to generate perturbation-specific expression profiles. c, Prediction performance of Lingshu-Cell, PertMean, STATE, scGPT, and scVI on the PARSE 10M PBMC dataset, evaluated across eight metrics (as defined in Fig. 3d). Bar plots show performance on a test set comprising 4 of 12 donors, with 70% of cytokine conditions (63 of 90) held out for each donor. Bars indicate mean performance across donors; error bars, interquartile range (Q1–Q3); points, individual donor values. Metrics for which Lingshu-Cell achieves the best performance among all methods are highlighted in red.
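The evaluation split in panel c (4 of 12 donors in the test set, 63 of 90 cytokine conditions held out per test donor) can be sketched as follows. The donor and cytokine names, the random seed, and the choice to draw a separate held-out set for each test donor are assumptions for illustration; this page does not specify whether the same 63 conditions are held out for every donor.

```python
import random


def holdout_split(donors, cytokines, n_test_donors, holdout_frac, seed=0):
    """Pick test donors, then hold out a fraction of cytokine conditions for each.

    Returns {test_donor: set of held-out cytokines}. Held-out (donor, cytokine)
    pairs are the combinations a model must predict without having seen them.
    """
    rng = random.Random(seed)
    test_donors = rng.sample(donors, n_test_donors)
    n_held = round(holdout_frac * len(cytokines))
    return {d: set(rng.sample(cytokines, n_held)) for d in test_donors}


if __name__ == "__main__":
    donors = [f"donor_{i}" for i in range(1, 13)]        # 12 hypothetical donor IDs
    cytokines = [f"cytokine_{i}" for i in range(1, 91)]  # 90 hypothetical conditions
    split = holdout_split(donors, cytokines, n_test_donors=4, holdout_frac=0.7)
    for donor, held in split.items():
        print(donor, len(held))
```

With these settings each test donor has 63 held-out conditions, matching the 70% figure in the caption (63 / 90 = 0.7).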

BibTeX

If you find our work useful, please consider citing:

@article{zhang2026lingshucell,
  title={Lingshu-Cell: A Generative Cellular World Model for Transcriptome Modeling Toward Virtual Cells},
  author={Zhang, Han and Yuan, Guo-Hua and Yuan, Chaohao and Xu, Tingyang and Bian, Tian and Cheng, Hong and Huang, Wenbing and Zhao, Deli and Rong, Yu},
  journal={arXiv preprint arXiv:2603.25240},
  year={2026}
}