We are looking for PhD students and research interns (see openings)!
Last updated: 8/15/2025
We design statistically principled methods, develop user-friendly software, and study the genetic basis of human diseases. We currently focus on integrative analysis of genetics and functional genomics data. Topics of interest include:
Identifying disease-critical cellular contexts through integrating GWAS with scRNA-seq, scATAC-seq, and spatial data (See Zhang et al. 2022 Nat Genet, Yasumizu et al. 2024 Cell Rep).
Identifying genes and cell types at GWAS loci through integrating regulatory functional data (See Strober et al. 2025 Nat Genet); understanding disease-critical cell states and genes by analyzing case-control scRNA-seq data with causal inference.
Graph foundation models for jointly modeling variants, cis-regulatory elements, genes, and cell types for disease (See Huang et al. 2024 in revision at Nat Genet).
Understanding the genetic architecture of human diseases and the underlying evolutionary driving forces through analyzing biobank-scale genetics data and functional data (See Zhang et al. 2023 in revision at Nat Genet).
LLMs as agents for genetic modeling and causal discovery.
We also develop general statistical and machine learning algorithms motivated by applications in genetics; topics include multiple hypothesis testing, multi-armed bandits, dimensionality reduction, empirical Bayes, and causal inference.
DALE-Eval preprint on comprehensive benchmarking of cell type-specific expression deconvolution methods.
1/2/2025 TGFM for fine-mapping causal tissues and genes at disease-associated loci published in Nature Genetics.
12/5/2024 KGWAS preprint on integrating a large-scale genomic knowledge graph to improve small-cohort GWAS. (SCS news)