Genetics and Machine Learning

We are looking for PhD students and research interns (see openings)!

Last updated: 8/15/2025

We design statistically principled methods, develop user-friendly software, and study the genetic basis of human diseases. We currently focus on integrative analysis of genetics and functional genomics data. Topics of interest include:

  • Identifying disease-critical cellular contexts through integrating GWAS with scRNA-seq, scATAC-seq, and spatial data (See Zhang et al. 2022 Nat Genet, Yasumizu et al. 2024 Cell Rep).

  • Identifying genes and cell types at GWAS loci through integrating regulatory functional data (See Strober et al. 2025 Nat Genet); understanding disease-critical cell states and genes by analyzing case-control scRNA-seq data with causal inference.

  • Graph foundation models that jointly model variants, cis-regulatory elements, genes, and cell types in disease (See Huang et al. 2024 in revision at Nat Genet).

  • Understanding the genetic architecture of human diseases and the underlying evolutionary driving forces through analyzing biobank-scale genetics data and functional data (See Zhang et al. 2023 in revision at Nat Genet).

  • LLMs as agents for genetic modeling and causal discovery.

We also develop general statistical and machine learning algorithms motivated by applications in genetics; topics include multiple hypothesis testing, multi-armed bandits, dimensionality reduction, empirical Bayes, and causal inference.

Dr. Martin Jinye Zhang

  • (2019) Ph.D. EE, Stanford
  • (2014) B.Eng. EE, Tsinghua

News

8/2/2025

DALE-Eval preprint on comprehensive benchmarking of cell type-specific expression deconvolution methods.

1/2/2025

TGFM for fine-mapping causal tissues and genes at disease-associated loci published in Nature Genetics.

12/5/2024

KGWAS preprint on integrating a large-scale genomic knowledge graph to improve small-cohort GWAS. (SCS news)

... see all News