Research

Polygenic enrichment in disease-critical cellular contexts

Heritable human diseases and complex traits often manifest in a highly cell-type-specific manner across the body. Pinpointing the causal cell subpopulations is critical for functional follow-up and drug development. Integrating GWAS with functional genomics data (e.g., scRNA-seq) has proven to be an effective approach. However, most existing methods cannot capture the heterogeneity of disease association within classical cell types or distinguish causal cellular contexts from correlated (tagging) contexts with similar molecular patterns. We develop methods to capture fine-grained disease-associated cell populations and to tease apart the causal cellular contexts. Single-cell disease relevance score paper (Zhang*, Hou* et al. 2022). Also see scDRS analysis of spatial transcriptomics (Yasumizu et al. 2024) and immune cell states from single-cell multiome (Gupta et al. 2023).

Data types: GWAS (summary statistics); scRNA-seq; eQTL; single-cell eQTL; scATAC-seq; SHARE-seq; spatial transcriptomics.

Genetic architecture of human diseases and complex traits

Understanding the genetic architecture of diseases and complex traits, namely the distribution of causal SNP effects across the genome, is a fundamental task with profound implications. For example, it can shed light on human evolutionary history and inform downstream tasks such as association testing, fine-mapping, polygenic risk prediction, and disease gene identification. Most existing works focus on the marginal distribution of SNP causal effect sizes, and we expand this endeavor in two directions. First, we move beyond the marginal distribution and develop methods to examine the correlation of causal effect sizes between SNP pairs, which may arise from the action of natural selection. Second, we investigate genetic architecture involving both traits and molecular phenotypes: we develop methods to estimate the trait heritability mediated by gene expression or protein abundance and to partition it across cell types. LDSPEC SNP-pair effect correlation paper (Zhang* et al. 2023). Also see ATM comorbidity disease subtypes paper (Jiang et al. 2023).

Data types: GWAS (summary statistics or individual genotype, e.g., UK Biobank); functional annotations (e.g., baseline-LD); eQTL (e.g., GTEx); single-cell eQTL; pQTL (e.g., UK Biobank).

Causal inference in genetics and genomics

Causal inference is a powerful statistical framework for uncovering the underlying causal relationships between variables. However, its application in genetics and genomics is often hindered by domain-specific challenges such as a large number of variables, relatively small sample sizes, small effect sizes of genetic variants, complex model assumptions across molecular phenotypes, and heterogeneous data collection methods. Methods therefore need to model these aspects carefully to produce powerful and robust estimates. First, we develop integrative methods combining GWAS and molecular QTL data to estimate the causal relationship between genes/proteins and diseases. Second, we develop methods to disentangle causal from tagging relationships in gene expression experiments, such as differential expression analysis in RNA-seq or scRNA-seq.

Data types: GWAS (summary statistics); eQTL (e.g., GTEx); single-cell eQTL; pQTL (e.g., UK Biobank); scRNA-seq; perturb-seq; massively parallel reporter assays (MPRAs).

Machine learning and data science

We develop general machine learning and data science algorithms motivated by applications in genetics. Fast random forest paper (Tiwari et al. 2022). Deployment monitoring for machine learning systems paper (Ginart et al. 2022). Single-cell aging score paper (Zhang et al. 2021). Fast k-medoid clustering paper (Tiwari et al. 2020). Covariate-adaptive multiple hypothesis testing paper (Zhang et al. 2019). Adaptive Monte Carlo multiple testing paper (Zhang et al. 2019). Contrastive principal component analysis paper (Abid*, Zhang* et al. 2018).

Research grants

Our research is supported by the following grants: