Research
Deciphering gene regulatory mechanisms
Understanding gene regulatory mechanisms is crucial for uncovering the molecular basis of gene expression and its dysregulation in disease. Non-coding regulatory elements, such as enhancers and silencers, play key roles in controlling when and where genes are expressed, and mutations in these elements can lead to complex traits and diseases. I develop methods to decipher these mechanisms, focusing on non-coding regulatory elements, their interactions, and their impact on gene expression and disease. By leveraging large-scale genomic and epigenomic datasets, such as those from ENCODE and Roadmap Epigenomics, I systematically identify regulatory elements and explore how genetic variants within them influence regulatory networks. I build comprehensive models of cell-type-specific regulatory interactions and gene expression at both the bulk and single-cell levels by integrating cis- and trans-regulatory networks. My work aims to bridge the gap between genetic variation and disease progression, providing a more nuanced understanding of how gene regulation is altered in disease states.
Data Type
: ATAC-seq, ChIP-seq, RNA-seq, HiChIP, Hi-C, scRNA-seq, scATAC-seq
Understanding individual-level gene regulation
Understanding individual-level gene regulation is key to bridging the gap between regulatory genomics and genetics, providing insight into how genetic variants influence gene expression at a personal level. I develop methods to model both personal gene expression and allele-specific expression using large language models (LLMs) trained on whole-genome sequencing (WGS) data, particularly from admixed populations such as African Americans within the GTEx project. WGS data from admixed individuals offer an ideal control because they make it possible to isolate the effects of different alleles while controlling for other factors such as transcription factor binding. By focusing on allele-specific regulation, my work provides a clearer understanding of how genetic variants impact regulatory mechanisms and contribute to disease, with a specific emphasis on individual variability.
Data Type
: ATAC-seq, ChIP-seq, RNA-seq, WGS
Advancing disease risk assessment
In personalized medicine, disease risk assessment enables stratification of individuals based on their genetic risk, facilitating targeted interventions and optimizing clinical outcomes. Disease risk assessment has been greatly advanced by WGS data, which allow the comprehensive inclusion of both rare and de novo variants. I develop methods that leverage WGS data from large biobanks such as UK Biobank and the Million Veteran Program (MVP), using LLMs to integrate both regional and regulatory information. These LLMs enable more precise modeling of how genetic variants contribute to disease risk, offering deeper biological insights. By incorporating rare and de novo variants, my approach improves the predictive performance and biological interpretability of polygenic risk scores, further advancing our understanding of the genetic architecture of disease.
Data Type
: ATAC-seq, ChIP-seq, RNA-seq, WGS, phenotype (binary and continuous)