Wanwen Zeng

I am currently a postdoctoral fellow in the Department of Statistics at Stanford University, working under the mentorship of Prof. Wing Hung Wong. My research focuses on developing machine learning methods to uncover gene regulatory mechanisms and their implications for complex traits and diseases by integrating multi-omics and whole-genome sequencing (WGS) data. Specifically, I am interested in:

  • Deciphering gene regulatory mechanisms, with an emphasis on non-coding regulatory elements, their interactions, and their roles in gene expression and disease mechanisms by leveraging large-scale (epi)genomic datasets from ENCODE and ROADMAP.
  • Understanding individual-level gene regulation, with a focus on modeling personal gene expression by developing genetic large language models (LLMs) using WGS data from admixed (e.g., African-American) individuals in the GTEx project.
  • Advancing disease risk assessment by integrating both rare and de novo variants in biobank-level WGS data from the UK Biobank and MVP to improve predictive performance and biological interpretability.

news

Oct 03, 2024 Wing and I gave an invited talk at the Stanford Biostatistics Seminar.
Oct 02, 2024 Our cis-regulatory element prediction model is on bioRxiv.
Oct 02, 2024 Our Alzheimer’s disease prediction model using genomic LLMs is on medRxiv.
Oct 02, 2024 Our Polygenic Risk Score model using LLMs is on medRxiv.
Aug 18, 2023 Attended the Conference in Celebration of Prof. Wing Hung Wong's 70th Birthday.
Jan 06, 2023 Our study on the HiChIP database is published in Nucleic Acids Research.
Dec 02, 2021 Our team won first place in two Joint Embedding tasks of the NeurIPS 2021 Multimodal Single-Cell Data Integration competition.

selected publications

  1. PNAS
    How to improve polygenic prediction from increasingly prevalent whole-genome sequencing data?
    Wanwen Zeng, Hanmin Guo, Qiao Liu, and Wing H. Wong
    Proceedings of the National Academy of Sciences, 2024
    under review
  2. Nat. Aging
    Associating genotype to imaging and clinical phenotypes of Alzheimer’s disease by leveraging genomic large language model
    Qiao Liu*, Wanwen Zeng*, Hongtu Zhu, Lexin Li, and Wing H. Wong
    Nature Aging, 2024
    under review
  3. Nat. Commun.
    CREATE: cell-type-specific cis-regulatory elements identification via discrete embedding
    Xuejian Cui, Qijin Yin, Zijing Gao, Zhen Li, Xiaoyang Chen, Shengquan Chen, Qiao Liu, Wanwen Zeng, and Rui Jiang
    Nature Communications, 2024
    in revision
  4. RECOMB
    bpBERT: base-resolution BERT models reveal DNA sequence regulatory syntax and variants
    Wanwen Zeng, Shuang Chen, Yuti Liu, and Wing H. Wong
    2024
    under review
  5. NAR
    HiChIPdb: A database of HiChIP regulatory interactions
    Wanwen Zeng*, Qiao Liu*, Qijin Yin*, Rui Jiang, and Wing H. Wong
    Nucleic Acids Research, 2022
  6. Nat. Mach. Intell.
    Reusability report: Compressing regulatory networks to vectors for interpreting gene expression and genetic variants
    Wanwen Zeng*, Jingxue Xin*, Rui Jiang, and Yong Wang
    Nature Machine Intelligence, 2021
  7. NAR
    SilencerDB: A comprehensive database of silencers
    Wanwen Zeng*, Shengquan Chen*, Xuejian Cui*, Xiaoyang Chen, Zijing Gao, and Rui Jiang
    Nucleic Acids Research, 2021
  8. Bioinformatics
    Integrating distal and proximal information to predict gene expression via a densely connected convolutional neural network
    Wanwen Zeng, Yong Wang, and Rui Jiang
    Bioinformatics, 2020
  9. Nat. Commun.
    DC3: A method for deconvolution and coupled clustering from bulk and single-cell genomics data
    Wanwen Zeng, Xi Chen, Zhana Duren, Yong Wang, Rui Jiang, and Wing Hung Wong
    Nature Communications, 2019