Spotlight on Shaojun Yu

Shaojun Yu, PhD
Post-Doctoral Researcher, Yue Lab

How did you first hear about the Institute for Artificial Intelligence in Medicine (I.AIM) and Dr. Feng Yue, who directs I.AIM’s Center for Advanced Molecular Analysis?
When I was pursuing my PhD at Emory, Dr. Feng Yue gave a talk at the Department of Human Genetics. His presentation on genomics and epigenomics research was incredibly inspiring. Afterward, I reached out to Dr. Yue to explore the possibility of joining his lab as a postdoctoral researcher. While looking for more information, I learned about I.AIM’s Center for Advanced Molecular Analysis and its focus on integrating AI with molecular biology. I was particularly impressed by the AI-driven tools developed by Dr. Yue’s lab, such as HiCPlus, NeoLoopFinder, and EagleC, which have significantly advanced epigenetics research.

What factors interested you in working with I.AIM?
My previous research focused on developing machine learning and deep learning models using publicly available biological datasets, such as those from ADNI and GEO. While these models performed well on benchmark datasets, I was always concerned about how well they would generalize to real-world applications. I also wondered whether they were truly capturing biologically meaningful patterns.

At I.AIM, I now have access to state-of-the-art resources for both high-performance computing and wet-lab experiments. This interdisciplinary environment allows me to collaborate with colleagues in both computational and experimental biology. Here, we can design wet-lab experiments to generate new data that validate deep learning model predictions. This has been a transformative experience for me—it’s the first time I’ve truly felt confident that my models are making meaningful contributions to biology.

How has your experience at I.AIM changed you?
Over the past few months, my experience at I.AIM has gradually shifted my perspective. I used to approach research from a purely computational standpoint, but now I find myself thinking more deeply about the underlying biological mechanisms behind computational results. Understanding more about biology has also helped me design more effective and biologically meaningful models. This interdisciplinary growth has been invaluable, and I now see myself as a computational biologist rather than just a computational researcher.

What are your plans for the future?
I have always been fascinated by the latest advancements in AI and how they can be applied to genomics research. One area that particularly excites me is the application of large language models (LLMs) in genomics. DNA, in a way, is its own "language of life," and I believe that LLMs could play a crucial role in uncovering hidden patterns in genomic sequences. Moving forward, I plan to continue my research in genomics and epigenomics, integrating cutting-edge AI technologies to push the boundaries of our understanding in these fields.

What projects are you currently working on or interested in?
I am currently developing a foundation model for epigenetics that aims to elucidate the complex regulatory relationships between epigenetic signals, such as histone modifications and DNA methylation, and gene expression. Recent advances have demonstrated the potential of large AI models in genomics. By leveraging these technologies, I hope to build a model that provides deeper insights into gene regulation and epigenetic mechanisms.

Would you like to share a publication you are most proud of being associated with?
One of my recent publications, which came out a few months ago, is titled "A Novel Classification Framework for Genome-Wide Association Study of Whole Brain MRI Images Using Deep Learning." Unlike traditional genome-wide association studies (GWAS), which rely on univariate analyses of summarized imaging features, our work uses a deep learning framework that classifies whole-brain MRI images according to single nucleotide polymorphism (SNP) genotypes. Using simulations and real-world data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), we demonstrated that this approach can identify novel genetic variants associated with brain phenotypes. Although this study serves as a proof of concept, it highlights the potential of deep learning to uncover complex, nonlinear relationships between genetics and brain structure, offering an alternative to traditional GWAS methods.

What is a cause that you are passionate about?
I am deeply passionate about making AI-driven biological research more reproducible and interpretable. AI has the power to revolutionize genomics, but without rigorous validation and careful interpretation, models can lead to misleading conclusions. I advocate for interdisciplinary collaboration between AI researchers and biologists to ensure that AI models truly benefit biomedical research.

What have you learned (or are learning) that has made a difference for you?
One of the most valuable lessons I have learned is the importance of experimental validation in computational research. It’s easy to get excited about a model with high accuracy, but real-world biological applications require validation through independent experiments. This shift in mindset—from focusing solely on computational metrics to prioritizing biological relevance—has significantly shaped my research approach.

What has been your greatest challenge?
One of my biggest challenges has been bridging the gap between AI and biology. Coming from a computational background, I initially found it difficult to understand the complexities of molecular biology. Learning to communicate effectively with biologists and designing AI models that align with real-world biological constraints has been a steep but rewarding learning curve.

What advice would you give to a student interested in getting into this field of study?
I would advise students to develop both strong computational skills and a solid understanding of biology. AI-driven genomics is an interdisciplinary field, and success requires knowledge in both domains. Additionally, I recommend engaging in hands-on projects—whether through coding, analyzing real biological datasets, or collaborating with biologists—to gain practical experience.