Megha Hegde

Research project: Deciphering Multi-omics Data: Large Language Models in Disease Research

Abstract

This project uses artificial intelligence (AI) to support disease-related research by designing novel large language models (LLMs) that can decipher multi-omics data. The project aims to go beyond current Transformer models by developing a novel unsupervised pre-training paradigm that can better model the underlying "language" of multi-omics data. Additionally, as the attention mechanism has proven to be a bottleneck for genomic modelling because its cost grows quadratically with sequence length, we explore alternatives that can provide strong performance at a lower computational cost. A key area of focus is modelling long-range dependencies within the data, which is crucial for identifying patterns (for instance, drug-resistant mutation profiles) that are currently out of reach for state-of-the-art models. This research has potential applications to a wide range of communicable diseases caused by pathogens, including HIV, tuberculosis bacteria, MRSA, and hepatitis B virus, which impose a significant burden on human health worldwide and have become resistant to many of the drugs used to treat them. The developed methods can also support cancer research by integrating various data types and sources, particularly in developing patient-specific treatment regimens.

Biography

I am a PhD student in the School of Computer Science and Mathematics, exploring the application of large language models and deep learning to multi-omics data to aid disease-related research.

I first came to Kingston University in 2022 to complete my MSc in Data Science. Alongside my MSc, I worked as a research assistant for Dr Farzana Rahman. Prior to this, I studied MEng Engineering Science at the University of Oxford, and then worked as a software developer in the asset management industry. During my academic career so far, I have worked on research projects leveraging AI for various applications, including foetal gestational age estimation, retinal disease detection, and air quality prediction.

Areas of research interest

  • Large language models
  • Multi-omics
  • Bioinformatics
  • Deep learning
  • Artificial intelligence

Qualifications

  • MSc Data Science, Kingston University London
  • MEng Engineering Science, University of Oxford

Funding or awards received

  • PhD studentship, Graduate Research School, Kingston University London

Publications

Conference papers

Hegde, M., Nebel, J-C. and Rahman, F. (2024) ‘Sustainable AI-based prediction of air pollution levels in London’, Proceedings of the 9th World Congress on Civil, Structural, and Environmental Engineering (CSEE 2024).