Genomics and Healthcare:
Machine Learning for Genomics:
We propose ENBED, a novel foundation model that analyzes DNA sequences at byte-level precision with an encoder-decoder Transformer architecture. The Ensemble Nucleotide Byte-level Encoder-Decoder (ENBED) uses a sub-quadratic implementation of attention to develop an efficient model capable of sequence-to-sequence transformations, generalizing previous genomic models with encoder-only or decoder-only architectures. We pre-train the foundation model using reference genome sequences and find that it outperforms the existing state-of-the-art in 22 out of 25 genomic benchmark datasets. Leveraging this strength in sequence-level classification tasks, we show that the model can identify biological function annotations of genomic sequences. Additionally, we show that ENBED can identify sequences containing base call mismatches and insertion/deletion errors, an advantage over tokenization schemes that group multiple base pairs into one token and therefore lose the ability to analyze with byte-level precision. The novel genomic encoder-decoder architecture allows us to perform sequence-to-sequence transformations. We use this ability to study the prediction of pathogen mutations in 16S sequences from E. coli and to accurately generate child sequences with known mutations validated in the real-world population.
- Aditya Malusare, Harish Kothandaraman, Dipesh Tamboli, Nadia A. Lanman, and Vaneet Aggarwal, "Understanding the Natural Language of DNA using Encoder-Decoder Foundation Models with Byte-level Precision," Submitted.
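The point about byte-level precision can be illustrated with a small sketch. It is not the ENBED tokenizer: the function names, the toy sequence, and the choice of non-overlapping 3-mers as the multi-base scheme are all assumptions made for illustration. It compares how many reference tokens stay aligned after a single-base insertion under each scheme.

```python
def byte_tokens(seq: str) -> list[int]:
    """One token per nucleotide: the ASCII byte value of each base."""
    return list(seq.encode("ascii"))


def kmer_tokens(seq: str, k: int = 3) -> list[str]:
    """Non-overlapping k-mers (hypothetical multi-base scheme); a trailing partial k-mer is kept."""
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def insert_base(seq: str, pos: int, base: str) -> str:
    """Return seq with one base inserted before index pos."""
    return seq[:pos] + base + seq[pos:]


def shared_prefix_len(a: list, b: list) -> int:
    """Number of leading tokens on which a and b agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def preserved(a: list, b: list) -> int:
    """Tokens of a that survive in b as an unchanged prefix or suffix."""
    prefix = shared_prefix_len(a, b)
    suffix = shared_prefix_len(a[::-1], b[::-1])
    return min(len(a), len(b), prefix + suffix)


if __name__ == "__main__":
    reference = "ACGTTGCAAGCT"                 # toy sequence, not real data
    mutated = insert_base(reference, 4, "G")   # single-base insertion
    print("byte-level tokens preserved:",
          preserved(byte_tokens(reference), byte_tokens(mutated)))
    print("3-mer tokens preserved:",
          preserved(kmer_tokens(reference), kmer_tokens(mutated)))
```

In this toy case every byte-level token of the reference survives the insertion, while the 3-mer tokenization shifts its reading frame and keeps only the first token, which is the kind of information loss the abstract refers to.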
Machine Learning for Sepsis Early Detection and Treatment:
Sepsis is a life-threatening medical emergency caused by the body's overwhelming response to an infection. Without urgent treatment, it can lead to tissue damage, organ failure, and death. Machine learning (ML) has been used to address the challenge of managing sepsis through sequential decision-making; however, these methods perform poorly in data-limited offline settings, with survival rates falling below 50 percent. We propose a transformer-based decision maker and integrate a mortality classifier as a reinforcement component to improve the overall survival rate of patients.
- Dipesh Tamboli, Jiayu Chen, Kiran Pranesh Jotheeswaran, Denny Yu, and Vaneet Aggarwal, "Reinforced Sequential Decision-Making for Sepsis Treatment: The POSNEGDM Framework with Mortality Classifier and Transformer," Submitted.
- Naimahmed Nesaragi, Shivnarayan Patidar, and Vaneet Aggarwal, "Tensor Learning of Pointwise Mutual Information from EHR Data for Early Prediction of Sepsis," Computers in Biology and Medicine, Volume 134, July 2021, 104430.
Machine Learning for Health Risk Detection:
Efficient learning-based techniques are essential for predicting health risks in individuals. In many cases, passive face videos can be used to predict these risks. We have used learning-based techniques for early prediction of sepsis, prediction of lifting load risk, identification of high force exertions, and health monitoring.
- Guoyang Zhou, Vaneet Aggarwal, Ming Yun, and Denny Yu, "A Computer Vision Approach for Estimating Lifting Load Contributors to Injury Risk," IEEE Transactions on Human-Machine Systems, vol. 52, no. 2, pp. 207-219, April 2022, doi: 10.1109/THMS.2022.3148339.
- Guoyang Zhou, Vaneet Aggarwal, Ming Yun, and Denny Yu, "Video-Based AI Decision Support System for Lifting Risk Assessment," in Proc. IEEE SMC, Oct. 2021.
- Hamed Asadi, Guoyang Zhou, Jae Joong Lee, Vaneet Aggarwal, and Denny Yu, "A Computer Vision Algorithm to Identify High Force Exertions from Facial Expressions," Ergonomics, Apr 2020.
- Mayank Gupta, Lingjun Chen, Denny Yu, and Vaneet Aggarwal, "A Supervised Learning Approach for Robust Health Monitoring using Face Videos," in Proc. 2nd ACM Workshop on Device Free Human Sensing (DFHS, ACM Buildsys Workshop), Nov. 2020.
DNA Based Data Storage:
DNA-based data storage systems have emerged as a solution to accommodate the data explosion. In this work, properties of DNA codewords that are essential for archival DNA storage are considered in the design of codes. We present constraint-based DNA codes that avoid runs of nucleotides, have fixed GC-weight, and achieve a specified minimum distance. We have also provided a review of natural data storage. Because insertions and deletions are common errors in DNA storage, efficient approaches for dealing with such errors are also studied.
- Vaneet Aggarwal and Rakhi Pratihar, "Insdel codes from subspace and rank-metric codes," Discrete Mathematics, Volume 347, Issue 1, 113675, Jan 2024.
- Dixita Limbachiya, Manish K. Gupta, and Vaneet Aggarwal, "10 Years of Natural Data Storage," IEEE Transactions on Molecular, Biological and Multi-Scale Communications, vol. 8, no. 4, pp. 263-275, Dec. 2022.
- Dixita Limbachiya, Manish K. Gupta, and Vaneet Aggarwal, "Family of Constrained Codes for Archival DNA Data Storage," IEEE Communications Letters, vol. 22, no. 10, pp. 1972-1975, Oct. 2018.
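The codeword constraints named above (no runs of nucleotides, fixed GC-weight, minimum distance) and the insertion/deletion error model can be checked with a short sketch. It is not the construction from the cited papers: the helper names, the four-word toy code, the use of Hamming distance for the minimum-distance constraint, and the run-length bound parameter are assumptions made for illustration.

```python
from itertools import combinations, groupby


def max_run_length(word: str) -> int:
    """Length of the longest run of identical consecutive nucleotides."""
    return max((len(list(group)) for _, group in groupby(word)), default=0)


def gc_weight(word: str) -> int:
    """Number of G or C symbols in the word."""
    return sum(base in "GC" for base in word)


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length words")
    return sum(x != y for x, y in zip(a, b))


def insdel_distance(a: str, b: str) -> int:
    """Fewest insertions plus deletions turning a into b: |a| + |b| - 2 * LCS(a, b)."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return len(a) + len(b) - 2 * prev[-1]


def satisfies_constraints(code: list[str], max_run: int,
                          weight: int, min_distance: int) -> bool:
    """Check run-length, fixed GC-weight, and pairwise Hamming distance constraints."""
    return (
        all(max_run_length(w) <= max_run for w in code)
        and all(gc_weight(w) == weight for w in code)
        and all(hamming_distance(a, b) >= min_distance
                for a, b in combinations(code, 2))
    )


if __name__ == "__main__":
    toy_code = ["ACGT", "CATG", "GTAC", "TGCA"]  # hypothetical toy code
    print(satisfies_constraints(toy_code, max_run=1, weight=2, min_distance=4))
    print(insdel_distance("ACGT", "AGT"))  # one deleted base
```

Hamming distance only compares aligned positions, so a single deletion can make two codewords look very different; the insertion/deletion distance computed from the longest common subsequence is the measure that matches that error model.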