Machine learning model of specificity anchors and clustered binding sites to unravel DNA binding of proteins and drugs
Implementing Organization
Indian Institute of Technology (IIT)
Principal Investigator
Dr. Devesh Bhimsaria
Dr. Aditya Singh, Indian Institute of Technology (IIT) Roorkee, Uttarakhand
Project Overview
Proteins and DNA-binding small-molecule drugs interact with DNA through several types of contacts, and these interactions are essential for regulating gene transcription and controlling cellular processes. DNA-binding proteins, mainly transcription factors (TFs), have a unique DNA-binding domain (DBD) that recognizes a specific DNA sequence, or motif. The nucleotides of this motif are called "specificity anchors", and the binding affinity at these anchors varies with the properties of the neighboring DNA, such as shape and flexibility. A single model, most often a position weight matrix (PWM), is typically used to capture both binding specificity and affinity, but it cannot separate the two contributions or predict their individual effects on binding. The project aims to develop a deep neural network-based machine learning model that relates DNA binding to specificity anchors and DNA properties for improved predictions.

For many TFs and sequence-specific drugs, a single binding site is not enough to trigger binding; a cluster of binding sites is required. The project therefore also aims to create a clustered binding site model using linear regression and neural networks, which models the binding of TFs and small-molecule drugs according to their preference for a single or a clustered binding site pattern. This distance-based model will help in studying diseases caused by repeat expansion and in developing targeted therapeutics.

Together, an improved model of binding affinity and clustered binding sites will help predict DNA binding inside cells, improve models of the related genetic networks, find associations between TFs and genetic diseases caused by mutations in non-coding regions, and design DNA-binding drugs.
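To make the PWM baseline and the idea of clustered binding sites concrete, the following is a minimal Python sketch, not the project's model. It builds a log-odds PWM from a count matrix, scans the forward strand of a sequence for high-scoring windows, and groups hits whose start positions lie within a chosen distance of each other. The count matrix, the sequence, the threshold, the gap size, and all function names are hypothetical values chosen for illustration; a uniform background and a pseudocount of 1 are assumed.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# The counts are invented for illustration and are not measured data.
COUNTS = [
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 17, "C": 1, "G": 1, "T": 1},
]


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(column)
        pwm.append({
            base: math.log2(((n + pseudocount) / total) / background)
            for base, n in column.items()
        })
    return pwm


def score_window(pwm, window):
    """Sum the log-odds of each base in a window of motif length."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return start positions (forward strand only) scoring >= threshold."""
    width = len(pwm)
    return [
        start
        for start in range(len(sequence) - width + 1)
        if score_window(pwm, sequence[start:start + width]) >= threshold
    ]


def group_hits(hits, max_gap):
    """Group sorted hit positions whose consecutive starts are <= max_gap apart."""
    clusters = []
    for pos in hits:
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


# Hypothetical sequence: two nearby sites, then one isolated site.
SEQ = "CCGATAGGATA" + "C" * 10 + "GATACC"

pwm = pwm_from_counts(COUNTS)
hits = scan(pwm, SEQ, threshold=6.0)
clusters = group_hits(hits, max_gap=10)
print("hits:", hits)
print("clusters:", clusters)
```

Because each window gets a single summed score, this kind of model cannot say how much of the score comes from the anchor nucleotides and how much from the flanking DNA, which is the limitation the overview describes. The cluster sizes and inter-site distances produced by a function like `group_hits` are one possible kind of input feature for a distance-based regression model.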