FusionDTI: Fine-grained Binding Discovery with Token-level Fusion for Drug-Target Interaction

University of Glasgow

Abstract

Predicting drug-target interaction (DTI) is critical in the drug discovery process. Despite remarkable advances in recent DTI models through the integration of representations from diverse drug and target encoders, such models often struggle to capture the fine-grained interactions between drugs and proteins, i.e. the binding of specific drug atoms (or substructures) to key amino acids of the protein, which is crucial for understanding binding mechanisms and optimising drug design.

To address this issue, this paper introduces a novel model, called FusionDTI, which uses a token-level Fusion module to effectively learn fine-grained information for Drug-Target Interaction. In particular, our FusionDTI model uses the SELFIES representation of drugs to mitigate invalid sequence fragments, and incorporates the structure-aware (SA) vocabulary of target proteins to compensate for the lack of structural information in amino acid sequences. In addition, it leverages pre-trained language models, extensively trained on large-scale biomedical datasets, as encoders to capture the complex information of drugs and targets. Experiments on three well-known benchmark datasets show that our proposed FusionDTI model achieves the best performance in DTI prediction compared with seven existing state-of-the-art baselines. Furthermore, our case study indicates that FusionDTI can highlight potential binding sites, enhancing the explainability of DTI predictions.

Overall Framework

Input: The initial inputs for drugs and targets are string-based representations. For proteins, we employ the structure-aware (SA) vocabulary, in which each residue is replaced by one of 441 SA tokens that combine an amino acid with a 3D geometric feature, addressing the lack of structural information in plain amino acid sequences. For drugs, as mentioned in the previous section, we use SELFIES, a formal syntax that always corresponds to a valid molecular graph. The steps for obtaining SA and SELFIES sequences are provided in the accompanying code.

Encoder: The proposed model contains two frozen encoders, SaProt and SELFormer, which generate a protein representation and a drug representation, respectively. Notably, FusionDTI is flexible enough to easily replace these encoders with other advanced PLMs. The generated representations are stored in memory for later-stage online training.
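The frozen-encoder caching pattern described above can be sketched as follows; the encoder here is a deterministic stand-in stub, not SaProt or SELFormer themselves, and serves only to show that each sequence is encoded once and reused thereafter.

```python
# Sketch of the frozen-encoder caching pattern: each sequence is encoded
# once, and later training steps reuse the stored representation instead
# of re-running the (expensive) frozen encoder forward pass.
from functools import lru_cache
import hashlib

@lru_cache(maxsize=None)
def encode_protein(sa_sequence: str) -> tuple:
    # Stand-in for a frozen encoder forward pass: a fake deterministic
    # 8-dimensional "embedding" derived from the sequence hash.
    h = hashlib.sha256(sa_sequence.encode()).digest()
    return tuple(b / 255.0 for b in h[:8])

emb1 = encode_protein("MdAeVg")  # first call: computed and cached
emb2 = encode_protein("MdAeVg")  # second call: served from the cache
print(encode_protein.cache_info())
```

Because the encoders are frozen, the cached representations never go stale, so this memoisation is safe for the entire training run.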

Fusion module: In developing FusionDTI, we investigated two options for the fusion module: the Bilinear Attention Network (BAN) and the Cross Attention Network (CAN). CAN fuses each pair of token representations via cross attention and then concatenates the attended outputs into a single fused representation F that carries fine-grained binding information. BAN instead computes a bilinear attention map and then generates F through a bilinear pooling layer.
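A minimal NumPy sketch of the CAN idea follows: drug tokens attend over protein tokens and vice versa, and the two attended representations are concatenated into a fused vector. The single-head, projection-free form and the dimensions are simplifications, not the paper's exact architecture.

```python
# Simplified token-level cross attention (the CAN option): each side's
# tokens query the other side's tokens, and the attended outputs are
# concatenated into one fused representation F.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, kv):
    # q: (n_q, d) queries; kv: (n_kv, d) keys/values (shared here for brevity)
    scores = q @ kv.T / np.sqrt(q.shape[-1])  # (n_q, n_kv) token-level affinities
    return softmax(scores, axis=-1) @ kv      # (n_q, d) attended representation

rng = np.random.default_rng(0)
drug = rng.normal(size=(5, 16))     # 5 drug tokens (e.g. SELFIES symbols)
protein = rng.normal(size=(9, 16))  # 9 protein tokens (SA vocabulary)

drug_attn = cross_attention(drug, protein)  # drug queries protein
prot_attn = cross_attention(protein, drug)  # protein queries drug
F = np.concatenate([drug_attn.mean(0), prot_attn.mean(0)])  # fused vector
print(F.shape)
```

The pairwise score matrix is also what makes the potential binding sites inspectable, since each entry relates one drug token to one protein residue.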

Prediction head: Finally, we obtain the interaction probability p of the DTI prediction from a multilayer perceptron (MLP) classifier trained with the binary cross-entropy loss.
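The prediction head can be sketched as a toy NumPy MLP: the fused vector F passes through a hidden layer and a sigmoid to give a probability p, scored with binary cross-entropy. The layer sizes and random initialisation are illustrative assumptions, not the paper's configuration.

```python
# Toy prediction head: MLP + sigmoid over the fused vector F, with the
# binary cross-entropy loss used for training.
import numpy as np

rng = np.random.default_rng(1)
F = rng.normal(size=32)                    # fused drug-target representation
W1, b1 = rng.normal(size=(32, 16)) * 0.1, np.zeros(16)
W2, b2 = rng.normal(size=16) * 0.1, 0.0

h = np.maximum(F @ W1 + b1, 0.0)           # hidden layer with ReLU
p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid -> probability in (0, 1)

y = 1.0                                    # ground-truth interaction label
bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))  # binary cross-entropy
print(p, bce)
```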

Proposed Model

Fusion Module

In order to capture the fine-grained binding information between a drug and a target, our FusionDTI model applies a fusion module to learn token-level interactions between the token representations of drugs and targets encoded by their respective encoders. Two fusion modules inspired by the recent literature are investigated: the Bilinear Attention Network (BAN) and the Cross Attention Network (CAN).
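The bilinear (BAN) option can be sketched in NumPy as an interaction score for every (drug token, protein token) pair, followed by pooling into a single fused vector. The learned weight U and the simple weighted-sum pooling below are toy stand-ins for the full bilinear pooling layer, under simplified dimensions.

```python
# Simplified bilinear attention map (the BAN option): pairwise drug-protein
# token affinities via a learned bilinear form, pooled into a fused vector F.
import numpy as np

rng = np.random.default_rng(2)
drug = rng.normal(size=(5, 16))      # drug token representations
protein = rng.normal(size=(9, 16))   # protein token representations
U = rng.normal(size=(16, 16)) * 0.1  # bilinear interaction weights

attn_map = drug @ U @ protein.T      # (5, 9) pairwise affinities
weights = np.exp(attn_map) / np.exp(attn_map).sum()  # normalised over all pairs
# Weighted sum of elementwise drug-protein interactions -> fused vector.
F = sum(weights[i, j] * (drug[i] * protein[j])
        for i in range(5) for j in range(9))
print(attn_map.shape, F.shape)
```

As with cross attention, the (5, 9) attention map assigns a weight to every drug-token/residue pair, which is what allows potential binding sites to be highlighted.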


Experimental Results

In-Domain

In-Domain Results

Cross-Domain

Cross-Domain Results

BibTeX


@inproceedings{meng2024fusiondti,
title={Fusion{DTI}: Fine-grained Binding Discovery with Token-level Fusion for Drug-Target Interaction},
author={Zhaohan Meng and Zaiqiao Meng and Iadh Ounis},
booktitle={ICML 2024 AI for Science Workshop},
year={2024},
url={https://openreview.net/forum?id=SRdvBPDdXB}
}
    

Acknowledgement

This website template is adapted from the MiniGPT-4 project, which is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.