Predicting drug–target interactions (DTI) is critical to drug discovery. Although recent DTI systems integrate representations from diverse drug and protein encoders, they often fail to capture fine-grained binding—i.e., which specific drug atoms (or substructures) interact with key protein residues—knowledge that is essential for understanding binding mechanisms and optimising design.
We present FusionDTI, a token-level fusion approach that learns fine-grained drug–protein interactions. FusionDTI represents drugs as SELFIES strings, which guarantees valid molecular sequences, and represents proteins with a structure-aware (SA) vocabulary that enriches residue tokens with 3D-informed categories. Frozen, large-scale biomedical pre-trained language models (PLMs), e.g., SELFormer for ligands and SaProt for proteins, provide robust token embeddings, which are fused via either a Bilinear Attention Network (BAN) or a Cross-Attention Network (CAN) to yield interaction-sensitive features. Across three standard benchmarks, FusionDTI consistently outperforms seven state-of-the-art baselines. A case study further shows that FusionDTI highlights plausible binding sites, improving the explainability of its DTI predictions.
Explore atom–residue interactions with our token-level fusion model.
Input. Drugs and proteins are both provided as sequences. Proteins use an SA vocabulary (441 categories) that assigns each residue a 3D-informed class, supplementing the amino-acid sequence with structural cues. Drugs use SELFIES, a robust string syntax in which every string decodes to a valid molecular graph. Steps for generating SA sequences and SELFIES strings are provided in the code repository.
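To make the SA input concrete, here is a minimal sketch of how a residue sequence and a per-residue structure-state string could be zipped into SA tokens. It assumes the 441 categories factor as 21 × 21 (20 residue letters plus a mask symbol, paired with 20 structure-state letters plus a mask symbol). That factorisation, the lowercase state alphabet, the `#` mask, and the helper name `to_sa_tokens` are illustrative assumptions, not the repository's actual preprocessing.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"    # 20 standard residues
STRUCT_STATES = "acdefghiklmnpqrstvwy"  # assumed: 20 structure-state letters
MASK = "#"                              # assumed: mask symbol for either half

# Every (residue, structure state) pair becomes one vocabulary entry.
SA_VOCAB = {
    aa + st: idx
    for idx, (aa, st) in enumerate(
        product(AMINO_ACIDS + MASK, STRUCT_STATES + MASK)
    )
}


def to_sa_tokens(residues: str, states: str) -> list[str]:
    """Pair each residue with its structure state, e.g. 'M' + 'd' -> 'Md'."""
    if len(residues) != len(states):
        raise ValueError("residue and structure strings must have equal length")
    tokens = [aa + st for aa, st in zip(residues.upper(), states.lower())]
    unknown = [tok for tok in tokens if tok not in SA_VOCAB]
    if unknown:
        raise ValueError(f"tokens outside the SA vocabulary: {unknown}")
    return tokens


tokens = to_sa_tokens("MKV", "dvp")
token_ids = [SA_VOCAB[tok] for tok in tokens]
```

Under these assumptions each protein position carries both sequence and structure information in a single token, so the protein encoder sees one token per residue.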
Encoders. FusionDTI uses frozen encoders—SELFormer (ligands) and SaProt (proteins)—to produce token embeddings. Encoders can be swapped for alternative PLMs if desired.
Fusion module. We study two choices: BAN and CAN. CAN computes cross-attention between ligand and protein tokens and concatenates the attended features into a fused representation F. BAN constructs a bilinear attention map and applies bilinear pooling to obtain F.
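The cross-attention option can be illustrated with a minimal NumPy sketch. It assumes single-head scaled dot-product attention, identity query/key/value projections, a shared embedding dimension for both modalities, and mean pooling before concatenation. These are simplifications chosen for illustration and are not necessarily the authors' exact CAN.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attend(queries, context):
    """Each query token attends over all context tokens (single head)."""
    dim = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(dim)  # (n_query, n_context)
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ context, weights


rng = np.random.default_rng(0)
drug = rng.normal(size=(5, 8))     # 5 ligand tokens, toy dimension 8
protein = rng.normal(size=(7, 8))  # 7 protein tokens, same toy dimension

drug_attended, w_drug_to_prot = cross_attend(drug, protein)  # (5, 8), (5, 7)
prot_attended, w_prot_to_drug = cross_attend(protein, drug)  # (7, 8), (7, 5)

# Pool each attended sequence and concatenate into the fused vector F.
F = np.concatenate([drug_attended.mean(axis=0), prot_attended.mean(axis=0)])
```

The weight matrix `w_drug_to_prot` has one row per ligand token and one column per protein token, which is the kind of token-level correspondence that can later be inspected for explanation.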
Prediction head. A multilayer perceptron yields the interaction probability p, trained using binary cross-entropy.
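As a rough illustration of the prediction head, the sketch below passes a batch of fused vectors through a one-hidden-layer MLP with a sigmoid output and scores it with binary cross-entropy. The layer sizes, random weights, and function names are hypothetical, and no training loop is shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mlp_head(fused, W1, b1, w2, b2):
    """One hidden ReLU layer followed by a sigmoid output in (0, 1)."""
    hidden = np.maximum(fused @ W1 + b1, 0.0)
    return sigmoid(hidden @ w2 + b2)


def binary_cross_entropy(p, y, eps=1e-7):
    """Mean BCE: -[y log p + (1 - y) log(1 - p)], clipped for stability."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))


rng = np.random.default_rng(0)
fused = rng.normal(size=(4, 16))            # batch of 4 fused vectors (toy size)
W1 = rng.normal(scale=0.1, size=(16, 32))   # hypothetical hidden width of 32
b1 = np.zeros(32)
w2 = rng.normal(scale=0.1, size=32)
b2 = 0.0

labels = np.array([1.0, 0.0, 1.0, 0.0])     # 1 = interacting pair, 0 = not
p = mlp_head(fused, W1, b1, w2, b2)         # interaction probabilities
loss = binary_cross_entropy(p, labels)
```

A prediction of 0.5 for every pair gives a loss of ln 2 ≈ 0.693, which is a convenient reference point when reading training curves.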
To capture token-level binding between drugs and proteins, FusionDTI fuses token embeddings with either a Bilinear Attention Network or a Cross-Attention Network. Both produce interaction-aware features while preserving fine-grained correspondences useful for downstream explanation.
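The bilinear option can be sketched in the same spirit. The example below follows the general bilinear-attention recipe: project both token sequences into a shared space, build a pairwise interaction map, then pool with that map. The projection matrices `U` and `V`, the weight vector `q`, the ReLU activation, and all sizes are illustrative assumptions rather than the authors' exact BAN.

```python
import numpy as np


def bilinear_attention_fusion(drug, protein, U, V, q):
    """Return a fused vector and the token-by-token interaction map."""
    d_proj = np.maximum(drug @ U, 0.0)     # (n_drug, k)
    p_proj = np.maximum(protein @ V, 0.0)  # (n_prot, k)
    # Pairwise bilinear scores between every drug and protein token.
    att_map = (d_proj * q) @ p_proj.T      # (n_drug, n_prot)
    # Bilinear pooling: fused[k] = sum_ij d_proj[i,k] * att_map[i,j] * p_proj[j,k]
    fused = np.einsum("ik,ij,jk->k", d_proj, att_map, p_proj)
    return fused, att_map


rng = np.random.default_rng(0)
drug = rng.normal(size=(5, 8))      # 5 ligand tokens, toy dimension 8
protein = rng.normal(size=(7, 12))  # 7 protein tokens, toy dimension 12
U = rng.normal(scale=0.1, size=(8, 16))
V = rng.normal(scale=0.1, size=(12, 16))
q = rng.normal(size=16)

F, att_map = bilinear_attention_fusion(drug, protein, U, V, q)

# The highest-scoring (drug token, protein token) pair in the map.
i, j = np.unravel_index(np.argmax(att_map), att_map.shape)
```

Unlike the cross-attention sketch, the two inputs here may have different embedding dimensions, because `U` and `V` map them into a common space before the interaction map is formed.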
@inproceedings{meng2024fusiondti,
title = {FusionDTI: Fine-grained Binding Discovery with Token-level Fusion for Drug–Target Interaction},
author = {Meng, Zhaohan and Meng, Zaiqiao and Yuan, Ke and Ounis, Iadh},
booktitle = {Findings of EMNLP 2025},
year = {2025},
url = {https://arxiv.org/abs/2406.01651}
}