Integrating Physical Knowledge and Data Augmentation for Protein-ligand Interaction Scoring
Editor: LIU Jia | Jun 07, 2024
Understanding protein-ligand interactions is crucial for drug discovery, yet developing robust methods for evaluating these interactions has been a long-standing problem. Developing a scoring method with higher accuracy in practical application scenarios remains an open challenge.
In a study published in Nature Machine Intelligence, a team led by ZHENG Mingyue from the Shanghai Institute of Materia Medica of the Chinese Academy of Sciences introduced a scoring approach called EquiScore. EquiScore demonstrated good predictive performance on unseen proteins in both the virtual screening (VS) scenario and the analogs ranking scenario. When used alongside different docking methods, it effectively enhanced their screening ability. EquiScore is also capable of capturing key intermolecular interactions, providing useful clues for rational drug design.
In this study, researchers constructed a new dataset called PDBscreen using multiple data augmentation strategies, such as enlarging the positive sample set with near-native ligand binding poses and enlarging the negative sample set with generated, highly deceptive decoys to avoid common biases. Based on the PDBscreen dataset, they trained a model, EquiScore, using an equivariant heterogeneous graph architecture that incorporates physical and prior knowledge about protein-ligand interactions.
In the VS scenario, EquiScore outperformed 21 existing scoring methods on unseen proteins on two external datasets, DEKOIS2.0 and DUD-E. When only targets not seen during training were considered, the performance of other deep learning-based models dropped significantly. In the analogs ranking scenario, among eight different methods, EquiScore's ranking ability was lower only than that of FEP+. Considering the significantly higher computational cost of FEP+ calculations, EquiScore offers a better balance between speed and accuracy. In addition, EquiScore showed robust rescoring capabilities when applied to poses generated by different docking methods: rescoring with EquiScore enhanced the VS performance of all evaluated methods.
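For readers unfamiliar with how screening performance is typically quantified, the sketch below computes an enrichment factor, a metric commonly used in virtual screening. It is an illustration only: this article does not specify which metrics the study used, and the function name, the toy scores and the 20% cutoff are all assumptions made for the example.

```python
import math
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      labels: Sequence[int],
                      fraction: float = 0.01) -> float:
    """Enrichment factor at a given top fraction of a ranked library.

    Hypothetical helper, not taken from the EquiScore code base.
    scores: predicted scores, where higher means "more likely active".
    labels: 1 for a known active, 0 for a decoy.
    fraction: share of the ranked library treated as the "top" subset.
    """
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equally long")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")

    # Size of the top subset; always keep at least one compound.
    n_top = max(1, math.ceil(n_total * fraction))

    # Rank compounds from best to worst score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    # Hit rate in the top subset divided by the hit rate in the whole library.
    return (hits_top / n_top) / (n_actives / n_total)


# Toy library: 10 compounds, 2 actives (ranked 1st and 4th by score).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
ef_top20 = enrichment_factor(toy_scores, toy_labels, fraction=0.2)
```

An enrichment factor above 1 means the top of the ranked list contains more actives than random selection would. Under this assumed setup, rescoring can be assessed by computing the same quantity on the scores before and after rescoring.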
In the ablation experiments, researchers found that all modules in EquiScore contribute significantly to overall performance, and that removing any of them degrades performance. However, the roles of data augmentation and model design differ significantly across application scenarios. In the VS scenario, data augmentation notably enhanced enrichment capability, with negative samples playing a major role. In the analogs ranking scenario, the pattern was reversed: changes to the model architecture were more crucial than data augmentation.
To further disentangle the contributions of the dataset and model architecture to the performance, researchers trained models with other architectures on PDBscreen. Even with the same training dataset, EquiScore outperformed other models, underscoring the contribution of the model architecture.
By analyzing the model's interpretability, researchers found that EquiScore can capture key intermolecular interactions, which supports the rationality of its predictions and provides useful clues for rational drug design.
Robust prediction of protein-ligand interactions will help to understand the biology of proteins and to determine their impact on future drug treatments. This study showed that EquiScore may contribute to a greater understanding of human health and disease, and discovery of novel medicines.
Contact

JIANG Qingling

Shanghai Institute of Materia Medica
