Googling the cancer genome
Identification and prioritization of cancer-causing structural variations in whole genomes
Interactions between biomolecules control all cellular processes. Understanding those interactions requires adding a three dimensional structural dimension. Next to experimental structural biology techniques, this can be done by docking, a complementary and high-throughput computational method allowing to model complexes from their known components.
A challenge in docking is scoring – the identification of correct (near-native) models from a large pool of docked models – due to our still limited knowledge of interaction rules. We will tackle this challenge by training deep networks (dNNs) to learn complex interaction patterns from the huge amount of experimental data in the Protein Data Bank (a valuable source of information not yet fully exploited). Our innovative strategy is to treat this problem as a 3D image classification problem: The interfaces of docked models will be represented as 3D images and dNNs will be trained to classify whether they are near-native or not. Unlike other machine learning techniques, dNNs are now able to learn from millions of data without reaching a performance plateau quickly, which is computationally tractable by harvesting GPU and Hadoop technologies.
The resulting scoring function, DeepRank, will markedly enhance our capability to reliably model biomolecular complexes, assisting the scientific community to gain insights into macromolecular aspects of life. It will be implemented in our HADDOCK modelling platform and freely distributed through GitHub and eStep repositories, ensuring a wide dissemination. The impact will be broad since 3D image-based dNNs have applications in many other domains, such as medical diagnoses (MRI), cryo-electron microscopy and computer vision.