Chemical Informatics for Metabolite Identification and Biochemical Network Reconstruction

Image: Tadpoles by Geoff Gallice (CC License)

In the 17th century, Santorio Santorio conducted an experiment in which he weighed himself before and after eating, sleeping, working, fasting, and drinking. He found that most of the food he took in was lost through what he called “insensible perspiration”. What he was in fact witnessing were the effects of metabolic processes. Metabolism (from Greek: “change”) is the set of life-sustaining chemical transformations within the cells of living organisms. These transformations allow organisms to grow and reproduce, maintain their structures, and respond to their environments.

Metabolomics, the technology for comprehensively measuring the metabolites in a biological sample and the changes in them, has great potential to advance our understanding of biological systems and processes at the chemical level. Full exploitation of metabolomics data is currently limited by the complexity of the datasets that current platforms generate, which are difficult for human experts to manage alone. eScience technology therefore has a crucial role to play in mining and interpreting complex metabolomics data.

In this project, a computational workflow will be developed to improve and accelerate metabolite identification and biochemical pathway reconstruction in metabolomics studies. A key step in the workflow is to generate, on the basis of empirically derived reaction rules, an in silico metabolite network that delivers candidate structures for unknown metabolites in a metabolomics experiment. This will allow more systematic and automated structure elucidation on the basis of the bioanalytical data (e.g. LC-MS) and at the same time provide hypotheses for the biochemical pathways leading to the newly identified metabolites.
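The idea of growing a metabolite network from reaction rules can be illustrated with a deliberately simplified sketch. It works on molecular formulas only, whereas the workflow described here operates on chemical structures. The three rules, the seed compound (quercetin, C15H10O7) and the function names are assumptions chosen for illustration, not the project's actual rule set or code.

```python
from collections import Counter, deque

# Illustrative formula-level rules (an assumption, not the project's rule set):
# each rule adds a fixed set of atoms to the substrate.
RULES = {
    "hydroxylation": Counter({"O": 1}),
    "methylation": Counter({"C": 1, "H": 2}),
    "glucuronidation": Counter({"C": 6, "H": 8, "O": 6}),
}


def to_formula(atoms):
    """Render element counts as a formula string, elements in alphabetical order."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in sorted(atoms.items()))


def expand_network(seed, max_steps):
    """Apply every rule breadth-first, up to max_steps transformations from the seed.

    Returns (nodes, edges): nodes maps each formula to its distance from the seed,
    edges is a list of (substrate, rule, product) tuples.
    """
    nodes = {to_formula(seed): 0}
    edges = []
    queue = deque([(Counter(seed), 0)])
    while queue:
        atoms, depth = queue.popleft()
        if depth == max_steps:
            continue  # do not expand beyond the requested number of steps
        for rule, delta in RULES.items():
            product = atoms + delta
            src, dst = to_formula(atoms), to_formula(product)
            edges.append((src, rule, dst))
            if dst not in nodes:  # each formula becomes a node only once
                nodes[dst] = depth + 1
                queue.append((product, depth + 1))
    return nodes, edges


# Seed: quercetin, C15H10O7 (chosen for illustration only).
nodes, edges = expand_network({"C": 15, "H": 10, "O": 7}, max_steps=2)
for substrate, rule, product in edges[:3]:
    print(f"{substrate} --{rule}--> {product}")
```

Every edge in the resulting network is a pathway hypothesis (substrate, rule, product), and every node is a candidate for an unknown metabolite. A structure-based version would replace the atom-count addition with rule matching on molecular graphs.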

Measuring the metabolites in a biological sample yields a snapshot of the physiology of the cell. Integrating metabolomics data with other -omics data will give insight into the machinery present in cells and how it is used to metabolize compounds, and will therefore provide a more complete picture of how organisms function.

Due to the chemical diversity of metabolites, automation and throughput of the identification process are currently less advanced in metabolomics than in proteomics and transcriptomics. A computational workflow that improves and accelerates metabolite identification and biochemical pathway reconstruction is required for metabolomics to increase its impact in systems biology.

Currently, the developed technology allows users to upload mass spectral data and retrieve candidate molecules from several public sources. The candidate molecules for each measured mass are presented and ranked by their probability of being the measured compound, estimated by matching calculated against measured fragmentation patterns. This allows metabolomics experts to focus on the most relevant candidates and obtain a quick indication of the fragmentation pathways that occur. To extend the technology to the identification of unknown metabolites that are not yet present in chemical databases, reaction rules are applied to complement the libraries of candidate molecules with potential metabolic products.
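The retrieve-and-rank step can be sketched as follows. This is a toy version under stated assumptions: the measured value is taken to be a neutral monoisotopic mass, the score is simply the fraction of measured peaks explained by a predicted fragment, and the candidate names and fragment masses are invented for illustration. It is not the scoring used by the project.

```python
# Monoisotopic masses of the most abundant isotopes (rounded to six decimals).
MONOISOTOPIC = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def monoisotopic_mass(atoms):
    """Sum the monoisotopic masses of all atoms in an element-count mapping."""
    return sum(MONOISOTOPIC[el] * n for el, n in atoms.items())


def within_ppm(measured, theoretical, ppm):
    """True if the measured mass lies within +/- ppm of the theoretical mass."""
    return abs(measured - theoretical) <= theoretical * ppm / 1e6


def score(predicted_fragments, measured_peaks, tol=0.005):
    """Toy score: fraction of measured peaks matched by a predicted fragment."""
    if not measured_peaks:
        return 0.0
    explained = sum(
        any(abs(peak - frag) <= tol for frag in predicted_fragments)
        for peak in measured_peaks
    )
    return explained / len(measured_peaks)


def rank_candidates(library, measured_mass, measured_peaks, ppm=5.0):
    """Keep candidates within the mass tolerance, best fragment match first."""
    hits = [
        c for c in library
        if within_ppm(measured_mass, monoisotopic_mass(c["atoms"]), ppm)
    ]
    return sorted(hits, key=lambda c: score(c["fragments"], measured_peaks), reverse=True)


# Hypothetical library: names and fragment masses are invented for this sketch.
library = [
    {"name": "candidate_B", "atoms": {"C": 6, "H": 12, "O": 6},
     "fragments": [162.0528, 119.0344]},
    {"name": "candidate_A", "atoms": {"C": 6, "H": 12, "O": 6},
     "fragments": [162.0528, 144.0423, 89.0239]},
    {"name": "candidate_C", "atoms": {"C": 7, "H": 8, "N": 4, "O": 2},
     "fragments": [162.0528]},
]

ranked = rank_candidates(library, 180.0634, [162.053, 144.042, 89.024])
print([c["name"] for c in ranked])
```

In this sketch candidate_C is rejected on mass alone (its formula lies more than 5 ppm from the measured value), while the two isomers pass the mass filter and are separated only by how well their predicted fragments explain the measured peaks. Products generated by reaction rules could be appended to `library` in the same record format.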

Successful implementation of this new concept will be accomplished by means of a flexible data infrastructure, efficient and parallelized computational algorithms and visualization of complex data. The result will be a practical toolbox that will be integrated with existing workflows for metabolomics data analysis.


  • Lars Ridder
    Netherlands eScience Center
  • Stefan Verhoeven
    Netherlands eScience Center