Quality Control in Metabolomics

Dr. Oliver Fiehn, UC Davis Genome Center

Comprehensive analysis of small-molecule metabolites (30-1500 Da) poses a challenging task for quality control. Metabolites are found at very different concentrations in complex biological matrices, from which they have to be extracted without compromising their structural integrity or relative abundances. Some metabolites are transformed extremely rapidly if enzymatic activity is not stopped completely at the time of sample collection, distorting sensitive indicators such as the ratio of the energy metabolites ATP to ADP. Similarly, redox carriers such as NADH and NADPH are very sensitive to oxidative degradation during sample preparation. Consequently, quality control in metabolomics means more than just taking care of chromatographic or mass spectrometry parameters. Quality control is an attitude towards gaining reliable data, rather than an automatic procedure implemented in instrument software.

Metabolomics is not a numbers game of detection; it is an extension of classical target-driven analytical chemistry.

The first issue critical to obtaining valid metabolomic data is understanding the question behind a study. This means that communication with the partners of the metabolomic laboratory is an essential part of any metabolomic study. Most often, at least one other partner will be involved (e.g. another laboratory focused on understanding the effect of a particular genetic alteration in an organism), and these partners may already have hypotheses on specific metabolic pathways that should be pursued. These hypotheses may then lead to suggestions for analytical procedures. For example, many secondary metabolites are easier to analyze by LC/MS methods, whereas most primary metabolites can readily be quantified by GC/MS procedures. Therefore, communication with the partners should focus on the chemical classes of compounds to be targeted. It is also critical for the analytical laboratory to understand that unbiased analysis of mass spectrometric data sets does not by itself constitute metabolomics. A multivariate statistical differentiation of ‘test’ versus ‘control’ samples is meaningless if no identified metabolites can be reported that allow biological interpretation! Unidentified signals in metabolite analysis are as useless as unscored peptide peaks in proteomic experiments. Metabolomics is not a numbers game of detecting m/z features, but must be regarded as an extension of classical target-driven analytical chemistry. Only if the quantification and identification of known compounds empowers biological interpretation can unknown peaks be further investigated and pulled into statistical tests.

There is a fundamental problem associated with metabolomics analyses: the lack of clean-up steps. If metabolomics means a comprehensive analysis of a wide range of small molecules, varying in molecular size, functional moieties, lipophilicity, volatility, and other physicochemical parameters, then the analytical laboratory faces tough choices. One option is to employ a variety of fractionation steps, but this can cause biases in metabolite coverage, requires a number of different analytical procedures (raising the subsequent challenge of integrating the data sets), and may also result in analyte loss or degradation. Alternatively, the whole extract is subjected to one or several analytical methods; however, certain matrix components may then degrade analytical quality. In such cases, dirt is literally injected into the instrument! It is critical, therefore, to acknowledge that each matrix type requires validation and that procedures that worked for microbial organisms may be quite inadequate for more complex samples such as blood plasma. For example, nonvolatile material will remain in the liner and other parts of the injector in GC/MS systems, causing problems with cross-contamination, progressive pyrolysis of material, and ultimately the formation of adsorptive materials, or catalytically active sites, in the injector system. Therefore, frequent liner changes are highly recommended.

Correspondingly, in LC/MS procedures, matrix components may be irreversibly adsorbed onto stationary phases, giving rise to challenges similar to those described for GC/MS. Additionally, the soft electrospray ionization used in LC/MS is a more selective, and hence more vulnerable, process than the hard electron impact ionization used in GC/MS. It is insufficient to declare that no major matrix effect (ion suppression) is apparent in LC/MS based solely on the quenching of the signal intensity of a single infused compound. This single compound may have characteristics that make it less vulnerable to matrix effects, and thus unsuitable for exploring them. Far better suited are classical approaches, most importantly the use of isotope-labeled internal standards. Quality control in metabolomics means that the short-term and long-term influence of matrix effects is carefully evaluated by comparing metabolite coverage and relative quantification levels to values expected from background knowledge. Only if quantification of a range of well-known target metabolites validates a specific analytical protocol can unbiased analysis be extended to the level of metabolomics and encompass novel metabolite signals. Such integration of classical analytical strategies with modern unbiased data analysis should also include randomized sample sequences, blank controls, and bracketing of samples with external calibration standards.
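As an illustrative sketch only, the run-order principles above (randomization, leading blanks, periodic pooled QC injections, bracketing calibration standards) could be assembled like this; all sample names, the QC frequency, and the pooled-QC convention are hypothetical choices, not a prescribed protocol:

```python
import random

def build_run_sequence(samples, n_blanks=2, qc_every=10, seed=42):
    """Assemble an instrument run order with bracketing calibration
    standards, periodic pooled-QC injections, and randomized samples."""
    rng = random.Random(seed)          # fixed seed: sequence is reproducible
    randomized = samples[:]
    rng.shuffle(randomized)            # decouple run order from study groups

    sequence = ["blank"] * n_blanks + ["calibration"]   # opening bracket
    for i, sample in enumerate(randomized, start=1):
        sequence.append(sample)
        if i % qc_every == 0 and i < len(randomized):
            sequence.append("pooled_QC")                # track drift mid-run
    sequence.append("calibration")                      # closing bracket
    return sequence

order = build_run_sequence([f"sample_{i:02d}" for i in range(1, 25)])
```

Fixing the random seed keeps the sequence reproducible for auditing, while the shuffle itself prevents instrument drift from correlating with biological groups.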

Novel algorithms are needed to score metabolic signals based on all available information, from calculated physicochemical characteristics to presence in biochemical databases.

Among the most difficult challenges in metabolomics is the annotation of unknown metabolic signals. The Metabolomics Standards Initiative (MSI) has issued a variety of suggestions for reporting minimal experimental parameters to ensure that metabolomic data can be used and reproduced by other laboratories. Importantly, the identification of metabolites must always be based on at least two orthogonal physicochemical characteristics, such as retention index and mass spectrum. Identifications based on authentic chemical standards are generally more trustworthy than annotations based on calculated characteristics. Nevertheless, the metabolome itself is an unrestricted entity that clearly comprises more than the suite of known compounds found in classical textbooks or purchasable from chemical manufacturers. The metabolome cannot simply be computed from reconstructed biochemical pathways, owing to enzymatic diversity, substrate ambiguity, and variation in regulatory mechanisms. Hence, the finding of many unknown signals in metabolomic surveys comes as no surprise to biochemists. The sheer complexity of natural products, including isomeric compounds, renders the use of accurate masses and database queries insufficient for the annotation of metabolites. Instead, novel algorithms are needed to score metabolic signals based on all available information, from calculated physicochemical characteristics to presence in biochemical databases. Such algorithms might ultimately boost the quality of metabolomic data much as SEQUEST did for proteomic analysis. As yet, however, no software is available to perform this much-needed task.
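To make the idea of multi-evidence scoring concrete, one hypothetical form such an algorithm might take is a weighted combination of orthogonal evidence: retention-index agreement, mass-spectral similarity, and presence in a biochemical database. The weights, tolerance, and linear decay below are illustrative assumptions, not an established method:

```python
def annotation_score(ri_measured, ri_library, spectral_similarity,
                     in_database, ri_tolerance=25.0,
                     weights=(0.4, 0.5, 0.1)):
    """Combine orthogonal evidence into a single annotation score in [0, 1].

    ri_measured / ri_library : retention indices on a common scale
    spectral_similarity      : mass-spectral match score in [0, 1]
    in_database              : whether the candidate occurs in a
                               biochemical database (True/False)
    """
    # Retention-index agreement decays linearly, reaching zero at the
    # tolerance limit; mismatches beyond it contribute nothing.
    ri_score = max(0.0, 1.0 - abs(ri_measured - ri_library) / ri_tolerance)
    db_score = 1.0 if in_database else 0.0
    w_ri, w_ms, w_db = weights
    return w_ri * ri_score + w_ms * spectral_similarity + w_db * db_score

# A close RI match plus a strong spectral match yields a high score:
good = annotation_score(1845.0, 1850.0, spectral_similarity=0.92, in_database=True)
# The same spectrum with a poor RI match is penalized, reflecting the
# requirement for two orthogonal characteristics:
poor = annotation_score(1940.0, 1850.0, spectral_similarity=0.92, in_database=True)
```

The point of the sketch is structural: no single line of evidence (accurate mass, spectral match, or database presence) can dominate the score on its own.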

Dr. Oliver Fiehn is a leading researcher in the field of metabolomics. He is a Professor in the Genome Center at the University of California, Davis. Dr. Fiehn’s research focuses on developing and applying analytical and bioinformatic methods, primarily GC/MS and LC/MS, to unravel changes in metabolic networks across different biological situations.