
Metabolomics (2007) 3:211–221

DOI 10.1007/s11306-007-0082-2

ORIGINAL ARTICLE

Proposed minimum reporting standards for chemical analysis


Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI)

Lloyd W. Sumner · Alexander Amberg · Dave Barrett · Michael H. Beale ·
Richard Beger · Clare A. Daykin · Teresa W.-M. Fan · Oliver Fiehn ·
Royston Goodacre · Julian L. Griffin · Thomas Hankemeier · Nigel Hardy ·
James Harnly · Richard Higashi · Joachim Kopka · Andrew N. Lane ·
John C. Lindon · Philip Marriott · Andrew W. Nicholls · Michael D. Reily ·
John J. Thaden · Mark R. Viant
Received: 9 January 2007 / Accepted: 27 July 2007 / Published online: 12 September 2007
© Springer Science+Business Media, LLC 2007

Abstract There is a general consensus that supports the need for standardized reporting of metadata or information describing large-scale metabolomics and other functional genomics data sets. Reporting of standard metadata provides a biological and empirical context for the data, facilitates experimental replication, and enables the re-interrogation and comparison of data by others. Accordingly, the Metabolomics Standards Initiative is building a general consensus concerning the minimum reporting standards for metabolomics experiments of which the Chemical Analysis Working Group (CAWG) is a member of this community effort. This article proposes the minimum reporting standards related to the chemical analysis aspects of metabolomics experiments including: sample preparation, experimental analysis, quality control, metabolite identification, and data pre-processing. These minimum standards currently focus mostly upon mass spectrometry and nuclear magnetic resonance spectroscopy due to the popularity of these techniques in metabolomics. However, additional input concerning other techniques is welcomed and can be provided via the CAWG on-line discussion forum at http://msi-workgroups.sourceforge.net/

The contents of this paper do not necessarily reflect any position of the Government or the opinion of the Food and Drug Administration
Sponsor: Metabolomics Society http://www.metabolomicssociety.org/
Reference: http://msi-workgroups.sourceforge.net/bio-metadata/reporting/pbc/ http://msi-workgroups.sourceforge.net/chemical-analysis/
Version: Revision: 5.1
Date: 09 January, 2007

L. W. Sumner (✉)
The Samuel Roberts Noble Foundation, Ardmore, OK, USA
e-mail: lwsumner@noble.org

A. Amberg
Sanofi-Aventis Deutschland GmbH, Frankfurt, Germany
e-mail: Alexander.Amberg@sanofi-aventis.com

D. Barrett
Centre for Analytical Bioscience, School of Pharmacy, University of Nottingham, Nottingham, UK
e-mail: David.Barrett@nottingham.ac.uk

M. H. Beale
National Centre for Plant and Microbial Metabolomics, Rothamsted Research, West Common, Harpenden, Herts, UK
e-mail: mike.beale@bbsrc.ac.uk

R. Beger
National Center for Toxicological Research, Jefferson, AR, USA
e-mail: richard.beger@fda.hhs.gov

C. A. Daykin
Division of Molecular and Cellular Science, School of Pharmacy, University of Nottingham, Nottingham, UK
e-mail: Clare.Daykin@nottingham.ac.uk

T. W.-M. Fan · R. Higashi
Department of Chemistry, University of Louisville, Louisville, KY, USA
T. W.-M. Fan
e-mail: teresa.fan@louisville.edu
R. Higashi
e-mail: rick.higashi@louisville.edu

O. Fiehn
UC Davis Genome Center, University of California, Davis, CA, USA
e-mail: ofiehn@ucdavis.edu

R. Goodacre
School of Chemistry and Manchester Interdisciplinary Biocentre, The University of Manchester, Manchester, UK
e-mail: Roy.Goodacre@manchester.ac.uk

212 L.W. Sumner et al.

or http://Msi-workgroups-feedback@lists.sourceforge.net. Further, community input related to this document can also be provided via this electronic forum.

Keywords Metabolomics · Metabolite profiling · Metabolite identification · Minimum reporting standards · Chemical analysis · Mass spectrometry · Nuclear magnetic resonance · Flux · Isotopomer analysis · GC-MS · LC-MS · CE-MS · NMR · Quality control · Method validation

1 Introduction

The aim of the Chemical Analysis Working Group (CAWG) as part of the Metabolomics Standards Initiative (MSI) is to identify, develop and disseminate a consensus description for the best chemical analysis practices related to all aspects of metabolomics. Ideally, the proposed standards will consist of good analytical chemistry practices while providing specific provisions for metabolomic data (the main distinction being large numbers of data-sets each containing large numbers of measurements, and the need to compare them electronically and across different instrumental platforms). These practices will be aligned with those typically mandated by top quality analytical journals. The goal is not to prescribe how metabolomics experiments should be performed, but to formulate a minimum set of reporting standards that describe the experimental methods (i.e. the metadata or information describing the nature of the experiments and how they were actually executed) to maximize the utility of the data to other researchers. Consequently, there will be no attempt to restrict or dictate specific practices, but to develop consistent and appropriate descriptors to support the dissemination and re-use of metabolomic data. Such reporting standards will specify the metadata identified as necessary for complete and comprehensive reporting in a range of contexts, such as submission to academic journals and public databases. Data exchange standards will be developed to provide a transparent technical vehicle which meets or exceeds the requirements of reporting standards.

The scope of the CAWG includes sample preparation, experimental analysis, instrumental performance, method validation, metabolite identification, and data pre-processing. There is slight overlap in the sample preparation with the Biological Context Working Group and slight overlap in data pre-processing with the Data Processing Working Group. However, the scope and focus of the CAWG is upon the experimental aspects of sample processing, instrumental analysis, and commonly used data pre-processing methods which convert raw instrumental files into organized, tabulated file formats. The organized data are then used for further statistical and

J. L. Griffin
The Department of Biochemistry, University of Cambridge, Cambridge, UK
e-mail: jlg40@mole.bio.cam.ac.uk

T. Hankemeier
Division Analytical Biosciences, Leiden University, Leiden, The Netherlands
e-mail: hankemeier@chem.leidenuniv.nl

N. Hardy
Department of Computer Science, University of Wales Aberystwyth, Aberystwyth, UK
e-mail: nwh@aber.ac.uk

J. Harnly
Food Composition and Methods Laboratory, Beltsville Human Nutrition Research Center, Agricultural Research Service, U.S. Department of Agriculture, Beltsville, MD, USA
e-mail: james.harnly@ars.usda.gov

J. Kopka
Max Planck Institute of Molecular Plant Physiology, Golm, Germany
e-mail: Kopka@mpimp-golm.mpg.de

A. N. Lane
James Graham Brown Cancer Center, University of Louisville, Louisville, KY, USA
e-mail: anlane01@gwise.louisville.edu

J. C. Lindon
Department of Biomolecular Medicine, Imperial College London, London, UK
e-mail: j.lindon@imperial.ac.uk

P. Marriott
School of Applied Sciences, RMIT University, Melbourne, Australia
e-mail: philip.marriott@rmit.edu.au

A. W. Nicholls
Investigative Preclinical Toxicology, GlaxoSmithKline, Ware, UK
e-mail: andrew.w.nicholls@gsk.com

M. D. Reily
Discovery Biomarkers, Pfizer Global R&D, Ann Arbor, MI, USA
e-mail: Michael.Reily@pfizer.com

J. J. Thaden
College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA

M. R. Viant
School of Biosciences, The University of Birmingham, Birmingham, UK
e-mail: m.viant@bham.ac.uk

Minimum reporting standards 213

chemometric analysis which are the focus of the Data Processing Working Group.

The operational plan of the CAWG is to cooperatively draft a consensus document that describes a minimum core set of necessary metadata related to the chemical analyses associated with metabolomics experiments. This will be based upon community input from generalists and specialists relating to the most common technologies utilized in metabolomics. The CAWG will evaluate previous and relevant work in other specialist areas including similar work in transcriptomics and proteomics studies, and recent metabolomics standardization efforts. The group will pay careful attention to the distinction of best practice (which will evolve as the science and technology of metabolomics advances), reporting standards (which should have longer validity) and data exchange standards (which support reporting). It will work with relevant journals and editorial staff to review and advise on the practicality, acceptability, and support of standards.

The proposed CAWG standards were originally described during the NIH Metabolomics Workshop convened in August, 2005 (http://www.niddk.nih.gov/fund/other/metabolomics2005/) and are based upon significant literature (Bino 2004; Jenkins et al. 2004; Quackenbush 2004; Jenkins et al. 2005; Lindon et al. 2005; Fiehn et al. 2006; Rubtsov et al. 2007). Significant input has been provided related to mass spectrometry (MS) and nuclear magnetic resonance (NMR) based metabolomics, but the ultimate schema is aimed at all analytical approaches used in metabolomics. Input to date has been provided by a diversity of academic and commercial entities through personal communications and through the on-line discussion forum (http://msi-workgroups.sourceforge.net/).

2 Proposed minimum information for reporting chemical analysis

The following sections describe the proposed minimum information for reporting chemical analyses metadata that have been discussed to date. The proposed minimum reporting standard information is presented below as bulleted text which is augmented with numerous examples. The examples should not be viewed as required and are not meant to include an exhaustive list of all possibilities. However, the examples should help the reader better visualize the requested context of the proposed minimum information.

2.1 Proposed minimum metadata for sample preparation

Sample preparation is a vast topic which can vary dramatically for different species, tissues, cell cultures, and biofluids. However, it is fundamentally essential that sufficient information is provided about sample preparation to enable experimental reproduction as well as to provide convincing evidence of sample integrity. The initial stages of sample preparation are often generic, whereas the final stages are almost always technique-specific. Therefore, proposed minimum standards for generic sample preparation are provided here, whereas instrument specific sample preparation details are provided within the respective instrumental sections. Further, the issue of sample collection and processing is being addressed by multiple MSI working groups and thus, there is some overlap on this theme (Fiehn et al. 2007; Griffin et al. 2007; van der Werf et al. 2007). However, greater emphasis is provided here concerning the experimental aspects of the sample processing.

• Sampling process and protocol
  • Replicate sampling and analyses: Substantial biological variance exists within all organisms; therefore replicate sampling and analyses are critical to provide a statistical basis for data evaluation and interpretation. A minimum of triplicate (n = 3) biological sampling is proposed with n = 5 preferred. Biological replicates (repetitive analyses of samples obtained from different individuals or pooled individuals from a population) are preferred over analytical replicates (repetitive analyses of the same sample obtained from the same individual or pooled individuals) as biological variance almost always exceeds analytical variance.
  • Tissue harvesting method: For example, sample freezing method (e.g. liquid N2, dry ice and acetone bath, freeze clamping, etc.), sample wash method for removing unwanted external components, time and duration for tissue collection (e.g. time from tissue resection to liquid N2 freezing), temperature, and sample storage prior to further preparation (e.g. −80 °C for 2 weeks). All temperatures should be measured if possible; however temperature set-points are acceptable assuming quality monitoring was performed and no abnormalities recorded.
  • Biofluid harvesting or collection method: For example, syringe, collection onto refrigerated surface, vacuum system/vacutainers used for blood collection, storage vessel and anticoagulant (if relevant), temperature, velocity and duration of centrifugation, and sample freezing method.
  • Tissue processing method: For example, lyophilization, fresh tissue processing, pulverization/homogenization, tissue cell lysis (e.g. liquid N2 grinding, manual or electric homogenization, bead-based homogenization, ultrasonic cell lysis, buffer based lysis, etc.).


  • Storage conditions prior to extraction or further processing (e.g. −80 °C, duration, atmospheric pressure or vacuum, desiccation, preservatives added).
  • Relocation and shipping of tissues from one laboratory to another (if relevant).

Generic extraction and subsequent sample handling that are typically employed for most samples (instrument specific sample processing methods are provided in the respective sections, below).

• Extraction method
  • Solvent(s), pH and ionic strength of buffer, solvent temperature and volume(s) per quantity of tissue, number of replicate extracts, sequential extraction, and extraction time.
  • Example: 1 ml ice-cold methanol (MeOH) per 6 mg lyophilized tissue, two extractions combined, CHCl3/MeOH (2/1, v/v) followed by 10% trichloroacetic acid extraction.
  • It is noted that degassing of solvents is important to minimize redox reactions of sensitive compounds such as ascorbate, cysteine, etc.
• Extract concentration, dilution, and resolubilization processes
  • Dried under nitrogen, resolubilized in H2O or pyridine.
• Extract Enrichment (if relevant)
  • SPE (solid phase extraction column volume/mass, eluent, sorbent, manufacturer)
  • Desalting, molecular weight cut-off, ion exchange, etc.
• Extract Cleanup and/or Additional Manipulation
  • Ultrafiltration, removal of paramagnetic ions, addition of metal chelators such as EDTA, citrate
• Extract Storage and/or Relocation
  • Storage conditions prior to and during analysis
  • Relocation and shipping of extracts from one laboratory to another (if relevant)

2.2 Proposed minimum metadata relative to chromatography

The majority of mass spectrometry based metabolomics methods include sample introduction via hyphenated chromatography. This is also a feature of some NMR experiments (i.e. LC/NMR) as well as other analytical devices, e.g. photodiode arrays, Coulombic arrays, etc. Thus, it is critical to define the chromatographic parameters and the following metadata are suggested.

• Chromatography instrument description
  • Manufacturer, model number, software package and version number or date.
• Auto-injector
  • Injector model/type, software version, injection volume, wash cycles (volumes), solvent.
• Separation column and pre/guard column
  • Manufacturer, model number/name, stationary media composition (support and coating, e.g. silica, C18, etc.) and physical parameters (i.e. coating thickness for GC/MS, particle size and pore size for LC/MS), internal diameter, and length.
• Technique-specific sample preparation
  • Resuspension of sample (e.g. in mobile phase), amount injected.
  • Derivatization reaction conditions if relevant (e.g. OMS/trimethylsilyl; chemical manufacturer, temperatures, and duration).
  • Sample spiking, e.g. internal standards, retention-index standards.
• Separation parameters
  • Method name (a detailed method can be published elsewhere and referenced here by a unique protocol identifier), injector temperature, split or splitless mode and ratio, LC post-column split, mobile phase compositions, mobile phase flow rates, pressure, thermal/solvent/solute gradient profiles.

2.3 Proposed minimum metadata relative to mass spectrometry

Mass spectrometry is a popular but complex technique used in metabolomics. Thus, it is necessary to report sufficient details to enable experimental replication, and the following minimum reporting standards are proposed for mass spectrometry.

• Instrument description
  • Manufacturer, model number, software package and version (name, number or date).
• Sample introduction and delivery
  • From GC, from LC, direct infusion without chromatography, direct infusion using dedicated autosampler, flow rate.
• Ionization source
  • Ionization mode (EI, APCI, ESI etc.), polarity (positive or negative-ion analysis), vacuum pressure, skimmer/focusing lens voltages (e.g. capillary voltage etc.), gas flows (e.g. nebulization gas, cone


    gas etc.), source temperature. Although these values will vary between instruments, they should provide a cumulative view of the ionization conditions sufficient to enable reproduction of the experiment.
• Mass analyzer description and acquisition mode
  • Type (quadrupole, ion-trap, time-of-flight, FT-ICR, including combinations of these for hybrid instruments), acquisition mode (full scan, MSn, SIM, MRM, etc.).
• Technique-specific sample preparation (if relevant)
  • Re-suspension of sample (e.g. in MeOH:water 1:1 with 0.2% formic acid), derivatization, volume injected, and internal calibrant(s) added (if relevant).
• Data acquisition parameters
  • Date, operator, data acquisition rate, m/z scan range, compounds used for m/z calibration, mass resolution, mass accuracy, logic program used for data acquisition (often reported for ion-traps), spectral acquisition rate, vacuum pressure, and/or lock spray (concentration, lock mass, flow rate, and frequency).

2.4 Proposed minimum metadata relative to nuclear magnetic resonance

NMR is a popular but complex technique used in metabolomics. Thus, it is necessary to report sufficient details to enable experimental replication, and the following minimum reporting standards are proposed for NMR.

• Instrument description
  • Manufacturer, model name/number, magnetic field strength in Tesla (example 14.1 T Varian Inova; 18.8 T Bruker Avance) or proton resonance frequency e.g. 600 MHz, and console description.
• Instrument configuration
  • VT control, pulsed field gradients (z or x,y,z) and maximum gradient strength (if used), number of shims, number of channels.
  • Probe type (e.g. 10 mm 31P, 5 mm HCN cold probe, 3 mm flow-probe etc.), solution or solid-state, automation or manual operation, autotune or manual tune, and probe gas. For LC-NMR: sample handler, injection volumes, wash cycles and solvent.
• Instrument-specific sample preparation
  • Volume, extract/powder/intact organisms, tissue or cells, type of NMR tube (e.g. conventional, Shigemi, microcell etc.), pH, solvent (D2O, CD3OD, CDCl3 etc.), buffer, chemical shift or calibration standard.
• Data acquisition parameters
  • For 1-D 1H or X-nucleus NMR: temperature, observed nucleus, pulse sequence name, pulse sequence implementation (e.g. gradient selection, sensitivity enhancement), spin rate or statement of no spin, solvent saturation or decoupling method, presence or absence of heteronuclear decoupling (e.g. isotope-enriched samples), decoupling mode and bandwidth; spin lock field strength (in Hz) and duration (in sec), mixing time (for NOESY, ROESY etc.), spin echo time (e.g. for relaxation analysis or broadline suppression), RF pulse widths, any selective pulse shapes and durations used, magnetic field gradient pulse times and shapes, spectral width, acquisition time, relaxation delay and additional delays (mixing time, etc.), interpulse delay (or recycle time), digitization parameters, spectral width and acquisition time, number of transients, and number of steady-state transients (i.e. dummy scans). For solvent suppression: technique, excitation maximum and bandwidth.
  • Additional parameters for 2-D and higher dimensional NMR: observed nucleus in F2 and F1, pulse sequence, excitation pulse widths for relevant nuclei, spectral width in F2 and F1, solvent saturation method, number of transients in t2 and number of increments in t1, acquisition times for t2 and t1, phase sensitive or magnitude detection, pulsed field gradient strengths and shapes (z or x,y,z) and maximum gradient strength (if relevant to the pulse sequence).
  • Additional parameters for X-nucleus 1D and higher dimensional NMR: direct or indirect detection, proton decoupling mode (Waltz, Garp, Wurst, Stud etc.) and effective band width, evolution time for constant time experiments, editing mode (cf. INEPT-based experiments), heteronuclear spin lock strength and mixing time (e.g. HCCH-TOCSY).
  • Additional parameters for pseudo 2D NMR experiments: physical parameter varied in the t1 dimension (e.g. T2, T1, diffusion period, chromatographic separation time as in LC-NMR, etc.), pulse sequence, array of values used for physical constants.

2.5 Proposed minimum metadata relative to stable isotopes & flux analysis

Many researchers utilize stable isotopes and flux analysis in metabolomics research to better understand mass flow


through pathways. Therefore, the following minimum reporting standards are proposed for stable isotopes and flux analysis.

• Isotope labeled precursors used
  • Element/isotope, position(s), percent labeled; e.g. [13C-1]-D-glucose (98%), [15N2]-L-glutamine (99%).
  • Isotope source (i.e. manufacturer), chemical purity of the labeled compound(s), concentration of the compound, fraction of total present (requires detailed breakdown of media composition for cell and tissue studies, including analysis of any added FCS or other growth supplements; labeling scheme).
  • Total number of moles isotope added during the experiment.
• Duration of pulse label or continuous addition

2.6 Proposed minimum metadata relative to Fourier transform infrared (FT-IR) spectroscopy

FT-IR spectroscopy has been used for metabolic fingerprinting and footprinting (Ellis and Goodacre 2006). In this approach the classification of samples is based on provenance of either their biological relevance or origin and does not usually give specific metabolite information. The following minimum reporting standards are proposed for FT-IR spectroscopy.

• FT-IR spectrometer instrument description
  • Manufacturer, model number, software name and version number or date.
• Instrument configuration
  • Type of sampling compartment used, including where necessary type of microscope employed.
  • Type of detector used: DTGS (deuterated triglycine sulphate), MCT (mercury cadmium telluride), and/or FPA (focal plane array).
• Technique-specific sample preparation
  • Resuspension of n mg ml⁻¹ sample into solvent, volume analysed.
• Sample presentation
  • Transmission measurement: in KBr, or on ZnSe, Si windows.
  • Reflectance measurement: on Si, Au, Al, or other defined metal sample carrier.
  • Diffuse reflectance measurement: on defined metal sample carrier.
  • Sampling area, and for imaging pixel size.
• Data acquisition parameters
  • Wavenumber (cm⁻¹) range.
  • Rate of acquisition.
  • Spectral resolution (in cm⁻¹).
  • Number of spectra co-added.
  • Number of data points in the resultant spectrum, and how this is displayed (absorbance or transmission).

2.7 Proposed minimum metadata relative to instrumental performance and method validation

Instrumental performance validation/qualification and method validation help ensure reliable data production and demonstrate that a particular method used for quantitative measurement of an analyte(s) in a given biological matrix, such as plants, blood, plasma, serum, or urine, is reliable and reproducible for the intended use (Thompson et al. 2002; FDA 2001). These quality control procedures are fundamental components of Good Laboratory Practices (GLP), Good Analytical Practices (GAP), and Good Manufacturing Practices (GMP). Although instrumental performance and method validation are not mandated, they are recommended and the following descriptions are suggested.

• Minimum Reporting of Instrumental Performance Parameters is Encouraged. The nature and method(s) used to ensure sensitive and selective instrumental performance should be reported and the following details and descriptors are deemed appropriate.
  • Mass spectrometry instrument performance validation parameters reported might include chemical description of the m/z calibration standard used, accuracy of m/z calibration, mass resolution, and ion source optimization parameters. For hyphenated MS methods, suggested reporting parameters could include chromatographic resolution, accuracy and precision of internal standard(s) or retention time markers, accuracy and precision for replicated analyses, accuracy and precision for validation sample(s), and cycles per column/injector/septum/blank.
  • NMR instrument performance verification parameters might include calibration standard used (name, chemical shift and concentration; e.g. 0.5 mM DSS or 1 mM TMS at 0.0 ppm), statement of line width of the standard at 50% and 1% of its full height (e.g. DSS, TSP or TMS methyl peak) or residual water, pH marker used (if relevant) and shift correction. For X nuclei: external or internal reference and conditions, and correction made for susceptibility


    effects. Reporting of shift referencing method for indirect dimension in 2D experiments (direct or indirect based on γ ratios) would also be beneficial.

• Quantitative Method Validation. Two methods of quantitative analysis are typically used in metabolomics and include relative and absolute quantification. Relative quantification (i.e. reporting of metabolite(s) instrument response relative to an internal standard or another metabolite(s) level such as the sum of all metabolite abundance) is typically used in non-biased metabolomics. In contrast, absolute quantification (determination of the absolute concentration of a metabolite(s) through correlation of its instrument response to that of a known concentration series of the same metabolite) is commonly used in targeted metabolite(s) analysis.
  • Relative Quantification reporting should include
    • A description and quantifier of the added exogenous isotopically labeled or unlabeled metabolite(s).
    • A description of the method used for assessing instrument response (e.g. peak integration, binning/bucketing or deconvolution method, intensity normalized to reference).
    • For NMR, a description of the correction for saturation effects (e.g. T1 values measured), and relaxation agents if added (type, amount). For direct X-detection (especially 13C or 31P), correction for nuclear Overhauser enhancement as well as saturation. For non-deuterated aqueous samples, state any corrections made for non-linear excitation profile and method.
    • Reporting on replicate analyses, standard error/deviation of quantification.
  • Absolute Quantification method validation is of higher rigor and performed to demonstrate that a particular method used for quantitative measurement of an analyte(s) in a given biological matrix, such as plants, blood, plasma, serum, or urine, is reliable and reproducible for the intended use (Thompson et al. 2002; FDA 2001). Suggested minimum reporting standards include:
    • Calibration curves should be generated for each metabolite to be quantified in the same biological matrix and include a sufficient number of standard solutions to adequately define the instrument response to concentration relationship (i.e. suggested minimum of at least one standard solution per order of change in concentration). The range of standard solutions used and the range of linearity with correlation coefficient should be reported.
    • A quantifier of the method accuracy (i.e. standard deviation, relative standard deviation, coefficient of variation) should be reported and bias assessed if possible (bias due to method, lab, ion suppression, etc.).
    • A quantifier of the method precision (i.e. standard deviation, relative standard deviation, coefficient of variation) should be reported.
    • The lower limit of quantification (LLOQ) and confidence level should be reported. The LLOQ is defined as the minimum concentration generating an instrumental signal-to-noise response ratio of 10. The LLOQ has alternatively been defined as 5 times the limit of detection (LOD). The LOD is defined as the concentration that yields a minimum instrumental signal-to-noise ratio of 3.
    • Additional quantitative descriptions of recovery and/or stability provide additional method validation.

2.8 Proposed minimum metadata relative to data pre-processing

The scope of the CAWG data pre-processing standards focuses upon the conversion of raw instrumental files into organized/tabulated file formats. The organized data are then used for further statistical and chemometric analyses which are the focus of the Data Processing Working Group (Goodacre et al. 2007). The following minimum reporting standards are proposed for data pre-processing.

• Post Acquisition Data Pre-processing
  • Data file format used and/or conversion methods should be reported. Examples include conversion of proprietary file formats to more universal formats such as netCDF, XML, MZmine, etc.
  • Details of any data pre-processing methods which convert raw instrumental data into organized or tabular file formats should be reported.
    • Examples for MS might include: background subtraction, noise reduction, curve resolution for temporal chromatographic alignment, peak picking, peak thresholding, spectral deconvolution, and/or metabolite identifications. Some comparative methods do not resolve or identify individual metabolites prior to comparative analysis. The general experimental details describing these methods should still be reported and should be sufficient so that others can replicate the data processing.
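As an illustration of the quantitative validation descriptors proposed in Sect. 2.7 (calibration slope and correlation coefficient, precision as relative standard deviation, and LOD/LLOQ at signal-to-noise ratios of 3 and 10), a minimal Python sketch follows. It is not part of the proposed standard: the function names, the calibration data, and the noise value are hypothetical, and the LOD/LLOQ estimate assumes a linear detector response with a negligible intercept and constant baseline noise.

```python
from math import sqrt
from statistics import mean, stdev


def fit_calibration(conc, response):
    """Ordinary least-squares line through the calibration standards.

    Returns (slope, intercept, r), where r is the Pearson correlation coefficient.
    """
    mx, my = mean(conc), mean(response)
    sxx = sum((x - mx) ** 2 for x in conc)
    syy = sum((y - my) ** 2 for y in response)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc, response))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)


def rsd_percent(replicates):
    """Relative standard deviation (sample SD divided by mean) in percent."""
    return 100.0 * stdev(replicates) / mean(replicates)


def conc_at_snr(target_snr, noise, slope):
    """Concentration whose signal equals target_snr * noise.

    Assumption for illustration: linear response with negligible intercept.
    """
    return target_snr * noise / slope


# Hypothetical standards, one per order of magnitude of concentration.
conc = [0.1, 1.0, 10.0, 100.0]          # made-up concentrations
response = [2.0, 20.0, 200.0, 2000.0]   # made-up detector counts
slope, intercept, r = fit_calibration(conc, response)

noise = 0.5                             # hypothetical baseline noise (counts)
lod = conc_at_snr(3, noise, slope)      # signal-to-noise ratio of 3
lloq = conc_at_snr(10, noise, slope)    # signal-to-noise ratio of 10

print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.4f}")
print(f"LOD={lod:.3f} LLOQ={lloq:.3f}")
print(f"precision RSD={rsd_percent([98.0, 100.0, 102.0]):.1f}%")
```

Under these assumptions the signal-to-noise definitions give an LLOQ that is 10/3 times the LOD, which differs from the alternative definition of 5 times the LOD, so a report would need to state which definition was applied.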


• Examples for NMR data pre-processing might include phase-correction method (e.g. automatic, manual), conversion from time to frequency domain (e.g. Fourier Transform), degree of zero filling, degree of linear prediction; apodization parameters and window functions in all dimensions (exponential, Gaussian, sine bell etc.), baseline corrections (dc offset, linear or non-linear corrections), first point multipliers, any shifting of the free induction decays.
• For data analysis of isotope labeling of flux experiments, the method for determining positional and fractional labeling, standard error of the estimates; and estimated isotope recovery in observable fractions (and fraction of total isotope supplied) should be described.
• Examples for FT-IR spectroscopy might include conversion from time to frequency domain (e.g. Fourier Transform), and degree of zero filling. Baseline correction parameters might include offsets, level and type of derivatisation (including algorithm, window size for smoothing), and whether or not CO2 was removed from spectra (deleted or a linear trend fitted).

2.9 Proposed minimum metadata relative to metabolite identification

Metabolite identification is a fundamental function that converts raw data into biological context. Thus, metabolite identifications are critical to the large-scale analysis of metabolites, i.e. metabolomics, and metabolite identifications should be of significant rigor to validate the identification. While it is difficult to prescribe a minimum reporting requirement for identification, the rigor of the metabolite identifications should be aligned with acceptable practices for chemical journals (see http://pubs.acs.org/journals/jacst/, http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/ArticleLayout/sect3.asp, and https://paragon.acs.org/paragon/ShowDocServlet?contentId=paragon/menu_content/authorchecklist/CCCmk1.xls). However, the exact basis for what constitutes a valid metabolite identification is still debated in the community and a consensus is still evolving.

Currently, four levels of metabolite identifications can be found in the published metabolomics literature. They include:

1. Identified compounds (see below).
2. Putatively annotated compounds (e.g. without chemical reference standards, based upon physicochemical properties and/or spectral similarity with public/commercial spectral libraries).
3. Putatively characterized compound classes (e.g. based upon characteristic physicochemical properties of a chemical class of compounds, or by spectral similarity to known compounds of a chemical class).
4. Unknown compounds: although unidentified or unclassified, these metabolites can still be differentiated and quantified based upon spectral data.

Authors should clearly differentiate and report the level of identification rigor for all metabolites reported.

The majority of metabolite identifications reported are typically non-novel as they have been previously characterized, identified, and reported at a rigorous level in the literature. Thus, non-novel metabolites not being identified for the first time are often identified based upon the co-characterization with authentic samples. However, it is generally believed that a single chemical shift, m/z value, or other singular chemical parameter is insufficient for non-novel metabolite identification. Thus, the following minimum standards for level 1, non-novel metabolite identification are proposed.

• A minimum of two independent and orthogonal data relative to an authentic compound analyzed under identical experimental conditions are proposed as necessary to validate non-novel metabolite identifications (e.g. retention time/index and mass spectrum, retention time and NMR spectrum, accurate mass and tandem MS, accurate mass and isotope pattern, full 1H and/or 13C NMR, 2-D NMR spectra). The use of literature values reported for authentic samples by other laboratories is generally believed insufficient to validate a confident and rigorous identification. The use of literature or external laboratory data results in level 2 identifications.
• If spectral (MS or NMR) matching is utilized in the identification process then the authentic spectra used for the spectral matching should be described appropriately or libraries made publicly available. It is preferred that the reference spectra are made available at no cost, but the CAWG recognizes that this may not always be possible for commercialized libraries (NIST, Wiley, etc.). However, the premise of this minimum is that authors document and provide the spectral evidence to validate the metabolite identifications. If the authors choose not to provide the experimental evidence to support the identifications, then the identifications should be reported as ‘putative identifications’.
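The four levels and the level 1 criteria stated so far can be read as a simple decision rule. The following Python sketch is only an illustration of that reading, not part of the CAWG proposal: the `Evidence` record, its field names, and the `identification_level` function are hypothetical, and the choice to report a single matched parameter as level 2 is an assumption made here for the sake of the example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence record; the field names are illustrative only."""

    # Independent, orthogonal data types matched against an authentic compound
    # analyzed under identical experimental conditions,
    # e.g. {"retention index", "mass spectrum"}.
    matched_to_authentic: FrozenSet[str] = frozenset()
    # True if the authors document and provide the supporting spectral evidence.
    evidence_provided: bool = True
    # True if the match rests on literature values or spectral libraries.
    library_or_literature_match: bool = False
    # True if only membership of a chemical class could be inferred.
    class_evidence: bool = False


def identification_level(ev: Evidence) -> int:
    """Return the identification level (1 to 4) implied by the evidence."""
    if len(ev.matched_to_authentic) >= 2 and ev.evidence_provided:
        return 1  # identified compound
    if ev.matched_to_authentic or ev.library_or_literature_match:
        # Assumption: a single matched parameter, or evidence that is not
        # provided, is reported as a putative annotation.
        return 2
    if ev.class_evidence:
        return 3  # putatively characterized compound class
    return 4  # unknown compound


if __name__ == "__main__":
    example = Evidence(frozenset({"retention index", "mass spectrum"}))
    print(identification_level(example))
```

A record with two orthogonal matches to an authentic standard is assigned level 1 only when the supporting evidence is also provided; withholding the evidence moves the same record to level 2, which mirrors the ‘putative identifications’ wording above.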


• Metabolite identifications based upon additional orthogonal data (i.e. more than two) are highly advantageous, provide additional confidence, and are often necessary to provide unambiguous identification of stereo configuration. Additional data consistent with best chemical practices might include: selective solvent extraction, retention time, m/z, photodiode array spectra, λmax and εmax, chemical derivatization, isotope labeling, 2D NMR, IR spectra, etc.

2.9.1 Nomenclature for non-novel metabolites

The standard for compound nomenclature is provided by the International Union of Pure and Applied Chemistry (IUPAC, http://www.chem.qmul.ac.uk/iupac/). However, these rules typically result in very complex and lengthy names. As a result, IUPAC names are traditionally replaced with shorter, more common names, e.g. rutin as compared to 2-(3,4-dihydroxyphenyl)-5,7-dihydroxy-3-[(2S,3R,4S,5S,6R)-3,4,5-trihydroxy-6-[[(2R,3R,4R,5R,6S)-3,4,5-trihydroxy-6-methyl-oxan-2-yl]oxymethyl]oxan-2-yl]oxy-chromen-4-one. Compounds can also be referenced by numerical identifiers such as:

Chemical Abstracts Service (CAS; http://www.cas.org/)
Chemical Entities of Biological Interest (ChEBI; http://www.ebi.ac.uk/chebi/)
Molfile
PubChem compound identifier (CID; http://pubchem.ncbi.nlm.nih.gov/)
Simplified Molecular Input Line Entry Specification (SMILES; Anderson et al. 1987; Weininger 1988; http://www.daylight.com/smiles/)
IUPAC International Chemical Identifier (InChI; http://inchi.info/)

Generally, CAS numbers are less favored due to the proprietary nature of these numbers, whereas CID, SMILES, and InChI codes are preferred. It is the CAWG's current opinion that InChI codes offer a favorable format for data exchange and database communication. Thus, it is suggested that authors report a minimum of one chemical name (IUPAC or common) and one structural code for all identified metabolites for publication.

2.9.2 Novel metabolite identifications

Metabolites identified for the first time and which represent novel identifications should include sufficient evidence for full stereochemical structural identification, and acceptable criteria are clearly defined by most journals (i.e. http://pubs.acs.org/journals/jacst/, http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/ArticleLayout/sect3.asp, and https://paragon.acs.org/paragon/ShowDocServlet?contentId=paragon/menu_content/authorchecklist/CCCmk1.xls). This traditionally involves extraction, isolation, and purification followed by elemental analysis, accurate mass measurement, ion mass fragmentation patterns, NMR (1H, 13C, 2D), and other spectral data such as IR, UV, or chemical derivatization. The CAWG fully supports these traditional criteria for novel metabolite identifications.

2.9.3 Nomenclature for novel metabolites

For novel metabolites identified for the first time and/or compounds that are not yet included in PubChem (http://pubchem.ncbi.nlm.nih.gov/), formal naming should be consistent with IUPAC nomenclature and common naming is left to the author’s discretion. However, author(s) are encouraged to (a) submit novel structures to PubChem and/or (b) release an electronic code for the structure, i.e. the InChI code that is recommended by IUPAC and NIST. The InChI code and software to generate this code for chemical drawings are freely available (http://inchi.info/software_en.html).

2.10 Proposed minimum metadata relative to reporting of unknown metabolites

Within most metabolomics datasets, there are typically many unknown analytes, i.e. level 3 and 4 compounds. Those deemed highly important to the study should be rigorously identified according to the metabolite identification discussions above. This is not possible in all cases due to time restrictions or the lack of authentic material for unambiguous assignment. However, these unknown metabolites can often still be differentiated based upon unique experimental data, i.e. spectral or chromatographic features, and it is valuable to systematically report such “unique unknowns” in a meaningful manner to other researchers. The following minimum reporting standards are suggested for systematically naming unidentified metabolites.

2.10.1 Nomenclature for unknown metabolites

• For NMR, the exact chemical shift and multiplicity of at least one nucleus in the metabolite should be part of the unknown nomenclature. For example, an unidentified triplet at 1.16 ppm could be reported as: ‘unknown (1.16 ppm, triplet)’. When such a signal can be correlated with other atoms in the same molecule using multidimensional or multi-pulse techniques, the chemical shifts and the connectivity of such correlated nuclei in the unknown should be reported in the work. In such cases the molecular fragment may be identified, such as ‘isopropyl group’.
• For MS, the retention time, retention index, and/or prominent ions in the mass spectrum should be reported along with MS-MS data if available (also see Bino et al. 2004).
• Xenobiotics (e.g. administered drugs, related drug metabolites) or other exogenous compounds such as herbicides, pesticides, etc. should be rigorously distinguished from endogenous metabolites for all unknowns, if possible.

3 Discussions and conclusions

The Chemical Analysis Working Group will continue to work cooperatively on a consensus document that describes a minimum core set of necessary data related to the chemical analyses associated with metabolomics experiments. Further, the CAWG will work cooperatively with other MSI groups to build an integrated consensus document. The primary motivation is to establish acceptable practices that will maximize the utility, validity, and understanding of metabolomics data. It is envisioned that the proposed MSI minimum reporting standards will eventually lead to the generation of a schematic representation and model of the reporting standards to assist potential users and developers to better understand, evaluate, and utilize the proposed metadata. However, it is the general consensus of the MSI working groups that it is still a little early for this effort and additional input is needed prior to this next step. During the interim, the MSI Exchange format working group has initiated efforts to define data exchange formats and to produce a schema for such operations that cover all aspects of the metadata, the analytical data (both spectroscopic and chromatographic) and the data analysis.

The above proposed standards do not cover all aspects of chemical analysis. Significant input is still needed within the specific areas of capillary electrophoresis, electrochemical detection, and numerous other techniques. There are also specialist areas of the mass spectrometry and NMR spectroscopy sections which may need revision or expansion to cover future consideration (e.g. in vivo NMR spectroscopy). However, we believe that the above texts provide general guidelines for improving the quality and utility of published metabolomics datasets. To achieve this objective, the CAWG invites feedback and input from the greater scientific community on the technologies and standards, and an internet discussion site has been established at http://msi-workgroups.sourceforge.net/ or http://Msi-workgroups-feedback@lists.sourceforge.net to facilitate such feedback. Only through active community involvement will a functional solution be achieved.

References

Anderson, E., Veith, G. D., & Weininger, D. (1987). SMILES: A line notation and computerized interpreter for chemical structures. Report No. EPA/600/M-87/021. U.S. EPA, Environmental Research Laboratory-Duluth, Duluth, MN 55804.
Bino, R. J., Hall, R. D., Fiehn, O., Kopka, J., Saito, K., Draper, J., Nikolau, B. J., Mendes, P., Roessner-Tunali, U., Beale, M. H., Trethewey, R. N., Lange, B. M., Wurtele, E. S., & Sumner, L. W. (2004). Potential of metabolomics as a functional genomics tool. Trends in Plant Science, 9, 418–425.
Ellis, D. I., & Goodacre, R. (2006). Metabolic fingerprinting in disease diagnosis: biomedical applications of infrared and Raman spectroscopy. Analyst, 131, 875–885.
FDA. (2001). Guidance for industry. Bioanalytical method validation, http://www.fda.gov/cder/guidance/4252fnl.pdf.
Fiehn, O., Kristal, B., van Ommen, B., Sumner, L. W., Assunta-Sansone, S., Taylor, C., Hardy, N., & Kaddurah-Daouk, R. (2006). Establishing reporting standards for metabolomic and metabonomic studies: A call for participation. Omics, 10, 158–163.
Fiehn, O., Sumner, L. W., Ward, J., Dickerson, J., Lange, M. B., Lane, G., Roessner, U., Last, R., Rhee, S. Y., & Nikolau, B. (2007). Minimum reporting standards for plant biology context in metabolomics studies. Metabolomics, 3, this issue.
Goodacre, R., Broadhurst, D., Smilde, A. K., Kristal, B. S., Baker, J. D., Beger, R., Bessant, C., Connor, S., Capuani, G., Craig, A., Ebbels, T., Kell, D. B., Manetti, C., Newton, J., Paternostro, G., Somorjai, R., Sjöström, M., Trygg, J., & Wulfert, F. (2007). Proposed minimum reporting standards for data analysis in metabolomics. Metabolomics, 3, this issue.
Griffin, J. L., Nicholls, A. W., Daykin, C., Heald, S., Keun, H., Schuppe-Koistinen, I., Griffiths, J. R., Cheng, L., Rocca-Serra, P., Rubtsov, D. V., & Robertson, D. (2007). Standard reporting requirements for biological samples in metabolomics experiments: mammalian/in vivo experiments. Metabolomics, 3, this issue.
Jenkins, H., Hardy, N., Beckmann, M., Draper, J., Smith, A., Taylor, J., Fiehn, O., Goodacre, R., Bino, R., Hall, R., Kopka, J., Lane, G., Lange, B., Liu, J., Mendes, P., Nikolau, B., Oliver, S., Paton, N., Rhee, S., Roessner-Tunali, U., Saito, K., Smedsgaard, J., Sumner, L., Wang, T., Walsh, S., Wurtele, E., & Kell, D. (2004). A proposed framework for the description of plant metabolomics experiments and their results. Nature Biotechnology, 22, 1601–1606.
Jenkins, H., Johnson, H., Kular, B., Wang, T., & Hardy, N. (2005). Toward supportive data collection tools for plant metabolomics. Plant Physiology, 138, 67–77.
Lindon, J., Nicholson, J., Holmes, E., Keun, H., Craig, A., Pearce, J., Bruce, S., Hardy, N., Sansone, S., Antti, H., Jonsson, P., Daykin, C., Navarange, M., Beger, R., Verheij, E., Amberg, A., Baunsgaard, D., Cantor, G., Lehman-McKeeman, L., Earll, M., Wold, S., Johansson, E., Haselden, J., Kramer, K., Thomas, C., Lindberg, J., Schuppe-Koistinen, I., Wilson, I., Reily, M., Robertson, D., Senn, H., Krotzky, A., Kochhar, S., Powell, J., van der Ouderaa, F., Plumb, R., Schaefer, H., & Spraul, M. (2005). Summary recommendations for standardization and reporting of metabolic analyses. Nature Biotechnology, 23, 833–838.
Quackenbush, J. (2004). Data standards for ‘omic’ science. Nature Biotechnology, 22, 613–614.
Rubtsov, D. V., Jenkins, H., Ludwig, C., Easton, J., Viant, M. P., Guenther, U., Griffin, J. L., & Hardy, N. (2007). Requirements for the description of NMR-based metabolomics experiments. Metabolomics, 3, this issue.
Thompson et al. (2002). Harmonized guidelines for single laboratory validation of methods of analysis (IUPAC Technical Report). Pure and Applied Chemistry, 74, 835–855.
van der Werf, M. J., Takors, R., Smedsgaard, J., Nielsen, J., Ferenci, T., Portais, J. C., Wittmann, C., Hooks, M., Tomassini, A., Oldiges, M., Fostel, J., & Sauer, U. (2007). Standard reporting requirements for biological samples in metabolomics experiments: Microbial and in vitro biology experiments. Metabolomics, 3, this issue.
Weininger, D. (1988). SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28, 31–36.
