Libraries Talk NC
C12H20O2
Mass Spectral Libraries
An Ever-Expanding Resource for
Chemical Identification
Steve Stein
Mass Spectrometry Data Center
National Institute of Standards and Technology
Gaithersburg, Maryland, USA
Evolution of the NIST MS Library
[Figure: libraries distributed per year and library growth, 1988 to 2011]
[Figure: signal intensity vs. retention time]
• Elemental Composition (measurable by MS)
• Chemical Structure (invisible to MS)
Reveal Structure as Spectrum:
A Mass “Fragmentogram”
[Figure: electron ionization scheme, e- + M → M+ + 2e-, for an organophosphorus fluoridate of mass = 140 u, followed by fragmentation to ions of mass = 99 u and mass = 125 u]
Structure/Spectrum Space
examples of structures with similar spectra
Mass Spectra Reproducible Over Time
O’Neal et al., Anal. Chem., 1951
NIST, 2012
VX
HD
GB
Spectra Can be Interpreted, Not Predicted
MS
Interpreter
?
Library Search
• “Fingerprint” Identification
– Identify compound by matching spectrum to library spectrum
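The fingerprint idea above can be sketched as a similarity search. This is a minimal illustration only: it uses a plain cosine (dot-product) score scaled to 0-999, which is an assumption and not necessarily the scoring used by the NIST search software. The function names `match_score` and `search`, and the toy spectra, are hypothetical.

```python
import math

def match_score(query, library, scale=999):
    """Cosine similarity between two spectra, scaled to 0..scale.

    Each spectrum is a dict mapping integer m/z to abundance.
    Assumption: a simple cosine score stands in for the real match factor.
    """
    shared = set(query) & set(library)
    dot = sum(query[mz] * library[mz] for mz in shared)
    norm_q = math.sqrt(sum(a * a for a in query.values()))
    norm_l = math.sqrt(sum(a * a for a in library.values()))
    if norm_q == 0 or norm_l == 0:
        return 0
    return round(scale * dot / (norm_q * norm_l))

def search(query, library_spectra):
    """Return (name, score) pairs sorted from best to worst match."""
    hits = [(name, match_score(query, spec))
            for name, spec in library_spectra.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical toy spectra (m/z -> relative abundance), not real library data.
toy_library = {
    "compound A": {99: 100, 125: 40, 81: 20},
    "compound B": {91: 100, 65: 15, 39: 10},
}
query = {99: 95, 125: 45, 81: 18}
hit_list = search(query, toy_library)
```

The sorted `hit_list` plays the role of the hit list on the slide: the top entry is the best fingerprint match, and the spread of scores below it is what a score histogram would summarize.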
[Figure: library search display showing the query spectrum, hit list, score histogram, and library (consensus) spectrum]
[Figure: Relative Probabilities. Relative frequency (y-axis) vs. m/z difference (x-axis, 0 to 100), plotted for relative abundance ranges 50-100%, 10-20%, and 1-2%]
S.E. Stein and D.N. Heller, J. Amer. Soc. Mass Spectrom. 2006, 17, 823-835.
Score Confidence Level
• How to Express Identification Certainty?
– Related to broad range of Identity problems
– Can it be quantitative?
• Follow Bayes
– Follow changes in confidence
• Bayesian Notation
– P ( ID is correct | Threshold Score )
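Bayes Rule*

$$\frac{P(\mathrm{ID} \mid \mathrm{Score})}{P(\mathrm{FP} \mid \mathrm{Score})} = \frac{P(\mathrm{ID})}{P(\mathrm{FP})} \times \frac{P(\mathrm{Score} \mid \mathrm{ID})}{P(\mathrm{Score} \mid \mathrm{FP})}$$

Final Confidence = Starting Confidence × Change in Confidence
• ID: Analyte is Identified Correctly; FP: False Positive
• P(ID) / P(FP): Prior Probability, Before Experiment
• P(Score | ID): Reproducible Spectrum
• P(Score | FP): False Positive Potential
• P(Score | ID) / P(Score | FP): Influence of Library Search
* Odds Version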
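The odds form of Bayes' rule can be sketched numerically. This is a minimal illustration that assumes the two outcomes are exhaustive, so P(FP) = 1 - P(ID); the function name `posterior_probability` and all input numbers are hypothetical and chosen only to show how the starting confidence is updated.

```python
def posterior_probability(prior, p_score_given_id, p_score_given_fp):
    """Odds-form Bayes update for a correct-ID vs. false-positive outcome.

    Assumption: P(FP) = 1 - P(ID), i.e. the two outcomes are exhaustive.
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    if p_score_given_fp <= 0:
        raise ValueError("p_score_given_fp must be positive")
    # Starting confidence: P(ID) / P(FP)
    prior_odds = prior / (1 - prior)
    # Change in confidence: P(Score | ID) / P(Score | FP)
    likelihood_ratio = p_score_given_id / p_score_given_fp
    # Final confidence: P(ID | Score) / P(FP | Score)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back to a probability
    return posterior_odds / (1 + posterior_odds)

# Hypothetical inputs: same library search evidence, two different priors.
p_plausible = posterior_probability(0.5, 0.8, 0.05)
p_unlikely = posterior_probability(0.1, 0.8, 0.05)
```

With these illustrative inputs the same search result leaves less final confidence when the identification was less plausible to begin with, which is the point of separating the prior from the influence of the library search.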
I. Prior Probability
How plausible is the ID?
• Seen before under similar conditions?
• Expert knowledge
– Expected, plausible, unlikely, impossible
• Citations
– Google, ChemSpider, PubChem, MS Library, …
– Human Metabolite DB, Merck Index, …
[Figure: chart with y-axis 0 to 1000 and x-axis 0 to 300; bars show quartiles]
Typical Interlab Spectrum Variation
Energy Dependence
[Figure: three HCD MS2 spectra of the m/z 678.22 precursor ([M+H+K] 2+) from run Gly-3_NGA2-200x-HCD-5to55, scans #967 to #969 (RT 6.09 to 6.10), at collision energy settings 30, 35, and 40. Axes: relative abundance vs. m/z (100 to 1300). The peak labeled at 100% is m/z 678.22 at settings 30 and 35, and m/z 138.05 at setting 40.]
Ion Source Decomposition
[Figure: spectra recorded at 150 °C and 280 °C]
III. False Positive Potential
P( Score | FP )
• Wrong Compound ID – High Score
• MS Class Identification
– Different compounds yield same set of ions due to structure similarity
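One way to put a number on P( Score | FP ) is to count how often known-wrong identifications reach a given score. The sketch below is an assumption about how such an estimate could be made, not a description of the method behind these slides; `false_positive_rate` and the score list are hypothetical.

```python
def false_positive_rate(wrong_match_scores, threshold):
    """Fraction of known-incorrect matches scoring at or above the threshold.

    Serves as an empirical stand-in for P(Score >= threshold | FP).
    """
    if not wrong_match_scores:
        raise ValueError("need at least one score")
    above = sum(1 for score in wrong_match_scores if score >= threshold)
    return above / len(wrong_match_scores)

# Hypothetical match scores for hits known to be the wrong compound.
wrong_scores = [420, 515, 610, 655, 700, 742, 790, 805, 830, 910]
rate_800 = false_positive_rate(wrong_scores, 800)
rate_900 = false_positive_rate(wrong_scores, 900)
```

Raising the threshold lowers the estimated rate, but class-level matches (different compounds sharing the same ions) can keep it well above zero even at high scores.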
MS Specific Class ID
Same low mass ions (benzoyl)
Random Match
False Positives above 800
Hit List Contains Structural Information
S.E. Stein, “Chemical Substructure Identification by Mass Spectral Library Searching”, J. Amer. Soc. Mass Spectrom. 1995, 6, 644-655.
Is Analyte In Library?
• Ideal library contains all plausible compounds
Vetiver Oil
Many components not identified by GC/MS
NIST Library
Identified Manually
Not Identified
Unidentified Recurring Spectrum Library
99% - RI 1200
69% - RI 1860
57% - RI 2504
Identified by Library
• Unknown Knowns: Not expected but found
• Known Knowns: Expected and found