Session IV Presentation I Zink

Identification / Dereplication of
Natural Products by LC-UV-MS
Deborah L. Zink, Claude Dufresne, Jerrold Liesch, Jesus Martin

Merck & Co. Inc. Rahway, NJ
1
Objective
LC-MS-DAD Program in Natural Products Discovery
• Find known actives in crude extracts at low concentration (dereplication → match known compounds within a given assay)
• Identify previously observed compounds in semi-purified broths
• Provide initial characterization of unknown components
– With the addition of high-resolution LC-MS analysis, we can identify possible matches in the literature.
2
Why is this critical?
• Reduce discovery timeline
– By identifying known compounds, at low levels, in complex mixtures, in a semi-high-throughput manner
• Reduce cost
– Can be done on screening sample
– Reduces number of re-growths
• Expedite meaningful discovery
– Allows for prioritization of biology and chemistry assets
• Final Results
– Faster discovery of a novel hot compound!
3
Analytical Process and Data Analysis - MS-Gold
• A standard analytical 10-minute LC-MS gradient system
• Data processing tool (MS-Gold) that extracts and combines Rt, UV, and MS data and then stores the data in a searchable library of component Rt, UV, MS, and structure.
– The database tracks sample information, stores raw spectral data, and provides interpretation information and structural assignments
– This application is written using Visual Basic 6.0 and SQL Server 2000
4
MS-Gold Database Structure
• Chemist: Input Logger
• Sample Table: Sample ID, Organism, Assay, Chemist Name, Chemist Comment
• Analysts: select samples, generate sequence table, run MSD
• Injection: Injection ID, Sample ID, data file name, data file location, MS interpretation, LC method, Chro Scale, TIC Scale, Injection date
• UV Trace: wavelength vs. intensity; min. and max. raw values
• + and - TIC traces: m/z vs. intensity; min. and max. raw values
• Component detection: combine info from
– MSD Report for UV
– AMDIS for + / - MS
– Create Component list
• Components: Component ID, Injection ID, RT observed, Rt corrected, UV scale / offset, MS scale / offset, Comments
• Component UV: background subtracted; wavelength vs. intensity
• Component MS (+/-): m/z vs. intensity; AMDIS extracted; only non-zero data
• MS Interpretation: Potential masses, Rule used, Score
• Compound: Compound ID, Name, Structure, exactmass, creation date, CAS #, Nominal mwt
• Matches: Component ID, Compound ID, Record type
• Chemist: Data Mining, Sample Viewer
• Analysts: generate reports, e-mail to chemist
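The relationships between these tables can be sketched in a few lines. This is only an illustration: the slides state that MS-Gold uses SQL Server 2000, whereas the sketch below uses Python's built-in sqlite3 as a stand-in, keeps only a subset of the listed fields, and every table name, column name, and helper function is an assumption rather than the real schema.

```python
import sqlite3

# Hypothetical, reduced schema; names are illustrative, not the MS-Gold schema.
SCHEMA = """
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    organism  TEXT,
    assay     TEXT,
    chemist   TEXT,
    comment   TEXT
);
CREATE TABLE injection (
    injection_id   INTEGER PRIMARY KEY,
    sample_id      INTEGER REFERENCES sample(sample_id),
    data_file_name TEXT,
    lc_method      TEXT,
    injection_date TEXT
);
CREATE TABLE component (
    component_id INTEGER PRIMARY KEY,
    injection_id INTEGER REFERENCES injection(injection_id),
    rt_observed  REAL,
    rt_corrected REAL
);
CREATE TABLE compound (
    compound_id INTEGER PRIMARY KEY,
    name        TEXT,
    exact_mass  REAL,
    nominal_mwt INTEGER
);
CREATE TABLE component_match (
    component_id INTEGER REFERENCES component(component_id),
    compound_id  INTEGER REFERENCES compound(compound_id),
    record_type  TEXT
);
"""


def open_library(path=":memory:"):
    """Create the sketch schema in a new SQLite database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def components_for_compound(conn, compound_id):
    """List (component_id, rt_corrected, sample_id) for every component matched to a compound."""
    rows = conn.execute(
        "SELECT c.component_id, c.rt_corrected, i.sample_id "
        "FROM component_match m "
        "JOIN component c ON c.component_id = m.component_id "
        "JOIN injection i ON i.injection_id = c.injection_id "
        "WHERE m.compound_id = ? ORDER BY c.component_id",
        (compound_id,),
    )
    return rows.fetchall()
```

The join mirrors the chain in the diagram: a match links a component to a compound, the component belongs to an injection, and the injection belongs to a sample.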

5
Data Extraction Procedures
• MS data is extracted using AMDIS (Automated Mass Spectral Deconvolution & Identification System) developed at NIST*
– This program is an integrated set of procedures for extracting “pure” component spectra from complex chromatograms
– It is integrated with Visual Basic using NIST AMDIS DLLs
• UV peaks
– Peaks are detected by integration in the acquisition instrument software: Agilent ChemStation
– Background subtraction of the average baseline determined at the beginning and end of the chromatogram produces “clean” UV spectra
*Mallard and Stein, J. Am. Soc. Mass Spectrom. 10, 770-781 (1999)
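The UV background subtraction described above can be sketched as follows. This is a minimal illustration, not the ChemStation or MS-Gold implementation: the number of edge scans (`n_edge`), the clipping of negative values to zero, and both function names are assumptions.

```python
def mean_spectrum(spectra):
    """Average a list of equal-length spectra point by point."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def background_subtract(scans, peak_index, n_edge=3):
    """Subtract the averaged baseline spectrum from the scan at peak_index.

    scans: list of UV spectra (one list of intensities per scan, same wavelengths).
    The baseline is the mean of the first and last n_edge scans (assumed value).
    Negative results are clipped to zero (an assumption, not from the slides).
    """
    if len(scans) < 2 * n_edge:
        raise ValueError("not enough scans to estimate the baseline")
    baseline = mean_spectrum(scans[:n_edge] + scans[-n_edge:])
    return [max(p - b, 0.0) for p, b in zip(scans[peak_index], baseline)]
```

For example, `background_subtract(scans, peak_index=120)` would return the spectrum of scan 120 with the averaged start-and-end baseline removed.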
6
Identifying Components and Combining Data
• A list of components is created from 3 sources
– All UV210 peaks as detected in Agilent ChemStation
– All MS peaks as reported by AMDIS for both positive and negative ion modes
• Data is automatically captured for each component. If no UV210 peak is integrated, a UV spectrum is extracted for each AMDIS component (provided it has significant intensity)
• Data is combined for each component based on the set of components that appear within a small time window
• UV and MS Rt values are standardized for system offset and corrected based on an external standard
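One way to picture the "small time window" combination step is the sketch below. The window width of 0.05 min, the rule that a group is anchored on its earliest peak, and the function name are all assumptions made for illustration; the slides do not give these details.

```python
def combine_peaks(peaks, window=0.05):
    """Group detector peaks that elute within `window` minutes of a group's first peak.

    peaks: list of (rt_minutes, source) tuples, e.g. (2.02, "MS+").
    Returns one dict per component with the mean Rt and the contributing sources.
    """
    groups = []
    for rt, source in sorted(peaks):
        if groups and rt - groups[-1][0][0] <= window:
            groups[-1].append((rt, source))   # close enough: same component
        else:
            groups.append([(rt, source)])     # start a new component
    return [
        {
            "rt": sum(r for r, _ in group) / len(group),
            "sources": sorted({s for _, s in group}),
        }
        for group in groups
    ]
```

A UV210 peak at 2.00 min and MS peaks at 2.02 and 2.03 min would end up in one component here, while an MS peak at 4.50 min would form its own.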
7
Screen Shot of MS-Gold Data Processing
8
Spectral Library Setup
• The library contains identification information and characteristic analytical data for all samples / components analyzed.
• When the structure of a component is identified, it is flagged as a fingerprint record. The molecular structure (Molfile), trivial name and registration information are attached.
• Compounds that are not fully characterized by NMR are given a “tentative” assignment.
9
Screen Shot of Library ID Component
10
MS Spectral Compare Algorithm
• Mass spectra are compared using the Normalized Dot Product^2 or Cosine^2

$$\frac{\left(\sum_{i}^{n} L_i U_i\right)^{2}}{\sum_{i}^{n} L_i^{2} \, \sum_{i}^{n} U_i^{2}}$$

• L = library intensity, U = unknown intensity. Inputs are scaled 0 to 1. Output: best = 1, worst = 0
• No weighting is used for ESI data
*Steven E. Stein, Donald R. Scott; J. Am. Soc. Mass Spectrom., vol. 5, pg. 859 (1994).
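The cosine-squared comparison above can be written out as a short sketch. It assumes each spectrum is a mapping from nominal m/z to intensity and that peaks missing from one spectrum count as zero; the function names are illustrative and are not taken from MS-Gold.

```python
def align(library, unknown):
    """Pair up intensities on the union of m/z values (missing peaks count as 0)."""
    mzs = sorted(set(library) | set(unknown))
    return [library.get(m, 0.0) for m in mzs], [unknown.get(m, 0.0) for m in mzs]


def cosine_squared(library, unknown):
    """Normalized dot product squared: (sum L*U)^2 / (sum L^2 * sum U^2).

    library, unknown: dict of m/z -> intensity. No peak weighting is applied,
    matching the slide's note for ESI data. Returns 0.0 for an empty spectrum.
    """
    lib, unk = align(library, unknown)
    numerator = sum(l * u for l, u in zip(lib, unk)) ** 2
    denominator = sum(l * l for l in lib) * sum(u * u for u in unk)
    return numerator / denominator if denominator else 0.0
```

Because the score is normalized, two spectra that differ only by an overall intensity factor score 1, and spectra with no m/z values in common score 0.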
11
UV Spectral Compare Algorithm
Full curve matching using:
• Absolute Value Distance
• Scaled to values between 0 and 1

$$\frac{1}{1 + \dfrac{\sum_{i}^{n} \left| L_i - U_i \right|}{\sum_{i}^{n} U_i}}$$

• L = library intensity, U = unknown intensity. Inputs are scaled 0 to 1. Output: best = 1, worst = 0
• This works, but it is the weakest algorithm
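As a sketch, the absolute-value-distance score can be computed as below. It assumes both spectra are sampled at the same wavelengths and already scaled 0 to 1; the function name and the handling of an all-zero unknown spectrum are assumptions.

```python
def uv_score(library, unknown):
    """Score = 1 / (1 + sum|L - U| / sum U) for two equal-length UV curves."""
    if len(library) != len(unknown):
        raise ValueError("spectra must be sampled at the same wavelengths")
    total_unknown = sum(unknown)
    if total_unknown == 0:
        return 0.0  # assumed convention: an empty unknown spectrum cannot match
    distance = sum(abs(l - u) for l, u in zip(library, unknown))
    return 1.0 / (1.0 + distance / total_unknown)
```

Identical curves give a distance of 0 and therefore a score of 1; the score falls toward 0 as the summed absolute difference grows relative to the unknown's total intensity.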
12
Retention Time and Molecular Weight Rating
• Retention time
– A Gaussian function is used to rate the “closeness” of 2 retention times, scaled 0-1
• Molecular weight
– If the Mwt of the unknown matches the library component, then Score = 100
– If the Mwt of the unknown differs from the library component by +/- 14, 16, 18, or 32, then Score = 50
– If the Mwt of the unknown otherwise does not match the library component, then Score = 0
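The two ratings can be sketched as follows. The Gaussian width (`sigma`, in minutes) is not given in the slides, so the default here is a placeholder assumption, as are the function names and the use of integer nominal masses.

```python
import math

# Mass differences that earn a partial score, as listed on the slide.
RELATED_SHIFTS = (14, 16, 18, 32)


def rt_score(rt_unknown, rt_library, sigma=0.1):
    """Gaussian closeness of two retention times, 1.0 when equal, toward 0 when far apart.

    sigma is an assumed width in minutes; the slides do not state the value used.
    """
    delta = rt_unknown - rt_library
    return math.exp(-(delta * delta) / (2.0 * sigma * sigma))


def mwt_score(mwt_unknown, mwt_library):
    """Return 100 for an exact nominal-mass match, 50 for a related shift, else 0."""
    difference = abs(mwt_unknown - mwt_library)
    if difference == 0:
        return 100
    if difference in RELATED_SHIFTS:
        return 50
    return 0
```

With these rules, an unknown at nominal mass 316 compared against a library entry at 300 earns the partial score of 50, since the two differ by 16.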
13
Composite Rating Functions
• Only knowns that are within +/- 1 minute of the unknown Rt are scored
• Composite rating =

$$\frac{1\,(\mathrm{Rt}) + 20\,(\mathrm{MS_{pos}}) + 10\,(\mathrm{MS_{neg}}) + 20\,(\mathrm{UV}) + 20\,(\mathrm{Mwt})}{1 + 20 + 10 + 20 + 20}$$

– The weightings were determined empirically
• The hits are ranked by the composite score
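A minimal sketch of the composite rating and ranking follows. The weights are those on the slide; everything else is an assumption for illustration, in particular the dictionary layout of a candidate and the choice to express the Mwt score on a 0 to 1 scale (100 becomes 1.0, 50 becomes 0.5) so that all five inputs share one range.

```python
# Weights from the slide: Rt 1, MS positive 20, MS negative 10, UV 20, Mwt 20.
WEIGHTS = {"rt": 1, "ms_pos": 20, "ms_neg": 10, "uv": 20, "mwt": 20}


def composite(scores):
    """Weighted mean of the five individual scores (each assumed to be 0 to 1)."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[key] * scores[key] for key in WEIGHTS) / total_weight


def rank_hits(unknown_rt, candidates, window=1.0):
    """Score library entries within +/- window minutes of the unknown and rank them.

    candidates: list of dicts with hypothetical keys "name", "rt", and "scores".
    Returns (composite_score, name) tuples, best first.
    """
    hits = [
        (composite(c["scores"]), c["name"])
        for c in candidates
        if abs(c["rt"] - unknown_rt) <= window
    ]
    return sorted(hits, reverse=True)
```

Since Rt carries a weight of only 1 out of 71, it acts mainly as the +/- 1 minute gate; the ranking is driven by the spectral and molecular weight scores.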
14
Searching of Extracted Data by End Users
Extracted data
Library data
15
Batch search is used to process sets of data
Depending on the application, thresholds and specific libraries may be changed.
16
Identification of Hits from List
17
Reporting of matches to End Users
-PDF File format
18
Current Operations
• Agilent MSD-1100 running full time that supports:
– Dereplication in well characterized assays
– Screening for knowns in new assays
– Characterization of unknown components
• Limitations
– Mass range only to 1500
– No high resolution data
19
Next step: Move to LTQ-FT
• With exact mass, identification is expedited through internal and external database searches
• Quicker identification of analogs with mass defect analysis
• Extended mass range (up to 2000)
• Greater sensitivity
• MS-MS analysis
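An exact-mass database search of the kind mentioned above is often expressed as a parts-per-million tolerance check, sketched below. The slides do not describe how the search is done, so the 5 ppm default, the library layout (name mapped to elemental formula), and the function names are all assumptions; the measured value is also assumed to be a neutral monoisotopic mass already corrected for the adduct.

```python
# Monoisotopic masses of the most abundant isotopes (C, H, N, O only).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def exact_mass(formula):
    """Monoisotopic mass for a formula given as a dict of element -> count."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def search_by_exact_mass(measured, library, tol_ppm=5.0):
    """Return (name, ppm_error) for library formulas within tol_ppm, closest first.

    library: dict of hypothetical entry name -> formula dict.
    """
    hits = []
    for name, formula in library.items():
        error = ppm_error(measured, exact_mass(formula))
        if abs(error) <= tol_ppm:
            hits.append((name, error))
    return sorted(hits, key=lambda hit: abs(hit[1]))
```

A nominal-mass instrument cannot separate formulas that share an integer mass, whereas a tight ppm window usually leaves only a handful of candidate formulas to check against the literature.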
20
High Resolution MS Approach
• Current
– Manual data extraction and interpretation
– Batch searching for a limited number of components with single ion plots at exact mass
• Future challenges
– Ideally, move Gold to high resolution platform (time and programming intensive)
– Automate exact mass / molecular formula determination with literature searches
– Determine what data is sufficient to make an identification. How much data is enough: Exact mass? RT? UV? All?
21
Acknowledgments
• Natural product chemists and biologists in the natural products discovery programs for providing such interesting problems to solve
22
Standard LC-MS method
Instrumentation
• Instrument: Agilent MSD with 2-plate auto sampler
• Full diode array UV scanned from 200 to 900 nm in 4 nm steps @ 0.25 sec/scan
• Full scan MS from m/z 150 to 1500 @ 1 scan every 0.77 seconds. MS voltages are switched in alternating scans, generating Pos.-ESI data followed by Neg.-ESI data, thus giving a cycle time of 1.54 sec
ESI = electrospray ionization
LC conditions
• Column: Zorbax SB-C8, 2.1x30 mm; Temp: 40 °C; Flow rate 300 µl/min
• Solvents: A = 10% acetonitrile / 90% water with 1.3 mM trifluoroacetic acid and ammonium formate; B = 90% acetonitrile / 10% water with 1.3 mM trifluoroacetic acid and ammonium formate
• Gradient: 10% B to 100% B in 6 min, hold 2 min, initialize 2 min
23
