
Identification / Dereplication of Natural Products by LC-UV-MS

Deborah L. Zink, Claude Dufresne, Jerrold Liesch, Jesus Martin


Merck & Co., Inc., Rahway, NJ
Objective
LC-MS-DAD Program in Natural Products Discovery

• Find known actives in crude extracts at low concentration (dereplication → match known compounds within a given assay)
• Identify previously observed compounds in semi-purified broths
• Provide initial characterization of unknown components
– With the addition of high-resolution LC-MS analysis, we can identify possible matches in the literature.
Why is this critical?

• Reduce discovery timeline
– By identifying known compounds at low levels in complex mixtures, in a semi-high-throughput manner
• Reduce cost
– Can be done on the screening sample
– Reduces the number of re-growths
• Expedite meaningful discovery
– Allows for prioritization of biology and chemistry assets
• Final result
– Faster discovery of a novel hot compound!
Analytical Process and Data Analysis - MS-Gold

• A standard 10-minute analytical LC-MS gradient system
• A data processing tool (MS-Gold) that extracts and combines Rt, UV, and MS data, then stores the data in a searchable library of component Rt, UV, MS, and structure
– The database tracks sample information, stores raw spectral data, and provides interpretation information and structural assignments
– The application is written in Visual Basic 6.0 with SQL Server 2000
MS-Gold Database Structure
Tables
• Sample Table (chemist input): Sample ID, Organism, Assay, Chemist Name, Chemist Comment
• Injection Logger: Injection ID, Sample ID, data file name, data file location, LC method, Chro Scale, TIC Scale, Injection date
• UV Trace: wavelength vs. intensity, min and max raw values
• + and - TIC traces: m/z vs. intensity, min and max raw values
• Components: Component ID, Injection ID, RT observed, Rt corrected, UV scale / offset, MS scale / offset
• Component UV: background subtracted, wavelength vs. intensity
• Component MS (+/-): m/z vs. intensity, AMDIS extracted, only non-zero data
• MS Interpretation: potential masses, rule used, score
• Compound: Compound ID, Name, Structure, exact mass, creation date, CAS #, nominal mwt
• Matches: Component ID, Compound ID, Record type, Comments

Workflow
• Analysts select samples, generate the sequence table, and run the MSD
• Component detection combines info from:
– MSD report for UV
– AMDIS for + / - MS
– Create component list
• Chemist: data mining, Sample Viewer
• Analysts: generate reports, e-mail to chemist
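A minimal sketch of how the core records above (Components, Compound, Matches) might be represented in code. The class names, field names and types are hypothetical simplifications of the table listing, not MS-Gold's actual SQL Server schema; spectra are stored as dicts holding only non-zero points, mirroring the "only non-zero data" note.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Compound:
    """A library compound (hypothetical simplified record)."""
    compound_id: int
    name: str
    exact_mass: float
    nominal_mwt: int
    cas_number: Optional[str] = None
    molfile: Optional[str] = None  # structure as Molfile text


@dataclass
class Component:
    """One detected LC peak with its combined UV and MS data."""
    component_id: int
    injection_id: int
    rt_observed: float   # minutes
    rt_corrected: float  # minutes, after offset / external-standard correction
    uv_spectrum: Dict[float, float] = field(default_factory=dict)  # nm -> intensity
    ms_pos: Dict[float, float] = field(default_factory=dict)       # m/z -> intensity
    ms_neg: Dict[float, float] = field(default_factory=dict)


@dataclass
class Match:
    """Links a component to a library compound."""
    component_id: int
    compound_id: int
    record_type: str  # e.g. "fingerprint" or "tentative"
    comments: str = ""


def matches_for_component(matches: List[Match], component_id: int) -> List[Match]:
    """Return all match records attached to one component."""
    return [m for m in matches if m.component_id == component_id]
```

Keeping Matches as a separate link record lets one component carry several candidate identifications (fingerprint or tentative) without duplicating compound data.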
Data Extraction Procedures

• MS data are extracted using AMDIS (Automated Mass Spectral Deconvolution & Identification System), developed at NIST*
– This program is an integrated set of procedures for extracting "pure" component spectra from complex chromatograms
– It is integrated with Visual Basic using the NIST AMDIS DLLs
• UV peaks
– Peaks are detected by integration in the acquisition instrument software (Agilent ChemStation)
– Background subtraction of the average baseline, determined at the beginning and end of the chromatogram, produces "clean" UV spectra

*Mallard and Stein, J. Am. Soc. Mass Spectrom. 10, 770-781 (1999)
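The UV background subtraction described above can be sketched as follows. This is an illustration only, not ChemStation or MS-Gold code; the number of scans averaged at each end (`n_baseline`) and the clipping of negative differences to zero are assumptions.

```python
from typing import List

Spectrum = List[float]  # absorbance at each wavelength step (e.g. 200-900 nm in 4 nm steps)


def average_spectrum(scans: List[Spectrum]) -> Spectrum:
    """Point-by-point mean of several UV scans."""
    n = len(scans)
    return [sum(values) / n for values in zip(*scans)]


def background_subtracted_uv(scans: List[Spectrum], apex: int, n_baseline: int = 3) -> Spectrum:
    """Subtract the average baseline (first and last n_baseline scans) from the apex scan.

    n_baseline is an assumed parameter; negative differences are clipped to zero.
    """
    baseline = average_spectrum(scans[:n_baseline] + scans[-n_baseline:])
    return [max(0.0, a - b) for a, b in zip(scans[apex], baseline)]
```

Averaging both ends of the run, rather than one, compensates for a baseline that drifts as the gradient composition changes.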
Identifying Components and Combining Data

• A list of components is created from 3 sources (UV210, positive-ion MS, negative-ion MS):
– All UV210 peaks as detected in Agilent ChemStation
– All MS peaks as reported by AMDIS for both positive and negative ion modes
• Data are automatically captured for each component. If no UV210 peak is integrated, a UV spectrum is extracted for each AMDIS component (provided it has significant intensity)
• Data are combined for each component based on the set of components that appear within a small time window
• UV and MS Rt's are standardized for system offset and corrected based on an external standard
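A minimal sketch of combining UV210, positive-ion and negative-ion peaks into components by retention-time proximity. The `Peak` type, the 0.05 min window and the chaining rule are assumptions for illustration; the slide only specifies "a small time window". Retention times are assumed to be already offset-corrected.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peak:
    source: str  # "UV210", "MS+", or "MS-"
    rt: float    # corrected retention time, minutes


def group_components(peaks: List[Peak], window: float = 0.05) -> List[List[Peak]]:
    """Merge peaks whose Rt lies within `window` minutes of the previous peak in the group."""
    groups: List[List[Peak]] = []
    for peak in sorted(peaks, key=lambda p: p.rt):
        if groups and peak.rt - groups[-1][-1].rt <= window:
            groups[-1].append(peak)  # close enough: same component
        else:
            groups.append([peak])    # start a new component
    return groups
```

Because grouping chains from peak to peak, a run of closely spaced peaks can merge into one component; a fixed window around the first peak of each group is a stricter alternative.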
Screen Shot of MS-Gold Data Processing

Spectral Library Setup

• The library contains identification information and characteristic analytical data for all samples / components analyzed.
• When the structure of a component is identified, it is flagged as a fingerprint record. The molecular structure (Molfile), trivial name and registration information are attached.
• Compounds that are not fully characterized by NMR are given a "tentative" assignment.
Screen Shot of Library ID Component

MS Spectral Compare Algorithm

• Mass spectra are compared using the normalized dot product squared (cosine²)*:

$$\text{Match} = \frac{\left(\sum_{i=1}^{n} L_i U_i\right)^2}{\sum_{i=1}^{n} L_i^2 \, \sum_{i=1}^{n} U_i^2}$$

• L = library intensity, U = unknown intensity; inputs are scaled 0 to 1; output: best = 1, worst = 0
• No weighting is used for ESI data

*Steven E. Stein, Donald R. Scott; J. Am. Soc. Mass Spectrom., vol. 5, pg. 859 (1994).
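The cosine² formula translates directly into code. This sketch assumes both spectra are already binned onto a common m/z grid (for example, nominal mass from the MSD) and stored as m/z-to-intensity dicts; the function name is hypothetical.

```python
from typing import Dict

Spectrum = Dict[float, float]  # m/z -> intensity, scaled 0-1


def cosine_squared(library: Spectrum, unknown: Spectrum) -> float:
    """Normalized dot product squared: (sum L*U)^2 / (sum L^2 * sum U^2).

    Peaks are paired by m/z key; a peak present in only one spectrum adds nothing
    to the numerator but still counts in that spectrum's norm. No weighting is applied.
    """
    dot = sum(intensity * unknown.get(mz, 0.0) for mz, intensity in library.items())
    norm_l = sum(v * v for v in library.values())
    norm_u = sum(v * v for v in unknown.values())
    if norm_l == 0.0 or norm_u == 0.0:
        return 0.0
    return dot * dot / (norm_l * norm_u)
```

Usage: `cosine_squared({100: 1.0}, {100: 1.0, 200: 1.0})` gives 0.5, since the extra unknown peak doubles the unknown's norm.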
UV Spectral Compare Algorithm

Full curve matching using:
• Absolute value distance
• Scaled to values between 0 and 1

$$\text{Match} = \frac{1}{1 + \dfrac{\sum_{i=1}^{n} \lvert L_i - U_i \rvert}{\sum_{i=1}^{n} U_i}}$$

• L = library intensity, U = unknown intensity; inputs are scaled 0 to 1; output: best = 1, worst = 0
• This works, but it is the weakest algorithm
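A sketch of the absolute-value-distance UV score, assuming both spectra are sampled on the same wavelength grid and already scaled 0 to 1; the function name is hypothetical.

```python
from typing import Sequence


def uv_match(library: Sequence[float], unknown: Sequence[float]) -> float:
    """Full-curve UV match: 1 / (1 + sum|L - U| / sum U).

    Returns 1 for identical curves and tends toward 0 as they diverge.
    """
    if len(library) != len(unknown):
        raise ValueError("spectra must share the same wavelength grid")
    total_u = sum(unknown)
    if total_u == 0:
        return 0.0
    distance = sum(abs(l - u) for l, u in zip(library, unknown))
    return 1.0 / (1.0 + distance / total_u)
```

Note that the distance is normalized by the unknown's total intensity only, so the score is not symmetric in L and U; this is one reason a plain curve distance discriminates less well than the cosine² used for MS.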
Retention Time and Molecular Weight Rating

• Retention time
– A Gaussian function is used to rate the "closeness" of 2 retention times, scaled 0-1
• Molecular weight
– If the Mwt of the unknown matches the library component, then Score = 100
– If the Mwt of the unknown does not match the library component, then Score = 0
– If the Mwt of the unknown differs from the library component by +/- 14, 16, 18, or 32 (common analog mass shifts, e.g. CH2 = 14, O = 16, H2O = 18), then Score = 50
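Both ratings can be sketched in a few lines. The Gaussian width `sigma` (0.2 min here) is an assumption, since the slide does not give a value; the analog shifts are the ones listed above, applied in either direction.

```python
import math

ANALOG_SHIFTS = (14, 16, 18, 32)  # nominal mass differences scored 50


def rt_rating(rt_unknown: float, rt_library: float, sigma: float = 0.2) -> float:
    """Gaussian closeness of two retention times, scaled 0-1 (sigma in minutes, assumed)."""
    delta = rt_unknown - rt_library
    return math.exp(-(delta ** 2) / (2.0 * sigma ** 2))


def mwt_score(mwt_unknown: int, mwt_library: int) -> int:
    """Molecular weight score: 100 for a match, 50 for a listed analog shift, else 0."""
    diff = abs(mwt_unknown - mwt_library)
    if diff == 0:
        return 100
    if diff in ANALOG_SHIFTS:
        return 50
    return 0
```

A narrower `sigma` penalizes small Rt drifts more heavily, so in practice it would be chosen to match the run-to-run reproducibility left after external-standard correction.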
Composite Rating Functions

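• Only knowns that are within +/- 1 minute of the unknown Rt are scored
• Composite rating (the denominator, 71, is the sum of the weights):

$$\text{Composite} = \frac{1(\text{Rt}) + 20(\text{MS}_{\text{pos}}) + 10(\text{MS}_{\text{neg}}) + 20(\text{UV}) + 20(\text{Mwt})}{1 + 20 + 10 + 20 + 20}$$

– The weightings were determined empirically
• The hits are ranked by the composite score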
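The filtering, weighting and ranking steps can be sketched as below. The `Candidate` type is hypothetical, and the Mwt rating is assumed to have been rescaled from 0-100 to 0-1 (100 → 1.0, 50 → 0.5) so that every term shares the same range; the slide does not state how MS-Gold handles this.

```python
from dataclasses import dataclass
from typing import List, Tuple

WEIGHTS = {"rt": 1, "ms_pos": 20, "ms_neg": 10, "uv": 20, "mwt": 20}


@dataclass
class Candidate:
    name: str
    rt: float        # library retention time, minutes
    rt_score: float  # all ratings below assumed scaled 0-1
    ms_pos: float
    ms_neg: float
    uv: float
    mwt: float


def composite(c: Candidate) -> float:
    """Weighted average of the individual ratings."""
    total = (WEIGHTS["rt"] * c.rt_score + WEIGHTS["ms_pos"] * c.ms_pos
             + WEIGHTS["ms_neg"] * c.ms_neg + WEIGHTS["uv"] * c.uv
             + WEIGHTS["mwt"] * c.mwt)
    return total / sum(WEIGHTS.values())


def rank_hits(unknown_rt: float, candidates: List[Candidate],
              rt_window: float = 1.0) -> List[Tuple[str, float]]:
    """Score only knowns within +/- rt_window minutes; best hit first."""
    in_window = [c for c in candidates if abs(c.rt - unknown_rt) <= rt_window]
    scored = [(c.name, composite(c)) for c in in_window]
    return sorted(scored, key=lambda hit: hit[1], reverse=True)
```

Applying the +/- 1 minute filter before scoring keeps the expensive spectral comparisons limited to plausible candidates.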
Searching of Extracted Data by End Users

[Screen shot panels: Extracted data | Library data]
Batch search is used to process sets of data
Depending on the application, thresholds and specific libraries may be changed.
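A hypothetical batch-search wrapper showing where the per-application choices enter: the library passed in and the score threshold. The scoring function is supplied by the caller (for example, a composite rating), and the default threshold of 0.7 and `top_n` of 3 are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

ScoreFn = Callable[[str, str], float]  # (unknown ID, library compound ID) -> 0-1 score


def batch_search(unknowns: List[str], library: List[str], score: ScoreFn,
                 threshold: float = 0.7, top_n: int = 3) -> Dict[str, List[Tuple[str, float]]]:
    """Score every unknown against one selected library; keep the best hits above threshold."""
    results: Dict[str, List[Tuple[str, float]]] = {}
    for unknown in unknowns:
        hits = [(known, score(unknown, known)) for known in library]
        hits = [h for h in hits if h[1] >= threshold]
        hits.sort(key=lambda h: h[1], reverse=True)
        results[unknown] = hits[:top_n]
    return results
```

An empty hit list for an unknown is itself useful: it flags a component with no library match, i.e. a candidate for further characterization.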
Identification of Hits from List

Reporting of matches to End Users
– PDF file format
Current Operations

• Agilent MSD-1100 running full time that supports:
– Dereplication in well-characterized assays
– Screening for knowns in new assays
– Characterization of unknown components
• Limitations
– Mass range only up to m/z 1500
– No high-resolution data
Next step: Move to LTQ-FT

• With exact mass, identification is expedited through internal and external database searches
• Quicker identification of analogs with mass defect analysis
• Extended mass range (up to m/z 2000)
• Greater sensitivity
• MS-MS analysis
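Exact-mass work rests on a few small calculations, sketched below: a formula's monoisotopic mass, the ppm error of a measurement, and mass defects. The Kendrick (CH2-based) mass defect is shown as one common form of mass defect analysis for spotting analogs, since members of a CH2 homologous series share the same value; the slide does not say which form is intended. Measured masses are assumed to be neutral (adduct already removed).

```python
from typing import Dict

# Monoisotopic atomic masses (u) for common natural-product elements.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
CH2 = MONO["C"] + 2 * MONO["H"]


def monoisotopic_mass(formula: Dict[str, int]) -> float:
    """Neutral monoisotopic mass from element counts, e.g. {"C": 6, "H": 12, "O": 6}."""
    return sum(MONO[el] * n for el, n in formula.items())


def ppm_error(measured: float, theoretical: float) -> float:
    """Mass accuracy in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def mass_defect(mass: float) -> float:
    """Exact mass minus nominal (nearest-integer) mass."""
    return mass - round(mass)


def kendrick_mass_defect(mass: float) -> float:
    """CH2-based Kendrick mass defect; identical across a CH2 homologous series."""
    kendrick_mass = mass * 14.0 / CH2
    return round(kendrick_mass) - kendrick_mass
```

For example, C6H12O6 (180.0634) and its CH2 homolog C7H14O6 differ in nominal mass by 14 but share a Kendrick mass defect, which is how a nominal-mass "+14" analog can be confirmed with exact mass.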
High Resolution MS Approach

• Current
– Manual data extraction and interpretation
– Batch searching for a limited number of components with single-ion plots at exact mass
• Future challenges
– Ideally, move MS-Gold to a high-resolution platform (time and programming intensive)
– Automate exact mass / molecular formula determination with literature searches
– Determine what data are sufficient to make an identification. How much data is enough: exact mass? Rt? UV? All?
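As an illustration of the formula-determination step, a brute-force CHNO enumerator can list candidate formulas within a ppm tolerance of a neutral exact mass. The element limits, the 5 ppm default tolerance and the hydrogen bound (2C + N + 2) are assumptions; a production tool would also apply isotope-pattern and further chemical rules before any literature search.

```python
from itertools import product
from typing import List, Tuple

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def candidate_formulas(neutral_mass: float, tol_ppm: float = 5.0, max_c: int = 60,
                       max_n: int = 6, max_o: int = 20) -> List[Tuple[str, float]]:
    """Return (formula, ppm error) pairs within tol_ppm, smallest |error| first."""
    hits = []
    tol = neutral_mass * tol_ppm / 1e6
    for c, n, o in product(range(1, max_c + 1), range(max_n + 1), range(max_o + 1)):
        base = c * MONO["C"] + n * MONO["N"] + o * MONO["O"]
        remainder = neutral_mass - base
        if remainder < 0:
            continue
        h = round(remainder / MONO["H"])  # the only H count that can fit
        if h > 2 * c + n + 2:             # exceeds saturation limit
            continue
        mass = base + h * MONO["H"]
        if abs(mass - neutral_mass) <= tol:
            formula = f"C{c}H{h}" + (f"N{n}" if n else "") + (f"O{o}" if o else "")
            hits.append((formula, (mass - neutral_mass) / neutral_mass * 1e6))
    return sorted(hits, key=lambda hit: abs(hit[1]))
```

Tightening `tol_ppm` shrinks the candidate list quickly, which is why instrument mass accuracy bears directly on the question of how much data is enough.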
Acknowledgments

• Natural product chemists and biologists in the natural products discovery programs for providing such interesting problems to solve
Standard LC-MS method

Instrumentation
• Instrument: Agilent MSD with 2-plate autosampler
• Full diode array UV scanned from 200 to 900 nm in 4 nm steps @ 0.25 sec/scan
• Full scan MS from m/z 150 to 1500 @ 1 scan every 0.77 seconds. MS voltages are switched in alternating scans, generating Pos.-ESI data followed by Neg.-ESI data, thus giving a cycle time of 1.54 sec
ESI = electrospray ionization

LC conditions
• Column: Zorbax SB-C8, 2.1 x 30 mm; Temp: 40 °C; Flow rate: 300 µL/min
• Solvents: A = 10% acetonitrile / 90% water with 1.3 mM trifluoroacetic acid and ammonium formate; B = 90% acetonitrile / 10% water with 1.3 mM trifluoroacetic acid and ammonium formate
• Gradient: 10% B to 100% B in 6 min, hold 2 min, initialize 2 min
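The method timing adds up to the 10-minute standard run (6 min ramp + 2 min hold + 2 min re-initialization), and the alternating polarity doubles the 0.77 s scan to a 1.54 s cycle. The sketch below encodes both; treating "initialize" as an immediate step back to 10% B is an assumption about how the pump program behaves.

```python
SCAN_TIME_S = 0.77              # one MS scan (single polarity)
CYCLE_TIME_S = 2 * SCAN_TIME_S  # alternating +ESI / -ESI scans -> 1.54 s


def percent_b(t_min: float) -> float:
    """%B at time t for the standard method (assumes an immediate step back at 8 min)."""
    if t_min < 0 or t_min > 10:
        raise ValueError("method runs from 0 to 10 min")
    if t_min <= 6:
        return 10.0 + (100.0 - 10.0) * t_min / 6.0  # linear 10 -> 100 %B ramp
    if t_min <= 8:
        return 100.0                                # 2 min hold
    return 10.0                                     # re-initialize for 2 min
```

Usage: `percent_b(3)` returns 55.0, the midpoint of the ramp.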
