John M. Barnard, IRFS 040610
Dr John M. Barnard
Scientific Director
Digital Chemistry Ltd., UK
www.digitalchemistry.co.uk
Outline
Chemical structures in patents
Markush structures
– generic representation of a family of specific structures
– allow protection of related molecules with common properties
– named after inventor involved in US legal case in 1924
[Figure: example Markush structure with substituents R1, R2 and R3]
R1 = phenyl / cyclohexyl / ...
R2 = H / methyl / ...
R3 = H / Cl / NO2 / ...
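To make the combinatorial scope of a Markush structure concrete, the R-group lists above can be held as a simple mapping, and the number of specific structures it covers counted as the product of the number of alternatives at each position. This is a minimal sketch that uses only the alternatives printed on the slide (the "..." means the real lists are longer) and assumes every R-group varies independently; `MARKUSH_EXAMPLE` and `count_specific_structures` are hypothetical names for illustration.

```python
from math import prod

# The slide's example, reduced to the alternatives actually listed.
# Assumption: each R-group varies independently of the others.
MARKUSH_EXAMPLE: dict[str, list[str]] = {
    "R1": ["phenyl", "cyclohexyl"],
    "R2": ["H", "methyl"],
    "R3": ["H", "Cl", "NO2"],
}


def count_specific_structures(rgroups: dict[str, list[str]]) -> int:
    """Number of specific structures covered: the product of the
    number of alternatives for each R-group."""
    return prod(len(alternatives) for alternatives in rgroups.values())


if __name__ == "__main__":
    print(count_specific_structures(MARKUSH_EXAMPLE))  # 12
```

Even this truncated example covers 2 × 2 × 3 = 12 compounds; with realistic lists of tens or hundreds of alternatives per position the count grows multiplicatively.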
Chemical structures in patents
[Figure: a Markush structure, a specific structure and a chemical name]
Markush structures
Specific structures can be generated by combinatorial
assembly of alternatives for each R-group
[Figure: annotated Markush structure showing specific groups, generic groups, variable-position attachment, variable multiplicity and non-structural description]
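A minimal sketch of that combinatorial assembly, assuming a hypothetical scaffold template in which `{R1}`, `{R2}` and `{R3}` mark the attachment points (this is not a real chemical line notation). It handles only specific groups; generic groups, variable-position attachment, variable multiplicity and non-structural descriptions would need a richer model than a list of substituent names.

```python
from itertools import product
from typing import Iterator

# Hypothetical scaffold template: "{R1}", "{R2}" and "{R3}" mark attachment
# points. It is not a real chemical notation; it only shows the assembly step.
SCAFFOLD = "core({R1},{R2},{R3})"

# Alternatives listed on the earlier slide (the real lists are open-ended).
RGROUPS: dict[str, list[str]] = {
    "R1": ["phenyl", "cyclohexyl"],
    "R2": ["H", "methyl"],
    "R3": ["H", "Cl", "NO2"],
}


def enumerate_specific(scaffold: str, rgroups: dict[str, list[str]]) -> Iterator[str]:
    """Yield one specific structure per combination of R-group alternatives."""
    names = list(rgroups)
    for choice in product(*(rgroups[name] for name in names)):
        yield scaffold.format(**dict(zip(names, choice)))


if __name__ == "__main__":
    structures = list(enumerate_specific(SCAFFOLD, RGROUPS))
    print(len(structures))  # 12
    print(structures[0])    # core(phenyl,H,H)
```

The generator yields structures lazily, so a caller can stop early rather than materialise every combination.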
Substructure search
[Figure: example chemical structures illustrating substructure search]
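Substructure search over specific structures rests on a subgraph isomorphism test: every query atom must map to a distinct target atom of the same element, and every query bond must be present between the mapped target atoms. The sketch below is a plain backtracking version of that idea on a hypothetical minimal graph model (heavy-atom labels only; bond orders, charges and aromaticity ignored). Production systems use faster algorithms such as Ullmann or VF2 together with fingerprint screening, and searching Markush structures themselves needs considerably more machinery.

```python
# Hypothetical minimal molecular graph: atom labels plus undirected bonds,
# each bond stored as a frozenset of two atom indices (bond orders ignored).
Graph = tuple[list[str], set[frozenset[int]]]


def is_substructure(query: Graph, target: Graph) -> bool:
    """Backtracking search for an injective, label-preserving mapping of
    query atoms onto target atoms such that every query bond maps to a
    target bond (subgraph monomorphism)."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    mapping: dict[int, int] = {}

    def consistent(q: int, t: int) -> bool:
        # Same element, and target atom not already used.
        if q_atoms[q] != t_atoms[t] or t in mapping.values():
            return False
        # Every bond from q to an already-mapped query atom must exist in the target.
        for q_other, t_other in mapping.items():
            if frozenset((q, q_other)) in q_bonds and frozenset((t, t_other)) not in t_bonds:
                return False
        return True

    def extend(q: int) -> bool:
        if q == len(q_atoms):
            return True  # all query atoms mapped
        for t in range(len(t_atoms)):
            if consistent(q, t):
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]  # backtrack
        return False

    return extend(0)


if __name__ == "__main__":
    query = (["C", "N", "C"], {frozenset((0, 1)), frozenset((1, 2))})  # C-N-C
    trimethylamine = (["N", "C", "C", "C"],
                      {frozenset((0, 1)), frozenset((0, 2)), frozenset((0, 3))})
    ethylamine = (["C", "C", "N"], {frozenset((0, 1)), frozenset((1, 2))})
    print(is_substructure(query, trimethylamine))  # True
    print(is_substructure(query, ethylamine))      # False
```

With these example inputs, trimethylamine contains the C-N-C query, while ethylamine does not, because its nitrogen has only one carbon neighbour.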
[Table: Commercial search systems | Databases]
Which way forward?
[Diagram labels: Derwent Chemistry Resource; CA Registry; SureChem; IBM; Reaxys; MARPAT; Databases; Data-mining software; Markush Structures]
Using specific structures
Conventional approach
Extract specific structures from patent
– manual curation
  • CA Registry
  • Derwent Chemistry Resource
– automatic extraction
  • SureChem
  • IBM
– combination of both
  • Reaxys
Search using standard substructure search software
Issues
Selection of compounds
– exemplified
– "prophetic"
– anything with a name
Effectiveness of automatic nomenclature identification and translation
Correctness of systematic names in patent document
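Automatic extraction depends on first spotting chemical names in text, which is where the "effectiveness" issue bites. The toy heuristic below is purely an assumption for illustration (it is not how SureChem, IBM or Reaxys work): it flags tokens that carry a numeric locant and end in a common nomenclature suffix. It misses trivial names such as "aspirin" and locant-free names such as "benzene", and any real pipeline would still need a name-to-structure translator afterwards.

```python
import re

# Tokens may contain the punctuation used in systematic names.
TOKEN = re.compile(r"[A-Za-z0-9,'()\[\]-]+")
# A numeric locant followed by a hyphen, e.g. "2-" or "1,2-".
LOCANT = re.compile(r"\d[\d,']*-")
# A few common suffixes; deliberately incomplete (toy heuristic).
SUFFIXES = ("ane", "ene", "yne", "ol", "amine", "amide", "ine", "one", "oic")


def candidate_names(text: str) -> list[str]:
    """Return tokens that look like systematic chemical names (toy heuristic)."""
    found = []
    for token in TOKEN.findall(text):
        token = token.strip(",()[]-")  # drop punctuation attached to the word
        if LOCANT.search(token) and token.lower().endswith(SUFFIXES):
            found.append(token)
    return found


if __name__ == "__main__":
    sentence = "Aspirin and 2-chloropyridine were dissolved in 1,2-dichloroethane."
    print(candidate_names(sentence))  # ['2-chloropyridine', '1,2-dichloroethane']
```

Note that "Aspirin" is silently missed: a recall failure of exactly the kind the slide lists under "Issues".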
Using specific structures
In-house Markush systems?
Prospects
Software
New Markush search systems under development
– Digital Chemistry Ltd.
– ChemAxon
Also work on selective enumeration of specific structures from Markush
– DecrIPt Inc.
Databases
Existing curated databases
– Thomson Reuters have expressed interest in making MMS data available
– MARPAT database another obvious possibility
"Home-grown" databases for specialist purposes
– input software needed
Automatic extraction from patent documents
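One simple reading of selective enumeration, sketched under the assumption that a query constrains some R-groups to a subset of their alternatives: restrict each position first, then take the product, so combinations that cannot match are never generated. This is a hypothetical illustration, not DecrIPt Inc.'s method, and it only covers specific-group alternatives.

```python
from itertools import product
from typing import Iterator


def selective_enumeration(
    rgroups: dict[str, list[str]],
    query: dict[str, set[str]],
) -> Iterator[dict[str, str]]:
    """Yield only the R-group assignments compatible with `query`.

    R-groups absent from `query` are unconstrained. Filtering each
    position before the product avoids building non-matching structures.
    """
    names = list(rgroups)
    allowed = [
        [alt for alt in rgroups[name] if name not in query or alt in query[name]]
        for name in names
    ]
    for choice in product(*allowed):
        yield dict(zip(names, choice))


if __name__ == "__main__":
    rgroups = {
        "R1": ["phenyl", "cyclohexyl"],
        "R2": ["H", "methyl"],
        "R3": ["H", "Cl", "NO2"],
    }
    query = {"R1": {"phenyl"}, "R3": {"Cl", "NO2"}}
    hits = list(selective_enumeration(rgroups, query))
    print(len(hits))  # 4 of the 12 full enumerations
    print(hits[0])    # {'R1': 'phenyl', 'R2': 'H', 'R3': 'Cl'}
```

If a query rules out every alternative at some position, the product is empty and nothing is enumerated at all.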
Automatic Markush extraction
Currently a "hot area" for research, after a fallow period
– complex combined issues of text and image processing, nomenclature translation and semantic analysis
Sheffield University
3 publications (1992-97), initially analysing Derwent patent abstracts.
Cambridge University
Unilever Centre for Molecular Informatics. Ongoing work by the Murray-Rust group on analysis of full-text patents, extending the OPSIN nomenclature translation program.
CLiDE Pro (KeyModule Ltd.)
Work by A.P. Johnson (2009) extending earlier chemical OCR software.
chemoCR (Fraunhofer SCAI)
Recent work on prototype software for Markush "reconstruction" from patent text, with limited success.
ChemProspector (InfoChem)
Ongoing research into extraction of Markush structures from patents.
Commercially viable operational systems probably still some way off.
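The semantic-analysis part of Markush extraction can be glimpsed with a deliberately naive sketch: pull R-group definitions of the form "R1 is A or B" out of claim text with a regular expression. The pattern and the sample sentence are assumptions for illustration; none of the systems named above is claimed to work this way.

```python
import re

# Hypothetical pattern: "R<n> is <alternatives>", ending at ";" or ".".
DEFINITION = re.compile(r"\b(R\d+)\s+is\s+([^;.]+)")


def parse_rgroup_definitions(text: str) -> dict[str, list[str]]:
    """Map each R-group label to its listed alternatives (naive parsing)."""
    rgroups: dict[str, list[str]] = {}
    for label, body in DEFINITION.findall(text):
        # Alternatives are assumed to be separated by commas or the word "or".
        parts = re.split(r",|\bor\b", body)
        rgroups[label] = [part.strip() for part in parts if part.strip()]
    return rgroups


if __name__ == "__main__":
    claim = ("wherein R1 is phenyl or cyclohexyl; R2 is H or methyl; "
             "and R3 is H, Cl or NO2.")
    print(parse_rgroup_definitions(claim))
    # {'R1': ['phenyl', 'cyclohexyl'], 'R2': ['H', 'methyl'], 'R3': ['H', 'Cl', 'NO2']}
```

The same pattern splits a definition such as "R84 is a substituted or unsubstituted, mono-, di- or polycyclic ... ring system" into meaningless fragments, which is one reason real extraction needs genuine semantic analysis rather than punctuation rules.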
Precision and recall
[Figure: a specific structure] matches "R84 is a substituted or unsubstituted, mono-, di- or polycyclic, aromatic or non-aromatic, carbocyclic or heterocyclic ring system, or ..."
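The tension on this slide can be put in numbers. A very broad definition such as R84 lets a Markush record match a great many query structures, which tends to help recall at the expense of precision. The standard definitions are sketched below; the patent identifiers in the example are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = |retrieved & relevant| / |retrieved|;
    recall = |retrieved & relevant| / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    retrieved = {"P1", "P2", "P3", "P4"}  # hypothetical hit list
    relevant = {"P1", "P2", "P5"}         # hypothetical relevant patents
    print(precision_recall(retrieved, relevant))  # (0.5, 0.6666666666666666)
```

Returning 0.0 for an empty set is a convention chosen here to avoid division by zero; other conventions exist.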
TREC-CHEM
Dr John M. Barnard
Scientific Director, Digital Chemistry Ltd.
46 Uppergate Road, Sheffield S6 6BX, UK
john.barnard@digitalchemistry.co.uk
+44 (0)114 233 3170