
Multiple Instance Ranking

The database consists of small drug-like molecules.

Features (descriptors) are computed for each hydrogen atom.
Hydrogens are grouped into sites of metabolism.
For each molecule, the preferred site of metabolism is known.
It is not known which hydrogen actually gets abstracted.

Top Level

Chemistry: Molecules. For each molecule, find the site of metabolism.
Machine Learning: Boxes. For each box, find the preferred bag.

Middle Level

Chemistry: Sites of Metabolism
Machine Learning: Bags

Item Level

Chemistry: Hydrogen Atoms
Machine Learning: Items
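
The box / bag / item hierarchy above can be written down as a small data structure. The following is only a sketch: the class names `Item`, `Bag`, and `Box`, the `preferred` field, and the descriptor values are illustrative assumptions and do not come from the original dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    """One hydrogen atom, described by a feature (descriptor) vector."""
    features: List[float]


@dataclass
class Bag:
    """One site of metabolism: a group of hydrogen atoms."""
    items: List[Item]


@dataclass
class Box:
    """One molecule: several bags, exactly one of which is preferred."""
    bags: List[Bag]
    preferred: int  # index into `bags` of the known preferred site


# Toy molecule with two sites; the descriptor values are made up.
toy_box = Box(
    bags=[
        Bag(items=[Item([0.1, 0.3]), Item([0.2, 0.1])]),
        Bag(items=[Item([0.9, 0.4])]),
    ],
    preferred=1,
)
```

Note that the label lives on the box (which bag is preferred), not on any individual item.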

Problem Statement

The Need for MIRank


Dataset Particularities
Descriptors are known for each item.
Each box has exactly one preferred bag.
It is not known how the other bags within a box rank with respect to each other.
It is not known how bags compare against each other across boxes.
Boxes may be very different from each other.

Machine Learning Consequences

This is a multiple-instance problem (labels are given for bags, not for individual items).
This is a partial ranking problem within each box.

Hard problem.
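
To make the "partial ranking within each box" point concrete, the sketch below enumerates the only ordering constraints the data supports: the preferred bag outranks every other bag in the same box. No pair is generated between two non-preferred bags or between bags of different boxes. The function name `preference_pairs` and the input encoding are assumptions made for illustration.

```python
def preference_pairs(boxes):
    """Enumerate the known orderings.

    boxes: list of (num_bags, preferred_index) tuples, one per box.
    Returns a list of (box_id, winner_bag, loser_bag) triples.
    """
    pairs = []
    for box_id, (num_bags, preferred) in enumerate(boxes):
        for bag in range(num_bags):
            if bag != preferred:
                # Only "preferred beats other" is known; nothing else.
                pairs.append((box_id, preferred, bag))
    return pairs


# Two toy boxes: one with 3 bags (bag 0 preferred), one with 2 (bag 1 preferred).
pairs = preference_pairs([(3, 0), (2, 1)])
print(pairs)
```

A box with k bags therefore contributes k - 1 constraints rather than a full ordering of its bags.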

Dataset
Number of molecules (boxes)
Number of sites of metabolism (bags)
Number of hydrogen/non-hydrogen atoms (items)
Number of features per item (charge, topological properties, etc.)

The output information consists of one preferred bag per box.
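
The four dataset quantities listed above can be counted directly from a nested representation. This is a minimal sketch that assumes each box is a list of bags and each bag is a list of item feature vectors; the function name `dataset_summary` and the toy values are hypothetical, and no real dataset figures are implied.

```python
def dataset_summary(boxes):
    """Count boxes, bags, items, and features per item.

    boxes: list of boxes; each box is a list of bags;
    each bag is a list of item feature vectors.
    """
    n_bags = sum(len(box) for box in boxes)
    n_items = sum(len(bag) for box in boxes for bag in box)
    feature_lengths = {len(item) for box in boxes for bag in box for item in bag}
    if len(feature_lengths) != 1:
        raise ValueError("all items must have the same number of features")
    return {
        "molecules": len(boxes),
        "sites": n_bags,
        "atoms": n_items,
        "features_per_item": feature_lengths.pop(),
    }


# Toy data: two molecules, three sites, four atoms, two features each.
toy_boxes = [
    [[[0.1, 0.3], [0.2, 0.1]], [[0.9, 0.4]]],
    [[[0.5, 0.5]]],
]
summary = dataset_summary(toy_boxes)
print(summary)
```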

Drug -> Absorption -> Distribution -> Biotransformation (Metabolism) -> Excretion


Metabolism: the enzymatic conversion of one chemical compound (the drug) into another.
It takes place primarily in the liver.

Enzymes in the liver can activate or inactivate drugs, make them more effective, increase or decrease their toxicity, and convert them into forms that the kidneys can excrete easily.
Pill -> breaks down into molecules -> goes to the large intestine and liver -> gets attacked by enzymes -> successful drugs get through, unsuccessful ones are attacked by enzymes.
The real damage is done by hydrogen atom abstraction (removal of a hydrogen atom).
Drug molecule -> enzyme -> hydrogen removal.

The output is known only for a group of hydrogen atoms; which hydrogen atom in the group gets abstracted is not known.
That is why this is a MIRank problem.
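
One way to turn this into a trainable objective is a margin (hinge) loss over bags. The sketch below is an assumption for illustration and is not taken from the source: it scores each item with a linear model, takes the best item score as the bag score, and penalizes every non-preferred bag that comes within a margin of the preferred bag. The names `bag_score` and `box_hinge_loss`, the `margin` parameter, and the numbers are all hypothetical.

```python
def bag_score(weights, bag):
    """Bag score = score of its best item (assumed max aggregation)."""
    return max(sum(w * x for w, x in zip(weights, item)) for item in bag)


def box_hinge_loss(weights, box, preferred, margin=1.0):
    """Sum of hinge penalties: preferred bag should beat each other bag by `margin`."""
    scores = [bag_score(weights, bag) for bag in box]
    return sum(
        max(0.0, margin - (scores[preferred] - score))
        for index, score in enumerate(scores)
        if index != preferred
    )


# Toy box with three bags; bag 0 is the known preferred bag.
weights = [1.0, 0.0]
box = [
    [[2.0, 0.0], [0.0, 1.0]],  # bag 0, score 2.0
    [[0.5, 3.0]],              # bag 1, score 0.5
    [[3.0, 0.0]],              # bag 2, score 3.0
]
loss = box_hinge_loss(weights, box, preferred=0)
print(loss)
```

Because the bag score depends only on the best item, the model never needs to be told which hydrogen in the preferred group is actually abstracted.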

Goal: build a model that predicts, for each molecule, the site of hydrogen atom abstraction during metabolism.
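
At prediction time this goal amounts to picking one bag per box. The following sketch assumes a linear item scorer and max aggregation over the items of a bag; both are illustrative choices, not the method stated in the source, and the names `item_score`, `bag_score`, `predict_site`, and the weights are hypothetical.

```python
def item_score(weights, features):
    """Linear score of one hydrogen atom's descriptor vector."""
    return sum(w * x for w, x in zip(weights, features))


def bag_score(weights, bag):
    """A site is scored by its best-scoring hydrogen (assumed aggregation)."""
    return max(item_score(weights, item) for item in bag)


def predict_site(weights, box):
    """Return the index of the highest-scoring bag in the box."""
    scores = [bag_score(weights, bag) for bag in box]
    return max(range(len(box)), key=scores.__getitem__)


# Toy molecule with three sites; descriptor values are made up.
weights = [1.0, -0.5]
box = [
    [[0.1, 0.3], [0.2, 0.1]],
    [[0.9, 0.4]],
    [[0.5, 0.9], [0.4, 0.0]],
]
predicted = predict_site(weights, box)
print(predicted)
```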
Individual hydrogen atoms are first grouped together according to molecular equivalence.
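
The grouping step can be sketched as follows, assuming each hydrogen already carries an equivalence-class label. How that label is computed is not described in the source, so the labels, atom ids, and the function name `group_into_bags` are placeholders.

```python
from collections import defaultdict


def group_into_bags(hydrogens):
    """Group hydrogens that share an equivalence label into one bag.

    hydrogens: list of (atom_id, equivalence_label) pairs.
    Returns a list of bags, each a list of atom ids, in first-seen order.
    """
    groups = defaultdict(list)
    for atom_id, label in hydrogens:
        groups[label].append(atom_id)
    return list(groups.values())


# Illustrative input: three equivalent methyl hydrogens and one hydroxyl hydrogen.
hydrogens = [
    ("H1", "methyl"),
    ("H2", "methyl"),
    ("H3", "methyl"),
    ("H4", "hydroxyl"),
]
bags = group_into_bags(hydrogens)
print(bags)
```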

The dataset does not show which individual hydrogen is abstracted during metabolism, but rather to which group this hydrogen atom belongs.
Molecules -> boxes
Groups -> bags
Individual hydrogen atoms -> items

Attempts made: Singh et al. (2003); Sheridan et al. (2007).
