
Immunoinformatics: Bioinformatic Strategies for Better Understanding of Immune Function

Novartis Foundation Symposium 254. Volume 254


Edited by Gregory Bock and Jamie Goode
Copyright © Novartis Foundation 2003. ISBN: 0-470-85356-5

General discussion I

Rammensee: I’d like to welcome discussion of general points regarding the papers
we have heard so far. What we can agree on is that we have a lot of data to work on,
but the connectivity between the data and databases needs to be optimized. But we
still don’t have enough data, and sometimes it is difficult to generate hard data to
put into prediction models.
Kesmir: We started the day with a classification of immunoinformatics into two
branches: soft and semi-soft. How can we join these two different approaches? Is it
possible to combine the mathematical models and computer simulations of the
immune system with the sequence analysis-based work? We should discuss this.
The aminopeptidase activity that we have just been hearing about is a good example,
because these peptidases don’t have great specificity; the end result of the trimming
is a matter of how much access they have to the peptide. We can only work out how
long the peptides are in contact with peptidases if we make mathematical models of
how peptides are generated, how they are transported to the endoplasmic reticulum
(ER), how fast they bind to MHC and so on.
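The kind of kinetic model Kesmir describes can be sketched in a few lines. All rate constants below (k_gen, k_trim, k_tap, k_bind) are purely illustrative assumptions, not measured values; the point is only that the MHC-bound yield depends on how long peptides stay exposed to trimming relative to transport.

```python
# Minimal mass-action sketch of the pathway described above: peptides are
# generated in the cytosol, destroyed by aminopeptidase trimming, transported
# into the ER via TAP, and loaded onto MHC. All rates are illustrative.

def simulate(t_end=10.0, dt=0.001,
             k_gen=1.0,   # peptide generation rate (per unit time)
             k_trim=0.5,  # aminopeptidase trimming/destruction rate
             k_tap=0.3,   # transport into the ER
             k_bind=0.8): # binding to MHC in the ER
    cyt = er = mhc = 0.0  # cytosolic, ER-free, and MHC-bound peptide
    for _ in range(int(t_end / dt)):  # forward-Euler integration
        d_cyt = k_gen - (k_trim + k_tap) * cyt
        d_er  = k_tap * cyt - k_bind * er
        d_mhc = k_bind * er
        cyt += d_cyt * dt
        er  += d_er * dt
        mhc += d_mhc * dt
    return cyt, er, mhc

# Faster trimming (more destruction before transport) lowers MHC loading:
print(simulate()[2] > simulate(k_trim=2.0)[2])  # True
```

This only formalizes the qualitative point of the discussion: the competition between trimming and transport sets how much peptide survives to be loaded, which is what such a model would feed into epitope prediction.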
Rammensee: I think the answer to your question is rather easy. The test of any
mathematical model is the experiment.
Kesmir: What I am saying is that with a mathematical model we can estimate how
much peptide can be exposed to aminopeptidase, and then we can include this in
our optimal epitope predictions.
Rammensee: This would be difficult.
Bernaschi: Hans-Georg Rammensee, from what you say it sounds like it is simple
to test model predictions by experiments. My understanding is that it is very
difficult to make a good experiment in immunology.
Rammensee: I wasn’t suggesting that the experiments are easy.
Bernaschi: If I want to know the half-life of a T cell, for example, I don’t have an
easy way to answer this question. There should be more interaction between people
in the lab and those making mathematical models. It is not as simple as one person
making a model and then this being tested by an experimenter. Often it is not
possible to make a good model because of a lack of specific data. Other times
there are too many data, and it is difficult to identify the right ones to use.
Slightly different approaches should be used to provide a methodology for
finding the right information required by people working on new models, and then
the model should be tested.
Rammensee: As I tried to say earlier, for some modelling there is no way to prove
its accuracy, for instance modelling the half-lives of T cells. This belongs to the
‘soft’ branch of immunoinformatics.
Kesmir: Nonetheless, we can still use that estimate. It is better than just saying we
don’t know.
Brusic: There are deterministic and statistical questions that a researcher may ask.
Some questions can be answered by specific experimental methods while others
need to be treated statistically. We know very well that before elections pollsters
take a sample of say 1000 voters and can predict the outcome of the election fairly
accurately. In biology we can do the same for certain problems, but we need to
know the limit of these predictions and how they can be applied. If we can do
direct experimental validation, then it is a deterministic problem. The
explanation of many deterministic measurements typically requires a statistical
approach.
Bernaschi: This raises another interesting point. What is the meaning of statistics?
If you consider that you are working in a field with 40 million people with a specific
disease and you build your statistical models from just 10 people, does this make
any sense? The answer is no. But it is the only thing you can do in an experiment.
Rammensee: It depends on the difference between the two groups, I guess, and
the type of experiment.
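Brusic’s polling analogy and Bernaschi’s 10-out-of-40-million worry can both be made concrete with the standard normal-approximation margin of error for a proportion. This is a sketch, not a full treatment; n = 1000 and n = 10 are the figures quoted in the exchange above.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion estimated from n samples."""
    return z * math.sqrt(p * (1 - p) / n)

# A poll of 1000 voters pins a proportion down to about 3 points,
# regardless of whether the population is 40 thousand or 40 million:
print(round(margin_of_error(1000), 3))  # 0.031
# With only 10 subjects the interval is roughly ten times wider,
# which is why inference from 10 cases is so weak:
print(round(margin_of_error(10), 3))    # 0.31
```

The key property, as Brusic notes, is that the precision depends on the sample size, not on the size of the population being sampled.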
Lefranc: Nick Petrovsky, I was quite interested by your slide describing all the
steps needed for sharing experimental and clinical data between labs and clinicians.
At the end of your slide there were two headings. One was managing laboratory
information and the other was protecting clinical data. These are two areas where a
lot of work remains to be done. Could you comment on the kinds of standards to be
set up?
Petrovsky: What I was saying was that it is very hard to assess information
coming out of a lot of different laboratories. Although in their publications
people are meant to describe enough information in their methods for other
people to be able to reproduce those experiments, we all know this can be very
difficult. This is mainly because if people said exactly what they did, the methods
section would end up being huge. Similarly, with clinical data the problem is that
they are generally very incomplete. If you go to any clinical database, it is very
much biased by the clinician and their ideas about the disease. In fact, if you talk to
two clinicians they may actually define the disease quite differently. As we get
laboratory data, such as those provided by gene expression arrays, it will be hard
to match this precise information up with clinical information if the latter is
imprecise. We are saying that we need to ensure that both sets of information are
precise and standardized. At that point it will be possible to bring the two together.
People are currently trying to bring together expression data and disease data, and
one of the reasons they are finding this problematic is that there is so much
imprecision and lack of standardization in both data sets. We need to try to
introduce standards and get some consistency in diagnostic criteria and clinical
attributes so that we can interpret the laboratory data in a more consistent way.
DeLisi: Expression arrays can achieve this: they can stratify diseases that were
previously thought to be the same disease. Particularly with neurological diseases
this is a serious problem. Imaging will help there, but one needs to find hard
phenotypic correlates of what is going on genetically.
Lefranc: To make the clinical data more precise, any available phenotypic,
serological or genetic markers related to a gene should be entered. At the
beginning of the 1980s when we sequenced the immunoglobulin IGHG and
IGHA genes, we cloned and sequenced genes from individuals for whom we had
previously analysed the familial pedigree and determined the Gm allotypes by
serological typing. In many cases, unfortunately, a lot of information was lost
because many labs that cloned and sequenced genes at that time were not
concerned with the genetic information and polymorphisms (serological, RFLP,
etc.) associated with their clone or phage sequences. Coming back to the clinical
side, what kind of standard information do you see for the future? What is the
minimum level of information that needs to be collected?
Petrovsky: If you are researching a disease from the laboratory viewpoint, you
need to ensure the clinicians you are working with are able to give you the
classification of disease that they are using. Increasingly, clinicians are trying to
agree on common diagnostic criteria. Someone who runs an assay has to be able
to reference it in order to publish it. If a clinician tells you that this is a group of
patients with a particular type of rheumatoid arthritis, you should demand a
reference for how they were classified as being in that particular subgroup. If
they can’t, then you have a problem. Many clinical groups have decided that they
need a system for classifying particular diseases and have developed internal
guidelines on classification. Once a group of experts has agreed on a system, then
everyone else generally eventually adopts it.
Kellam: This already exists in some diseases, for example the lymphomas and
leukaemias, which have internationally recognized standards for diagnosis and
classification.
Petrovsky: It is like annotation. As long as they can say which guidelines they are
using, people can then go back to the source and work out what they are dealing
with. As long as they can reference the source, that should be ¢ne. As scientists we
will have very noisy data if the person giving us the clinical samples we are using in
our studies isn’t classifying them according to some sort of defined criteria.
Gulukota: In some of the microarray communities they use strict classifications
such as that given in the commercial package SNOMED, which has an ontological
classification for much of medicine, i.e. pathologies, tissue anatomy, drugs, etc.
Something of this sort could be developed in the open standards community in a
manner similar to Gene Ontology.
Kellam: I think SNOMED is designed to be this: the equivalent clinical
description to Gene Ontology.
Gulukota: The problem with SNOMED is that it isn’t free.
Kellam: There is an open-source equivalent available. There are some microarray
pages as well for clinical annotation. Again, it is difficult to get everything
annotated retrospectively.
De Groot: The problem is that most of the clinical information is hand-written in
doctors’ scribble.
Petrovsky: SNOMED is more a dictionary than it is a set of diagnostic criteria for
each disease. It still leaves the diagnosis to the clinician in their individual
judgement, which is not annotatable, unless you annotate the name of the
diagnosing clinician!
Gulukota: I agree. Often clinical trials don’t just say what the disease is but also
have explicit inclusion and exclusion criteria. These are fairly rigorous. We might
need to have something like this in mind when we are investigating a particular
disease from a collaboration point of view.
Rammensee: Our task is to talk to the clinicians and tell them what kind of
information we need for our different purposes. I don’t think we can generalize
about the conditions which our clinical partners have to follow. We are not the
right people to do this.
De Groot: What will happen is that as they come to us for an explanation, we will
say, ‘You need to start collecting HLA data if you want us to explain why your
therapeutic proteins are causing side effects.’ There will be an evolution, and this
will be important. I think also that there is an acceptance of
immunoinformatics which is key here. Vladimir Brusic pointed out that a few
years ago we were looking at a black box. There has been a change in the
acceptance of this technology so we are now in a position to start asking for some
better data.
