Source: http://urology.jhu.edu/research/img1/proteomics13.jpg
Preprocessing of MS data
• Alignment of the spectra
• Filtering (denoising)
• Baseline subtraction
• Normalization
• Peak detection
• Clustering of peaks
• Peak quantification
SELDI-TOF-MS software
• ProteinChip Software 3.1
• SpecAlign
• Cromwell
• PROcess
• MassSpecWavelet
PROcess package
• Process a single spectrum
• Process a set of spectra
Process a single spectrum
• Baseline subtraction
• Peak detection
Baseline subtraction
• Purpose: to level off the elevated, non-constant
baseline caused by chemical noise in the
energy-absorbing molecules (EAM) and by ion overload,
thus making different spectra comparable.
• Solution: use local regression to estimate the
bottom of a spectrum, then subtract that
estimate from the spectrum.
• Two approaches: fit the local regression to
– the points below a certain quantile
– the local minima (yields better results when
estimating the baseline)
Baseline subtraction: algorithm
• For each spectrum, find local minima by
segmenting the m/z range.
• Fit a local regression to the local minima of each
spectrum.
• Subtract the estimated baseline from each
spectrum.
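The three steps above can be sketched in base R. This is only an illustration under stated assumptions, not the PROcess implementation: the function name estimate_baseline, its arguments n_segments and span, and the synthetic spectrum are all made up for this example.

```r
# Sketch: baseline estimation from local minima (base R only).
# estimate_baseline, n_segments and span are illustrative names.
estimate_baseline <- function(mz, intensity, n_segments = 50, span = 0.3) {
  # 1. Segment the m/z range into equal-width bins.
  seg <- cut(mz, breaks = n_segments, labels = FALSE)
  # 2. Keep the index of the lowest intensity in each segment.
  idx <- unlist(lapply(split(seq_along(mz), seg),
                       function(i) i[which.min(intensity[i])]))
  # 3. Fit a local regression (loess) through those local minima.
  minima <- data.frame(mz = mz[idx], y = intensity[idx])
  fit <- loess(y ~ mz, data = minima, span = span,
               control = loess.control(surface = "direct"))
  # 4. Evaluate the fitted baseline at every m/z value.
  as.numeric(predict(fit, newdata = data.frame(mz = mz)))
}

# Synthetic spectrum: decaying baseline + two peaks + positive noise.
set.seed(1)
mz <- seq(1000, 10000, length.out = 2000)
true_base <- 20 * exp(-mz / 3000)
peaks <- 30 * exp(-((mz - 3500) / 20)^2) + 15 * exp(-((mz - 7000) / 30)^2)
intensity <- true_base + peaks + abs(rnorm(length(mz), sd = 0.2))

baseline <- estimate_baseline(mz, intensity)
corrected <- intensity - baseline   # subtract the estimated baseline
```

The segment width has to be larger than the widest peak, otherwise a segment that lies entirely inside a peak contributes a "minimum" that is really signal.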
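### Load libraries
library(survival)
library(Icens)
library(PROcess)
### Read one spectrum
# Assumption: fs lists the example spectra in the package's "Test" folder
fs <- list.files(system.file("Test", package="PROcess"), pattern="\\.*csv\\.*", full.names=TRUE)
f1 <- read.files(fs[1])
### Plot the raw spectrum
plot(f1, type="l", xlab="m/z")
title(basename(fs[1]))
dev.off()
### Baseline subtraction (bw=0.1 is an assumed bandwidth)
bseoff <- bslnoff(f1, method="loess", plot=TRUE, bw=0.1)
title(basename(fs[1]))
dev.off()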
Peak detection
• Purpose: To detect peaks that represent the set
of proteins that are differentially expressed
between different samples.
Peak Detection: algorithm
• Smooth the spectrum using moving averages of the ks
nearest neighbors.
• Compute local variability as the median of the
absolute deviations of the kv nearest neighbors.
• Identify local maxima of the smoothed spectrum
using three thresholds:
– The signal-to-noise ratio: local smooth / local variability
– The detection threshold for the whole spectrum
– The shape ratio: the area under the curve within a small
distance of a peak candidate / the maximum of all such
peak areas of a spectrum
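The algorithm above can be sketched in base R as follows. This is a simplified illustration, not the code PROcess runs: detect_peaks, its argument names and every default value are assumptions chosen for the synthetic example.

```r
# Sketch: peak detection on a baseline-subtracted spectrum (base R only).
# detect_peaks and its arguments are illustrative, not the PROcess API.
detect_peaks <- function(mz, y, ks = 11, kv = 81, snr = 3,
                         min_intensity = 1, area_w = 0.003, min_ratio = 0.2) {
  n <- length(y)
  # Moving average over the ks nearest neighbours (edges become NA).
  smooth <- as.numeric(stats::filter(y, rep(1 / ks, ks), sides = 2))
  # Local variability: median absolute deviation in a window of kv points.
  half <- kv %/% 2
  noise <- sapply(seq_len(n), function(i) {
    w <- y[max(1, i - half):min(n, i + half)]
    median(abs(w - median(w)))
  })
  # Candidates: interior local maxima of the smoothed spectrum.
  i <- 2:(n - 1)
  is_max <- c(FALSE, smooth[i] > smooth[i - 1] & smooth[i] >= smooth[i + 1], FALSE)
  is_max[is.na(is_max)] <- FALSE
  # Thresholds 1 and 2: signal-to-noise ratio and overall detection threshold.
  cand <- which(is_max & smooth / noise > snr & smooth > min_intensity)
  if (length(cand) == 0) return(data.frame(mz = numeric(0), intensity = numeric(0)))
  # Threshold 3: area within +/- area_w * m/z of each candidate,
  # relative to the largest such area in the spectrum.
  area <- sapply(cand, function(j) {
    sum(smooth[abs(mz - mz[j]) <= area_w * mz[j]], na.rm = TRUE)
  })
  keep <- cand[area / max(area) >= min_ratio]
  data.frame(mz = mz[keep], intensity = smooth[keep])
}

# Synthetic baseline-subtracted spectrum with peaks at m/z 3500 and 7000.
set.seed(2)
mz <- seq(1000, 10000, length.out = 2000)
y <- 30 * exp(-((mz - 3500) / 20)^2) + 15 * exp(-((mz - 7000) / 30)^2) +
  rnorm(length(mz), sd = 0.2)
found <- detect_peaks(mz, y)
print(found)
```

The three thresholds act in sequence: the signal-to-noise ratio and the detection threshold discard noise maxima, and the shape ratio then discards narrow spikes whose area is small compared with the largest peak.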
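### Peak detection
# bseoff is the baseline-subtracted spectrum, e.g. bslnoff(f1, method="loess", bw=0.1)
# span and sm.span values are assumed window sizes
pkgobj <- isPeak(bseoff, span=81, sm.span=11, plot=TRUE)
dev.off()
### Zoom in on the 5000-10000 m/z range
specZoom(pkgobj, xlim=c(5000,10000))
dev.off()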
Processing a set of calibration
spectra
• Apply baseline subtraction
• Normalize spectra
• Cutoff selection
• Identify peaks
• Quality assessment
• Get proto-biomarkers
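The slides do not say how the normalization step works. One common choice, shown here purely as an assumption, is to rescale every spectrum to the same total intensity above an m/z cutoff. The function normalize_tic, the matrix layout (one column per spectrum, m/z values as row names) and the toy numbers are illustrative.

```r
# Sketch: normalise a set of spectra to a common total intensity (base R only).
# M: matrix with one column per spectrum and m/z values as row names (assumed layout).
normalize_tic <- function(M, cutoff = 0) {
  mz <- as.numeric(rownames(M))
  M <- M[mz > cutoff, , drop = FALSE]     # discard the low-mass region
  tic <- colSums(M)                       # total intensity per spectrum
  sweep(M, 2, median(tic) / tic, "*")     # rescale each column to the median total
}

# Toy data: three spectra that differ only by an overall scale factor.
M <- cbind(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8), c = c(4, 8, 12, 16))
rownames(M) <- c(500, 1000, 2000, 3000)
Mn <- normalize_tic(M, cutoff = 800)
colSums(Mn)
```

Restricting the sum to m/z values above the cutoff keeps the noisy low-mass region from dominating the scale factor, which is why cutoff selection appears as its own step in the list above.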
Example Data Set
• A set of 8 spectra from a calibration data set
– Same 5 proteins are present in the sample:
1084, 1638, 3496, 5807, 7034 amu
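### Read in the 8 spectra
# Assumption: the calibration spectra are in the package's "calibration" folder
dir.cali <- system.file("calibration", package="PROcess")
files <- list.files(dir.cali, pattern="\\.*csv\\.*", full.names=TRUE)
amu.cali <- c(1084, 1638, 3496, 5807, 7034)  # known protein masses
### Plot 8 spectra and mark the protein positions by red vertical lines for each of them
par(mfrow=c(2,4))
plotCali <- function(f, main, lab.cali) {  # 'main' is an assumed argument
  x <- read.files(f)
  plot(x, main=main, type="n")  # reconstructed: empty frame for the spectrum
  abline(h=0, col="gray")
  abline(v=amu.cali, col="salmon")
  if(lab.cali)
    axis(3, at=amu.cali, labels=amu.cali, las=3, tick=FALSE)  # reconstructed labelling
  lines(x)
  return(invisible(x))
}
i <- seq(along=files)
for (k in i) plotCali(files[k], main=basename(files[k]), lab.cali=TRUE)
dev.off()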
Baseline subtraction
• Similar to baseline subtraction for a single
spectrum
• R code:
Mcal <- rmBaseline(dir.cali, plot=TRUE)
head(Mcal)
060503peptidecalib_1_128.csv 060503peptidecalib_1_16.csv
par(mfrow=c(1,1))
dev.off()
QualRes
par(mfrow=c(2,4))
x <- plotCali(...)
dev.off()
Analyze the result
• 5 known proteins: 1084, 1638, 3496, 5807,
7034
• Obtained 4 proto-biomarkers: 2906, 3498,
5812, and 7036
• Within 0.3% of m/z values of known proteins:
3498, 5812, and 7036
• Possibly the result of larger proteins carrying two charges:
2 × 2906 (5807) and 2 × 3496 (7034)
• Failed to detect peaks at m/z=1084 and 1638
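The matching behind these bullets is simple arithmetic and can be reproduced in a few lines of base R. The masses and the 0.3% tolerance come from the slides; the variable names are illustrative.

```r
known <- c(1084, 1638, 3496, 5807, 7034)   # known calibrant masses (amu)
detected <- c(2906, 3498, 5812, 7036)      # proto-biomarkers reported above
tol <- 0.003                               # 0.3% relative tolerance

# Relative deviation of each detected peak from its nearest known mass.
nearest <- sapply(detected, function(p) known[which.min(abs(known - p))])
rel_dev <- abs(detected - nearest) / nearest
matched <- rel_dev <= tol

print(data.frame(detected, nearest, rel_dev_pct = round(100 * rel_dev, 3), matched))

# Known masses with no detected peak within the tolerance.
missed <- setdiff(known, nearest[matched])
print(missed)

# Doubling the unmatched peak, as for a doubly charged ion of a larger protein.
doubled_dev <- abs(2 * 2906 - 5807) / 5807
print(round(100 * doubled_dev, 3))
```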
Summary
• PROcess package:
– Process SELDI-TOF-MS data
– Advantage: produces more reproducible results
for peak quantification
– Limitation: The results were not homogeneous
across laser intensities