Inferring Gene Regulatory Networks Using Heterogeneous Microarray Dataset

INFERRING GENE REGULATORY NETWORKS
USING HETEROGENEOUS MICROARRAY
DATASET
A PROJECT REPORT
Submitted by
S.P.SUGANYA DEVI 80705104097

L.SUGANYA 80705104098
U.SUGANYA 80705104099
in partial fulfillment for the award of the degree
of
BACHELOR OF ENGINEERING
in
COMPUTER SCIENCE AND ENGINEERING
J.J.COLLEGE OF ENGINEERING AND TECHNOLOGY,
AMMAPETTAI, TIRUCHIRAPPALLI-620 009
ANNA UNIVERSITY:: CHENNAI 600 025
MAY 2005
ANNA UNIVERSITY : CHENNAI 600 025
BONAFIDE CERTIFICATE
Certified that this project report “INFERRING GENE REGULATORY
NETWORKS USING HETEROGENEOUS MICROARRAY DATASETS” is the bonafide
work of “S.P.SUGANYA DEVI, L.SUGANYA AND U.SUGANYA” who carried out
the project work under my supervision.
SIGNATURE SIGNATURE
<<Name>> <<Name>>
HEAD OF THE DEPARTMENT SUPERVISOR
Assistant professor,
Dept. of Computer Science & Engg., Dept. of Computer Science & Engg.,
J.J.College of Engg. & Tech., J.J.College of Engg. & Tech.,
Ammapettai, Ammapettai,
Tiruchirappalli-620009. Tiruchirappalli-620009.
ACKNOWLEDGEMENT
ABSTRACT
Inferring Gene Regulatory Networks (GRNs) is critical to describing
the intrinsic relationships between genes in the course of evolution and
to discovering the group behavior of a given set of genes. The recent
development of a high-throughput technique, the microarray, gives
researchers a chance to monitor the expression patterns of thousands of
genes simultaneously. As an increasing number of microarray data sets
become available online, integrating multiple microarray data sets from
various data sources (e.g. different tissues, species, and conditions)
becomes very important for achieving more accurate and reliable GRN
models. This paper reviews recent developments in integrating multiple
microarray data sets and proposes a new method to infer GRNs using
multiple microarray data sets.
TABLE OF CONTENTS
CHAPTER NO. TITLE PAGE
ABSTRACT
LIST OF TABLE
LIST OF FIGURES
LIST OF SYMBOLS
1. INTRODUCTION
1.1 LITERATURE SURVEY

1.1.1
1.1.2
1.2
2. PROPOSED SYSTEM
2.1. MicroArray Dataset:

2.1.1 Formatting MicroArray Dataset
2.1.2 Loading MicroArray Dataset
2.1.3 Reading Gene values from MicroArray
2.2 Implementing Correlation Signature Method
2.2.1 Capturing Landmark Genes
2.3. Modifying MicroArray Dataset
2.4. Implementing GRN SigCalc Method
2.5. Capturing Activator and Repressor Genes
2.6. Constructing Gene Regulatory Network
3. CONCLUSION
4. FUTURE WORK
APPENDICES
A.1 SOURCE CODE
A.2 SNAP SHOTS
REFERENCES
LIST OF FIGURES
FIGURE NO NAME OF THE FIGURE PAGE NO

LIST OF TABLES
TABLE NO NAME OF THE TABLE PAGE NO

LIST OF ABBREVIATIONS
CHAPTER 1
INTRODUCTION
Understanding the genetic causes behind the phenotypic
characteristics of organisms is one of the most important objectives of
genetic research. In other words, the goal is to discover the exact
ways in which genetic components, i.e., genes and proteins, interact to
make a complex living system. Recently, high-throughput techniques such
as the microarray have greatly helped researchers obtain a closer look
at the interactions between genes. Unlike traditional genetic and
molecular approaches, which usually examine and collect data on a
single gene, the microarray technique monitors the expression patterns
of tens of thousands of genes in parallel [2], [9]. Data collected with
this technique are referred to as gene expression data and are well
suited to both qualitative and quantitative modeling and simulation.
The interactions between genes can be illustrated with a Gene
Regulatory Network (GRN). A GRN contains a collection of genes that
interact with each other, and it can elucidate the effect of the nature
and topology of these interactions on the systemic properties of
organisms. However, GRNs constructed from a single microarray data set
are often hard to interpret and unreliable because there are too few
samples.
The ability to integrate heterogeneous microarray data sets is
therefore becoming very important to bioinformatics researchers who
want to infer more reliable GRNs with statistically robust models.
After reviewing recent efforts on this topic, this paper presents a
novel GRN inference method using the SigCalc algorithm [6].
1.1 LITERATURE SURVEY
Singular Value Decomposition:

Wang et al. presented a method to find the GRN structure that is most
consistent with all of the data sets involved [11]. This method uses
linear differential equations to describe a GRN, as shown in
Equation 1:

$$\dot{x}(t) = J x(t) + b(t), \quad t = t_1, \ldots, t_m \qquad (1)$$

where $x(t)$ is the vector of expression levels of the n genes at time
t, $J = (J_{ij})_{n \times n} = \partial f(x) / \partial x$ is the
Jacobian matrix, or connectivity matrix, and
$b = (b_1, \ldots, b_n)^T \in R^n$ is a vector representing the
external stimuli or environmental conditions. A particular solution of
Equation 1 can be derived for each data set using Singular Value
Decomposition (SVD) [1]. Taking the sparse structure of GRNs into
account, the inference of a GRN is formulated as an optimization
problem whose objective function combines forced matching and sparsity
terms for multiple data sets. Here forced matching means forcing the
final solution of J to match the SVD solution, whereas sparsity means
that the matrix J should be sparse (in other words, most of the
elements of J should be zero). The experimental results showed that the
generated GRNs were promising and biologically meaningful [11].
Evolving Connectionist System (ECOS):

Goh and Kasabov used the Evolving Connectionist System (ECOS) to
integrate multiple data sets [5]. An ECOS is a neural network that can
continuously adjust its structure through interaction with its
environment and with other systems. The system evolves along with
incoming information whose distribution is unknown, and it uses an
objective function to optimize its performance over time. In terms of
modeling, ECOS allows new data to be added incrementally, so that
connectionist systems can be built for online adaptive learning in
which new data from various sources are added to the system. In
addition to applying the ECOS model to all the data sets, Goh and
Kasabov also normalized all data sets to achieve better results.
Clustering:
Filkov and Skiena proposed a method to combine microarray data from
various experiments on an equal basis using the concept of consensus
clustering [4]. Clustering has been popular and widely applied in the
analysis of biological data for years [3], and many clustering results
for the same organism are available in public repositories today.
Multiple clusterings of the same genes can be used to extract more
information about the groups to which the genes jointly belong than the
individual clusterings provide on their own. Consensus clustering is an
algorithm that uses the various source-specific clusterings of the data
(or the meta-data) for the same organism both to provide an integrated
view of the data and to eliminate misclassifications due to errors in
the individual data sets. Mathematically, clusterings are set
partitions, so consensus clustering can be formalized as a set
partition problem: given n partitions p_1, p_2, ..., p_n and the
symmetric difference distance d on any two partitions, find a consensus
partition p that minimizes

$$D = \sum_{i=1}^{n} d(p_i, p).$$

This is known in the literature as the median partition problem and has
been proved to be NP-complete. Filkov and Skiena provided three
heuristics to find this consensus clustering and demonstrated
experimentally the algorithm’s efficiency on both clean and noisy data.
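As a rough illustration of the quantity being minimized, the sketch below computes the symmetric difference distance between two partitions and the total distance D for a candidate consensus partition. It assumes each partition is encoded as an array of cluster labels (one label per gene) and that the distance counts the gene pairs that are co-clustered in exactly one of the two partitions. The class and method names are hypothetical, and the sketch only evaluates a candidate; it does not implement the heuristics of [4].

```java
import java.util.List;

final class PartitionDistance {
    // labels[i] is the cluster id of gene i in one clustering (assumed encoding).
    // Counts gene pairs that are together in exactly one of the two partitions.
    static int distance(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("partitions must cover the same genes");
        }
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = i + 1; j < a.length; j++) {
                boolean togetherInA = a[i] == a[j];
                boolean togetherInB = b[i] == b[j];
                if (togetherInA != togetherInB) {
                    d++;
                }
            }
        }
        return d;
    }

    // D = sum over i of d(p_i, p) for a candidate consensus partition p.
    static int totalDistance(List<int[]> partitions, int[] candidate) {
        int sum = 0;
        for (int[] p : partitions) {
            sum += distance(p, candidate);
        }
        return sum;
    }
}
```

A consensus heuristic would call `totalDistance` on many candidate partitions and keep the one with the smallest value.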
PROPOSED SYSTEM:
Formatting MicroArray Dataset:

A typical microarray experiment has tens of samples, while each
sample contains expression data for thousands of genes. This asymmetry
between the number of samples and the number of genes is called the
“curse of dimensionality” and often causes problems in statistical data
processing. At the same time, an increasing amount of microarray data
is being generated daily. Data collected from microarray experiments
are usually heterogeneous: they often come from different tissues,
treatment strategies, and stages of disease development, and from
different labs that apply different microarray technologies and
protocols. The integration of different microarray data sets therefore
poses a new challenge to bioinformatics researchers. In this system the
data set has to be formatted as an Excel sheet in which the column
headings denote the samples and the row headings denote the genes.
Loading MicroArray Dataset:

The MicroArray Dataset is loaded into the system by choosing
the file at run time.
Gene values from MicroArray Dataset:

The system reads the gene values from the input data set and stores
them temporarily for further processing.
Table 1.1 A Typical Microarray Example

Gene  S1    S2    S3    S4    Landmark
G1    0.73  0.88  0.69  0.71
G2    0.80  0.75  0.71  0.82
G3    0.01  0.05  0.09  0.03  L1
G4    0.37  0.32  0.41  0.35
G5    0.91  0.85  0.83  0.87
G6    0.02  0.07  0.12  0.08  L2
G7    0.76  0.87  0.92  0.95
G8    0.75  0.84  0.77  0.89
G9    0.92  0.86  0.84  0.96
G10   0.14  0.03  0.06  0.16
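The layout of Table 1.1 (sample names across the top, one gene per row) could be held in memory roughly as follows. This is a minimal sketch that assumes the sheet has been exported as whitespace-separated text without the landmark markers; the project itself reads the Excel file through Apache POI (see Appendix A.1). The class name `MicroarrayTable` and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MicroarrayTable {
    final List<String> samples = new ArrayList<>();   // column headings (S1, S2, ...)
    final List<String> genes = new ArrayList<>();     // row headings (G1, G2, ...)
    final List<double[]> values = new ArrayList<>();  // one expression vector per gene

    // Parses text whose first line holds the sample names and whose later
    // lines hold a gene name followed by one numeric value per sample.
    static MicroarrayTable parse(String text) {
        MicroarrayTable table = new MicroarrayTable();
        String[] lines = text.trim().split("\\R");
        for (String sample : lines[0].trim().split("\\s+")) {
            table.samples.add(sample);
        }
        for (int r = 1; r < lines.length; r++) {
            String[] cells = lines[r].trim().split("\\s+");
            if (cells.length != table.samples.size() + 1) {
                throw new IllegalArgumentException("row " + r + " has the wrong number of values");
            }
            double[] row = new double[table.samples.size()];
            for (int c = 0; c < row.length; c++) {
                row[c] = Double.parseDouble(cells[c + 1]);
            }
            table.genes.add(cells[0]);
            table.values.add(row);
        }
        return table;
    }
}
```

Keeping each gene as a `double[]` row makes the later correlation steps straightforward, since every signature is computed between two such rows.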
Implementing Correlation Signature Method:

Kang et al. presented a correlation-based algorithm, SigCalc, to
provide a new interpretation of gene expression data and to integrate
heterogeneous microarray data sets [6]. The algorithm first defines the
concept of a correlation signature, which captures the correlations
between a gene and a set of landmark genes. Different methods can be
used to choose the landmark genes; for example, they can be the genes
from a particular pathway, or genes that the literature associates with
high probability with a certain disease, such as lung cancer. The
correlations are defined as the similarities and dissimilarities
between gene vectors (the rows in Table 1.1).
Any convenient distance metric can be used to calculate the
correlations, e.g. Euclidean distance, cosine correlation, Pearson
correlation, or mean-expression distance. The proper metric depends on
the application [6]. For instance, although Euclidean distance is a
popular way to measure the distance between two vectors, it fails to
capture the natural bias of gene expression data. Thus, if we focus on
the fluctuation of the expression levels rather than on the absolute
expression values, correlation metrics may give more accurate results
than Euclidean distance.
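The difference between the two kinds of metric can be seen on a small made-up example. In the sketch below, the two vectors are illustrative values (not taken from any data set in this report) that fluctuate identically but sit at different absolute levels: the Euclidean distance between them is large, while the Pearson correlation reports a perfect match. The class and method names are hypothetical.

```java
final class DistanceComparison {
    static double mean(double[] v) {
        double sum = 0;
        for (double x : v) {
            sum += x;
        }
        return sum / v.length;
    }

    // Straight-line distance; sensitive to the absolute expression level.
    static double euclidean(double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            sum += (x[i] - y[i]) * (x[i] - y[i]);
        }
        return Math.sqrt(sum);
    }

    // Pearson correlation; depends only on how the two vectors fluctuate.
    static double pearson(double[] x, double[] y) {
        double mx = mean(x);
        double my = mean(y);
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] low = {0.1, 0.2, 0.1, 0.3};   // illustrative values
        double[] high = {0.7, 0.8, 0.7, 0.9};  // same pattern, shifted up by 0.6
        System.out.println("Euclidean: " + euclidean(low, high));
        System.out.println("Pearson:   " + pearson(low, high));
    }
}
```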
All correlation signature values of a gene form a vector, called the
gene signature vector. The expression level of a gene can then be
represented by its correlation to a set of landmark genes. A typical
microarray data set is represented by a matrix in which the rows are
the measurements associated with individual genes and the columns are
the measurements associated with the samples. Each entry represents the
expression level of one gene in one sample. Typically, the relationship
between the number of genes and the number of samples is asymmetric:
the number of genes (in thousands) is much larger than the number of
samples (in tens).
Capturing Landmark Genes:

SigCalc assigns a correlation signature to each gene in a
microarray data set. Without loss of generality, G3 and G6 in Table 1.1
are selected as the landmark genes L1 and L2, based on their average
values. The correlation signature values for the genes are shown in
Table 1.2, where Sig(Gi) represents the signature vector of gene Gi.
For example, Sig(G1) = [0.54, 0.61]. The average value of the
correlation signature for each gene is listed in the last column. The
Pearson correlation is used for this calculation. This popular metric
measures the tendency of two vectors of variables to increase or
decrease together. It is defined as

$$r(x, y) = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m} (x_i - \bar{x})^2 \, \sum_{i=1}^{m} (y_i - \bar{y})^2}}$$

where x and y represent two gene row vectors in our context and m is
the number of samples. For the dissimilarity measure, one simply
changes the correlation form to $1 - r(x, y)$. In this paper, we use
the same correlation distance described in [6],
$d(x, y) = (1 - r(x, y)) / 2$, for the correlation signature
calculation. This correlation distance has a range of [0, 1]; a
distance close to zero implies that x and y are correlated, while a
distance close to one implies that the two vectors are inversely
correlated. There is no correlation between the two vectors if the
value of the correlation distance is 0.5. We can interpret the value
using the regulation rule between genes. For instance,
Sig(G7) = [0.21, 0.08] may imply that G7 is activated by the landmark
genes G3 and G6. On the other hand, Sig(G9) = [0.90, 0.75] may imply
that G9 is repressed by G3 and G6.
Table 1.2 The Correlation Signature

Gene      L1(A)  L2(A)  Average
Sig(G1)   0.54   0.61   0.59
Sig(G2)   0.95   0.83   0.89
Sig(G3)   0      0.04   0.02
Sig(G4)   0.26   0.30   0.28
Sig(G5)   0.97   0.97   0.97
Sig(G6)   0.04   0      0.02
Sig(G7)   0.21   0.08   0.14
Sig(G8)   0.56   0.39   0.47
Sig(G9)   0.90   0.75   0.82
Sig(G10)  0.85   0.72   0.78
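A signature vector of the kind listed in Table 1.2 can be sketched as below. The sketch assumes the correlation distance is (1 − r)/2 with r the Pearson correlation, which matches the stated range of [0, 1] and the meaning of 0, 0.5 and 1. For G7 against the landmarks G3 and G6 of Table 1.1 it yields values that round to the [0.21, 0.08] shown in the table. The class and method names are hypothetical and are not taken from the project source.

```java
final class CorrelationSignature {
    static double mean(double[] v) {
        double sum = 0;
        for (double x : v) {
            sum += x;
        }
        return sum / v.length;
    }

    static double pearson(double[] x, double[] y) {
        double mx = mean(x);
        double my = mean(y);
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // Assumed distance: 0 = correlated, 0.5 = uncorrelated, 1 = inversely correlated.
    static double distance(double[] x, double[] y) {
        return (1.0 - pearson(x, y)) / 2.0;
    }

    // One distance per landmark gene, in landmark order.
    static double[] signature(double[] gene, double[][] landmarks) {
        double[] sig = new double[landmarks.length];
        for (int j = 0; j < landmarks.length; j++) {
            sig[j] = distance(gene, landmarks[j]);
        }
        return sig;
    }

    public static void main(String[] args) {
        double[] g3 = {0.01, 0.05, 0.09, 0.03};  // landmark L1 (Table 1.1)
        double[] g6 = {0.02, 0.07, 0.12, 0.08};  // landmark L2 (Table 1.1)
        double[] g7 = {0.76, 0.87, 0.92, 0.95};
        double[] sig = signature(g7, new double[][] {g3, g6});
        System.out.printf("Sig(G7) = [%.2f, %.2f], average %.2f%n", sig[0], sig[1], mean(sig));
    }
}
```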
Modifying MicroArray Dataset:

We rank the genes in Table 1.2 by the Average column in descending
order. Since a value of 0.5 means that there is no correlation between
a gene vector and the landmark genes (in other words, the landmark
genes neither activate nor repress the gene), we use a threshold
parameter θ = 0.1 to exclude the genes whose average value lies within
0.5 ± θ. The modified gene expression vectors are shown in Table 1.3.
Table 1.3 The Modified MicroArray Data

Gene  S1    S2    S3    S4    Landmark
G2    0.80  0.75  0.71  0.82
G3    0.01  0.05  0.09  0.03  L1
G5    0.91  0.85  0.83  0.87  L3
G6    0.02  0.07  0.12  0.08  L2
G7    0.76  0.87  0.92  0.95
G9    0.92  0.86  0.84  0.96
G10   0.14  0.03  0.06  0.16
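The exclusion rule with the threshold θ can be sketched as a simple filter over the average signature values. The example below is an illustration under the assumption that "near 0.5 ± θ" means |average − 0.5| < θ; the class name `SignatureFilter` is hypothetical. With θ = 0.1, a gene such as G1 (average 0.59) or G8 (average 0.47) is dropped, while G2 (0.89) and G7 (0.14) are kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SignatureFilter {
    // Keeps only the genes whose average signature lies outside 0.5 +/- theta,
    // preserving the order in which the genes were supplied.
    static Map<String, Double> filter(Map<String, Double> averages, double theta) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : averages.entrySet()) {
            if (Math.abs(e.getValue() - 0.5) >= theta) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Double> averages = new LinkedHashMap<>();
        averages.put("G1", 0.59);
        averages.put("G2", 0.89);
        averages.put("G7", 0.14);
        averages.put("G8", 0.47);
        System.out.println(filter(averages, 0.1).keySet());
    }
}
```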
Implementing GRN SigCalc Method :

Given k microarray data sets, our goal is to construct a GRN that
contains all the activating and repressing relationships between the
genes and the landmark genes. Without loss of generality, let
mi (n × m), i = 1, . . . , k, represent the microarray data sets, where
n is the number of genes and m is the number of samples. We concatenate
the k data sets to obtain a larger matrix M (n × r), where r = m × k.
The gene vectors (row vectors) can be represented as G = {g1, g2, . . . ,
gn}. Also let L = {l1, l2, . . . , lt} represent the initial landmark
gene set. The correlation signatures between each gene and the landmark
gene li can be represented as SIG = {Sig(g1, li), Sig(g2, li), . . . ,
Sig(gn, li)}, where i = 1, . . . , t. We also use the following
notation:
Avg(Sigi) : the average signature value for gi
AvgAct(Sigi) : the average signature value between gi and the activators
AvgRep(Sigi) : the average signature value between gi and the repressors
MAX(a, b) : the maximum value between a and b
Diff(a, b) : the difference between a and b
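The concatenation of the k data sets into the matrix M can be sketched as follows. This is an illustration only: it assumes every data set lists the same n genes in the same row order, and it sums the column counts, which equals m × k when every set has m samples. The class and method names are hypothetical.

```java
final class DatasetConcatenation {
    // Joins the data sets side by side: row g of the result is row g of the
    // first data set followed by row g of the second, and so on.
    static double[][] concatenate(double[][]... dataSets) {
        if (dataSets.length == 0 || dataSets[0].length == 0) {
            throw new IllegalArgumentException("at least one non-empty data set is required");
        }
        int n = dataSets[0].length;
        int r = 0;
        for (double[][] d : dataSets) {
            if (d.length != n) {
                throw new IllegalArgumentException("all data sets must contain the same genes");
            }
            r += d[0].length;
        }
        double[][] merged = new double[n][r];
        for (int g = 0; g < n; g++) {
            int offset = 0;
            for (double[][] d : dataSets) {
                System.arraycopy(d[g], 0, merged[g], offset, d[g].length);
                offset += d[g].length;
            }
        }
        return merged;
    }
}
```

Each row of the merged matrix is then treated as one gene vector when the correlation signatures are calculated.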
do
    for each gi in G
        for each lj in L
            calculate Sig(gi, lj)
        end for
        calculate Avg(Sigi)
        if |Avg(Sigi) − 0.5| < θ
            remove gi from G
        end if
    end for
    for each gi in G
        calculate AvgAct(Sig(gi))
        calculate AvgRep(Sig(gi))
        if Diff(AvgAct(Sig(gi)), AvgRep(Sig(gi))) < δ
            remove gi from G
        end if
    end for
    add gx with MAX(Diff(Avg(ACT), Avg(REP))) to L
    if activators activate gx
        add gx to ACT
    else
        add gx to REP
    end if
    add gx to the GRN with the incoming lines from ACT
        and outgoing lines to elements in REP set
until elements in M have been all processed
Capturing Activator and Repressor Genes

We then select the gene with the highest rank score, G5 in this
example, add it to the landmark gene list, and mark it as a repressor.
Note that in this particular example, apart from G3 and G6, which are
already in the landmark gene list, two ranking scores (for G4 and G7)
are smaller than 0.5 − θ, so G4 and G7 are activators. Therefore, we
take the lowest of these scores (0.14 for G7), convert it to
1 − 0.14 = 0.86, compare it with the highest ranking score among the
repressors (0.97 for G5), and select the gene with the higher value
(G5) as the new landmark gene. The underlying assumption is that the
gene that is most strongly activated or repressed is selected as the
next landmark gene, whether it is an activator or a repressor. With the
new landmark gene G5, we obtain a new correlation signature table,
Table 1.4.
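The comparison between the strongest activator candidate and the strongest repressor candidate can be sketched as below. The sketch assumes a "strength" of 1 − average for scores below 0.5 and of the average itself for scores above 0.5, as in the 0.86 versus 0.97 comparison above. Genes already in the landmark set are skipped. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;

final class LandmarkCandidate {
    // Regulation strength: how far the average score is from "no correlation",
    // measured toward 1 for high scores and toward 0 for low scores.
    static double strength(double average) {
        return average > 0.5 ? average : 1.0 - average;
    }

    // Returns the non-landmark gene with the greatest strength, or null if none is left.
    static String strongest(Map<String, Double> averages, Set<String> landmarks) {
        String best = null;
        double bestStrength = -1;
        for (Map.Entry<String, Double> e : averages.entrySet()) {
            if (landmarks.contains(e.getKey())) {
                continue;
            }
            double s = strength(e.getValue());
            if (s > bestStrength) {
                bestStrength = s;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

With the averages of Table 1.2 for the retained genes and the landmark set {G3, G6}, this selection returns G5, because 0.97 exceeds the 0.86 obtained for G7.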
In Table 1.4, L1(R) and L2(R) mean that L1 and L2 are repressors,
while L3(A) means that L3 is an activator. Average(R) is the average
over all the repressors, Average(A) is the average over all the
activators, and Diff is the difference between Average(A) and
Average(R).
Table 1.4 The Correlation Signature with a New Landmark gene

Gene      L1(R)  L2(R)  L3(A)  Average(R)  Average(A)  Diff
Sig(G2)   0.95   0.83   0.11   0.89        0.11        0.78
Sig(G3)   0      0.04   0.97   0.02        0.97        0.95
Sig(G5)   0.97   0.97   0      0.97        0           0.97
Sig(G6)   0.04   0      0.97   0.02        0.97        0.95
Sig(G7)   0.21   0.08   0.87   0.14        0.87        0.72
Sig(G9)   0.90   0.75   0.15   0.82        0.15        0.67
Sig(G10)  0.85   0.72   0.14   0.78        0.14        0.64
We will select the gene with the highest Diff value to be the
next landmark gene (except the ones that are already in the landmark
gene set). This means we select the one that is mutually regulated by
the activating genes and repressing genes to the greatest degree. In
this example, it is G2.
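The Diff-based selection can be sketched as follows. The example assumes Diff is the absolute difference between the average signature over the activator landmarks and the average over the repressor landmarks, and it uses the signature values and the activator/repressor labels of Table 1.4 as input. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;

final class DiffSelection {
    // Diff = |Average(A) - Average(R)| for one signature vector;
    // isActivator[j] tells whether landmark j is labelled as an activator.
    static double diff(double[] sig, boolean[] isActivator) {
        double actSum = 0, repSum = 0;
        int actCount = 0, repCount = 0;
        for (int j = 0; j < sig.length; j++) {
            if (isActivator[j]) {
                actSum += sig[j];
                actCount++;
            } else {
                repSum += sig[j];
                repCount++;
            }
        }
        double avgAct = actCount == 0 ? 0 : actSum / actCount;
        double avgRep = repCount == 0 ? 0 : repSum / repCount;
        return Math.abs(avgAct - avgRep);
    }

    // Picks the gene with the highest Diff that is not yet a landmark.
    static String nextLandmark(Map<String, double[]> signatures,
                               boolean[] isActivator, Set<String> landmarks) {
        String best = null;
        double bestDiff = -1;
        for (Map.Entry<String, double[]> e : signatures.entrySet()) {
            if (landmarks.contains(e.getKey())) {
                continue;
            }
            double d = diff(e.getValue(), isActivator);
            if (d > bestDiff) {
                bestDiff = d;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Applied to the rows of Table 1.4 with the landmark set {G3, G5, G6}, the selection returns G2, in agreement with the text.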
Constructing Gene Regulatory Network
We continue this procedure until all the genes have either been
added to the landmark gene set or excluded from the working
microarray data set. At the end of the loop, we obtain a gene
activating/repressing relation graph. This is illustrated in Figure 1,
where solid lines and dashed lines represent activating and repressing
relationships, respectively. Note that we can use δ as a threshold for
Diff to limit the final GRN size. That is, we can reduce the GRN size
by removing, in each loop, the genes whose Diff value is below the
threshold.
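One possible in-memory form of the resulting relation graph is a list of directed, labelled edges. The sketch below is an assumption about how such a graph might be stored, not the data structure used by the project's servlets. Each edge records the Diff value of the regulated gene so that the δ threshold can be applied afterwards. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class RegulatoryNetwork {
    enum Relation { ACTIVATE, REPRESS }   // solid and dashed lines in the drawing

    static final class Edge {
        final String regulator;
        final String target;
        final Relation relation;
        final double targetDiff;          // Diff value of the target gene (assumed stored per edge)

        Edge(String regulator, String target, Relation relation, double targetDiff) {
            this.regulator = regulator;
            this.target = target;
            this.relation = relation;
            this.targetDiff = targetDiff;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    void addEdge(String regulator, String target, Relation relation, double targetDiff) {
        edges.add(new Edge(regulator, target, relation, targetDiff));
    }

    List<Edge> edges() {
        return edges;
    }

    // Returns a smaller network that keeps only edges into genes whose Diff is at least delta.
    RegulatoryNetwork prune(double delta) {
        RegulatoryNetwork pruned = new RegulatoryNetwork();
        for (Edge e : edges) {
            if (e.targetDiff >= delta) {
                pruned.edges.add(e);
            }
        }
        return pruned;
    }
}
```

For instance, adding the edges G3 → G7 and G6 → G7 (activate, Diff 0.72) and G3 → G9 and G6 → G9 (repress, Diff 0.67) and then calling `prune(0.70)` leaves only the two edges into G7.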
GRN-SigCalc can be verified by constructing GRNs from training
data sets and testing the activating/repressing relationships against
the testing data sets. When multiple data sets are available, we can
split them into several groups and construct GRNs separately.
CHAPTER 3
CONCLUSION
This project constructs GRNs using microarray data sets from
heterogeneous data sources. Since the microarray data sources come from
different organisms and differ in sample background, quality control
standards, microarray technologies, etc., integrating the data is a
difficult task. On the other hand, due to the curse of dimensionality,
GRNs constructed from a single microarray experiment are often not
convincing and lack a robust statistical basis. Although the data from
different sources may vary greatly, the relationships between the genes
presumably remain.
GRN-SigCalc integrates multiple microarray data sets to construct
GRNs using the correlation signature concept. In addition, the method
uses the sparsity of GRNs to remove from the network the genes that are
not highly correlated with other genes. The resulting GRNs are compact
and can represent both activating and repressing relationships. The
size of the GRNs can be further reduced by increasing the threshold
values θ and δ in the algorithm.
FUTURE WORK:
The GRN-SigCalc method can be improved by importing the resulting
GRNs into a neural network and evolving the neural network to achieve
higher accuracy. This is especially useful when noise in the data sets
is unavoidable.
APPENDICES
A.1 SOURCE CODE
SigcalcFirstServlet.java
import java.io.*;
import java.lang.reflect.Array;
import java.net.*;
import java.util.ArrayList;
import javax.servlet.*;
import javax.servlet.http.*;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.*;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
/**
*
* @author
* @version
*/
public class SigcalcFirstServlet extends HttpServlet {
static int len=0;
static int no_of_row=0;
/** Processes requests for both HTTP <code>GET</code> and
<code>POST</code> methods.
* @param request servlet request
* @param response servlet response
*/
protected void processRequest(HttpServletRequest request,
HttpServletResponse response)
throws ServletException, IOException {
response.setContentType("text/html;charset=UTF-8");
PrintWriter out = response.getWriter();
//String FileName="C:/Documents and Settings/mekala.FOCUS/Desktop/Book1.xls";
String FileName=request.getParameter("fname");
int sno=Integer.parseInt(request.getParameter("sno"));
sno=sno-1;
try
{
HttpSession ses=request.getSession();
ses.setAttribute("FILENAME",FileName);
ses.setAttribute("SNO",String.valueOf(sno));
InputStream in= new FileInputStream(FileName);
POIFSFileSystem fs=new POIFSFileSystem(in);
HSSFWorkbook wb=new HSSFWorkbook(fs);
HSSFSheet sheet=wb.getSheetAt(sno);
ArrayList al=new ArrayList();
int no_of_rows=sheet.getPhysicalNumberOfRows();
System.out.println("NUMBER OF ROWS"+no_of_rows);
HSSFRow row;
HSSFCell cell;
String s;
int rows;
rows = sheet.getPhysicalNumberOfRows();
if(rows>0)
{
int cols = 0;
int tmp = 0;
// To find number of columns in data sheet
for(int i = 0; i < rows; i++)
{
row = sheet.getRow(i);
if(row != null)
{
tmp = sheet.getRow(i).getPhysicalNumberOfCells();
if(tmp > cols)
cols = tmp;
}
}
int k=0;
// Let len be number of samples in Micro array Dataset
len=cols-1;
System.out.println("len: "+len+" row :"+rows);
// TO retrive the values row by row
for(int r = 0; r < rows; r++)
{
if(r==1)
{
System.out.println(" ");
System.out.println("-------------------------------------------");
}
row = sheet.getRow(r);
System.out.println(" ");
// To retrieve the value column by column from each Cell
if(row != null)
{
for(int c = 0; c < cols; c++)
{
if(c==1)
{
System.out.print(" | \t");
}
cell = row.getCell((short)c);
if(cell != null)
{
if(cell.getCellType() ==HSSFCell.CELL_TYPE_STRING)
{
String cellvalue=cell.getStringCellValue();
System.out.print(cellvalue+"\t");
if(r!=0 && c!=0)
{
// values are added to al arraylist
al.add(cellvalue);
}
}
else if (cell.getCellType() ==
HSSFCell.CELL_TYPE_NUMERIC)
{
double cellvalue1=cell.getNumericCellValue();
System.out.print(cellvalue1+"\t");
if(r!=0 && c!=0)
{
// values are added to al arraylist
al.add(Double.valueOf(cellvalue1));
}
}
}
else
{
System.out.println("*********Null Value: row=" + r + " Col=" + c);
}
}
k++;
}
else
{
rows++;
}
}
System.out.println("\n"+al+"\n"+al.size());
request.setAttribute("ALLVALUES",al);
request.setAttribute("col",String.valueOf(len));
}
System.out.println(" Page Forwarded ");
RequestDispatcher
rd=getServletContext().getRequestDispatcher("/sigcalcfirst.jsp");
rd.forward(request,response);
}
catch(Exception e)
{
System.out.print("Sigcalc First Servlet Exception:"+e);
}
out.close();
}
// <editor-fold defaultstate="collapsed" desc="HttpServlet methods. Click on the + sign on the left to edit the code.">
/** Handles the HTTP <code>GET</code> method.
*/
protected void doGet(HttpServletRequest request, HttpServletResponse
response)
throws ServletException, IOException {
processRequest(request, response);
}
/** Handles the HTTP <code>POST</code> method.

*/
protected void doPost(HttpServletRequest request, HttpServletResponse
response)
throws ServletException, IOException {
processRequest(request, response);
}
/** Returns a short description of the servlet.

*/
public String getServletInfo() {
return "Short description";
}
// </editor-fold>
}
GeneRegulatoryNetworkServlet.java
import java.awt.*;
import java.awt.geom.Rectangle2D;
import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import java.util.Arrays;
import java.util.ArrayList;
import Acme.JPM.Encoders.GifEncoder;
/**
*
* @author
* @version
*/
public class GeneRegulatoryNetworkServlet extends HttpServlet {
Frame frame = null;

Graphics g = null;
Graphics g1 = null;
public void init(ServletConfig config) throws ServletException {
super.init(config);
// Construct a reusable unshown frame
frame = new Frame();
frame.addNotify();
}
protected void processRequest(HttpServletRequest req,
HttpServletResponse res)
throws ServletException, IOException {
ServletOutputStream out = res.getOutputStream();
try {
System.out.println("GeneRegulatoryNetworkServlet");
HttpSession ses=req.getSession();
ArrayList lmg=(ArrayList)ses.getAttribute("LANDMARK1");
ArrayList lmg1=(ArrayList)ses.getAttribute("LANDMARK0");
ArrayList lg=new ArrayList();

ArrayList ag=(ArrayList)ses.getAttribute("Activator");
ArrayList rg=(ArrayList)ses.getAttribute("Repressor");
ArrayList xa=new ArrayList();
ArrayList ya=new ArrayList();
ArrayList xr=new ArrayList();
ArrayList yr=new ArrayList();
ArrayList act=new ArrayList();
ArrayList rep=new ArrayList();
System.out.println("LANDMARK1:"+lmg);
System.out.println("LANDMARK0:"+lmg1);
lg=lmg1;
int si=ag.size()+rg.size();
int gm=0,rm=0,ym = 0;
// Get the image location from the path info
String url= getServletContext().getRealPath("/");
System.out.println("URL:"+url);
String rp=java.io.File.separator+"build";
url=url.replace(rp,"");
System.out.println("URL:"+url);
String ysource = url+"images"+java.io.File.separator+"y.GIF";

String gsource = url+"images"+java.io.File.separator+"g1.GIF";
String rsource = url+"images"+java.io.File.separator+"r1.GIF";
System.out.println("ysource:"+ysource);
System.out.println("gsource:"+gsource);
System.out.println("rsource:"+rsource);
if (ysource == null) {
throw new ServletException("Extra path information " +
"must point to an image");
}
// Load the image (from bytes to an Image object)

MediaTracker mt = new MediaTracker(frame); // frame acts as ImageObserver
Image yimage = Toolkit.getDefaultToolkit().getImage(ysource);
Image rimage = Toolkit.getDefaultToolkit().getImage(rsource);
Image gimage = Toolkit.getDefaultToolkit().getImage(gsource);
mt.addImage(yimage, 0);
mt.addImage(rimage, 1);
mt.addImage(gimage, 2);
try {
mt.waitForAll();
}
catch (InterruptedException e) {
getServletContext().log(e, "Interrupted while loading image");
throw new ServletException(e.getMessage());
}
// Construct a matching-size off screen graphics context

int w = yimage.getWidth(frame);
int h = yimage.getHeight(frame);
int w1 = gimage.getWidth(frame);
int h1 = gimage.getHeight(frame);
int w2 = rimage.getWidth(frame);
int h2 = rimage.getHeight(frame);
//frame.setBackground(Color.blue);
Image offscreen = frame.createImage(800,900);
g = offscreen.getGraphics();
g1=offscreen.getGraphics();
System.out.println("image width, height:"+w+":"+h);
System.out.println("image width, height:"+w1+":"+h1);
System.out.println("image width, height:"+w2+":"+h2);
// Draw the image to the off-screen graphics context
int count=1;
//-----------------> display the activator and repressor in color genes page
g.setFont(new Font("arial", Font.BOLD , 17));
g.setColor(Color.RED);
g.drawString("Gene Regulatory Network", 200, 30);
g.setColor(Color.DARK_GRAY);
g.drawString("LandMark Gene", 30, 60);
int j=0;
int count1=0;
int x1,x2,y1,y2;
int y3=0;
for(int i=0;i<lg.size();i++)
{
if(count1 ==5)
{
y3=y3+33;
j=0;
count1=0;
}
x1=(30+ (j*35));
y1=80+y3;
x2=(30+ (j*35)+(w/2) - 7);
y2=(80+h/2)+3+y3;
g.drawImage(yimage, x1, y1, frame);
g.drawString("G"+lg.get(i).toString(),x2,y2);
count1++;
j++;
ym=y2;
}
System.out.println("YMax:"+ym);
g.drawString("Activator Gene", 250, 60);
j=0;
count1=0;
y3=0;
for(int i=0;i<ag.size();i++)
{
if(count1 ==5)
{
y3=y3+33;
j=0;
count1=0;
}
x1=(250+ (j*35));
y1=80+y3;
x2=(250+ (j*35)+(w1/2) - 7);
y2=(80+h1/2)+3+y3;
g.drawImage(gimage, x1,y1, frame);
g.drawString("G"+ag.get(i).toString(),x2,y2);
count1++;
j++;
gm=y2;
}
System.out.println("GMax:"+gm);
g.drawString("Repressor Gene", 470, 60);
j=0;
count1=0;
y3=0;
for(int i=0;i<rg.size();i++)
{
if(count1 ==5)
{
y3=y3+33;
j=0;
count1=0;
}
x1=(470+ (j*35));
y1=80+y3;
x2=(470+ (j*35)+(w2/2) - 7);
y2=(80+h2/2)+3+y3;
g.drawImage(rimage, x1,y1, frame);
g.drawString("G"+rg.get(i).toString(),x2,y2);
count1++;
j++;
rm=y2;
}
System.out.println("RMax:"+rm);
// Draw the network caption below the landmark gene rows
int max= ym+50;
g.setColor(Color.blue);
g.drawString("The Inference of GRN",150,max);
g.setColor(Color.DARK_GRAY);
A.2 SNAP SHOTS
INPUT SCREEN
Loading Microarray Dataset
Reading Gene values from MicroArray Dataset
REFERENCES
[1] Alter, O., Brown, P., and Botstein, D. (2000) Singular value
decomposition for genome-wide expression data processing and modeling.
In Proceedings of the National Academy of Sciences of the United States
of America, 10101–10106.
[2] Chen, J., Wu, R., Yang, P., Huang, J., Sher, Y., Han, M., Kao, W.,
Lee, P., Chiu, T., Chang, F., Chu, Y., Wu, C., and Peck, K. (1998)
Profiling expression patterns and isolating differentially expressed
genes by cDNA microarray system with colorimetry detection. Genomics,
51, 313–324.
[3] Eisen, M., Spellman, P., Brown, P., and Botstein, D. (1998) Cluster
analysis and display of genome-wide expression patterns. In Proceedings
of the National Academy of Sciences of the United States of America,
Vol. 95, 14863–14868.
[4] Filkov, V. and Skiena, S. (2004) Integrating microarray data by
consensus clustering. International Journal on Artificial Intelligence
Tools, 13(4), 863–880.
[5] Goh, L. and Kasabov, N. (2003) Integrated gene expression analysis
of multiple microarray data sets based on a normalization technique and
on adaptive connectionist model. IEEE Proceedings, IJCNN’2003, Vol. 3,
1724–1728.
[6] Kang, J., Yang, J., Xu, W., and Chopra, P. (2005) Integrating
heterogeneous microarray data sources using correlation signatures. In
Proceedings of Data Integration in the Life Sciences, Second
International Workshop (DILS 2005), 105–120.
[7] Kauffman, S. (1996) At Home in the Universe: The Search for Laws of
Self-Organization and Complexity. Oxford University Press.
[8] Keedwell, E. and Narayanan, A. (2005) Discovering gene networks
with a neural-genetic hybrid. IEEE/ACM Transactions on Computational
Biology and Bioinformatics, 2(3), 231–242.
[9] Schena, M., Shalon, D., Heller, R., Chai, A., Brown, P., and Davis,
R. (1996) Parallel human genome analysis: microarray-based expression
monitoring of 1000 genes. In Proceedings of the National Academy of
Sciences of the United States of America, 10614–10619.
[10] Thieffry, D., Huerta, A. M., Pérez-Rueda, E., and Collado-Vides,
J. (1998) From specific gene regulation to genomic networks: a global
analysis of transcriptional regulation in Escherichia coli. BioEssays,
20(5), 433–440.
[11] Wang, Y., Joshi, T., Zhang, X.-S., Xu, D., and Chen, L. (2006)
Inferring gene regulatory networks from multiple microarray datasets.
Bioinformatics, 22(19), 2413–2420.