Inferring Gene Regulatory Networks Using Heterogeneous Microarray Dataset
A PROJECT REPORT
Submitted by
of
BACHELOR OF ENGINEERING
in
MAY 2005
ANNA UNIVERSITY : CHENNAI 600 025
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
<<Name>> <<Name>>
HEAD OF THE DEPARTMENT SUPERVISOR
Assistant professor,
Ammapattai, Ammapattai,
Tiruchirappalli-620009. Tiruchirappalli-620009.
ACKNOWLEDGEMENT
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS
1. INTRODUCTION
2. PROPOSED SYSTEM
4. FUTURE WORK
APPENDICES
REFERENCES
LIST OF FIGURES
CHAPTER 1
INTRODUCTION
Clustering:
Filkov and Skiena proposed a method to combine
microarray data from various experiments on an equal
basis using the concept of consensus clustering [4].
Clustering technology has been very popular and widely
applied in analyzing biological data [3] for years. There
are many existing clustering results for the same
organism available in public repositories today. Multiple
clusterings of the same genes can be used to extract
more information about the groups that the genes
co-belong to than the individual clusterings themselves.
Consensus clustering is an algorithm that uses the
various source-specific clusterings of the data (or the
meta-data) for the same organism both to provide an integrated
view of the data and to eliminate misclassifications due to
errors in the individual data sets. Mathematically, clusterings are set
partitions, so consensus clustering can be formalized as a set
partition problem: given n partitions p_1, p_2, ..., p_n and the
symmetric difference distance d on any two partitions, find a
consensus partition p that minimizes

$$D = \sum_{i=1}^{n} d(p_i, p).$$

Here d counts the pairs of genes that are co-clustered in one partition
but not in the other.
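The objective above can be made concrete with a small sketch. It assumes a partition of n genes is encoded as an integer array of cluster labels; that encoding, the class name ConsensusDistance, and the toy partitions are illustrative choices, not part of the method of Filkov and Skiena. The sketch only evaluates D for a given candidate partition and does not search for the minimizing one.

```java
// Sketch: symmetric difference distance between set partitions.
// ASSUMPTION: a partition is encoded as int[] labels, where labels[g]
// is the cluster id of gene g. All names here are hypothetical.
class ConsensusDistance {

    // Counts gene pairs that are co-clustered in exactly one of the two partitions.
    static int distance(int[] p, int[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("partitions must cover the same genes");
        }
        int d = 0;
        for (int i = 0; i < p.length; i++) {
            for (int j = i + 1; j < p.length; j++) {
                boolean togetherInP = p[i] == p[j];
                boolean togetherInQ = q[i] == q[j];
                if (togetherInP != togetherInQ) {
                    d++;
                }
            }
        }
        return d;
    }

    // D = sum over i of d(p_i, candidate): the cost of a candidate consensus partition.
    static int totalDistance(int[][] partitions, int[] candidate) {
        int sum = 0;
        for (int[] p : partitions) {
            sum += distance(p, candidate);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] partitions = {
            {0, 0, 1, 1},
            {0, 0, 0, 1},
            {0, 1, 1, 1}
        };
        int[] candidate = {0, 0, 1, 1};
        System.out.println(totalDistance(partitions, candidate)); // prints 6
    }
}
```

In this toy input the candidate agrees exactly with the first partition and differs from each of the other two on three gene pairs, which gives D = 6.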
PROPOSED SYSTEM:
correlation metrics may be used to achieve more accurate results than
does Euclidean distance.
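One way to see why a correlation metric can behave differently from Euclidean distance is to compare two expression profiles that have the same shape but different scales. The sketch below uses Pearson correlation as the correlation metric, which is an assumption since the text does not name a specific one; the class name and the two profiles are made up for illustration.

```java
// Sketch: Euclidean distance versus Pearson correlation on two profiles
// that rise in the same pattern but on different scales.
// ASSUMPTION: Pearson correlation stands in for "correlation metrics";
// the data are invented for illustration.
class CorrelationVsEuclidean {

    static double euclidean(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        double[] geneA = {1, 2, 3, 4};
        double[] geneB = {10, 20, 30, 40};
        System.out.println("Euclidean: " + euclidean(geneA, geneB));
        System.out.println("Pearson:   " + pearson(geneA, geneB));
    }
}
```

The two profiles are far apart in Euclidean terms (about 49.3) yet perfectly correlated (1.0), which matters when data sets from different sources are measured on different scales.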
Table 1.2 The Correlation Signature
the average value near 0.5 ± θ. The modified gene expression vectors
are shown in Table 1.3.
do
    for each gi in G
        for each lj in L
            calculate Sig(gi, lj)
        end for
        calculate Avg(Sigi)
        if |Avg(Sigi) - 0.5| < θ
            remove gi from G
        end if
    end for
    for each gi in G
        calculate AvgAct(Sig(gi))
        calculate AvgRep(Sig(gi))
        if Diff(AvgAct(Sig(gi)), AvgRep(Sig(gi))) < δ
            remove gi from G
        end if
    end for
    add gx with MAX(Diff(Avg(ACT), Avg(REP))) to L
    if activators activate gx
        add gx to ACT
    else
        add gx to REP
    end if
    add gx to the GRN with the incoming lines from ACT
        and outgoing lines to elements in REP set
until all elements in M have been processed
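The two filtering loops of the procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's implementation: the signature values are taken as a precomputed matrix sig[g][l] in the range [0, 1] (how Sig is calculated is not shown here), Diff is taken to be the absolute difference of the two averages, and the ACT and REP sets are given as non-empty arrays of landmark column indices. The class name, method names, and sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the two filtering loops, run on a precomputed signature matrix.
// ASSUMPTION: sig[g][l] holds Sig(g, l) in [0, 1]; computing Sig is out of scope.
class SignatureFilter {

    // Mean of all entries in a row.
    static double average(double[] row) {
        double sum = 0.0;
        for (double v : row) {
            sum += v;
        }
        return sum / row.length;
    }

    // Mean of the row entries selected by the given column indices.
    static double averageOver(double[] row, int[] columns) {
        double sum = 0.0;
        for (int c : columns) {
            sum += row[c];
        }
        return sum / columns.length;
    }

    // First loop: drop genes whose average signature lies within theta of 0.5.
    static List<Integer> keepInformative(double[][] sig, double theta) {
        List<Integer> kept = new ArrayList<>();
        for (int g = 0; g < sig.length; g++) {
            if (Math.abs(average(sig[g]) - 0.5) >= theta) {
                kept.add(g);
            }
        }
        return kept;
    }

    // Second loop: drop genes whose ACT and REP averages differ by less than delta.
    // ASSUMPTION: Diff is the absolute difference; act and rep are non-empty.
    static List<Integer> keepDiscriminating(double[][] sig, List<Integer> genes,
                                            int[] act, int[] rep, double delta) {
        List<Integer> kept = new ArrayList<>();
        for (int g : genes) {
            double diff = Math.abs(averageOver(sig[g], act) - averageOver(sig[g], rep));
            if (diff >= delta) {
                kept.add(g);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[][] sig = {
            {0.9, 0.8, 0.4},   // gene 0
            {0.5, 0.55, 0.45}, // gene 1: average near 0.5
            {0.2, 0.1, 0.3}    // gene 2
        };
        List<Integer> informative = keepInformative(sig, 0.1);
        System.out.println(informative); // prints [0, 2]
        int[] act = {0, 1}; // landmark columns assumed to be in ACT
        int[] rep = {2};    // landmark column assumed to be in REP
        System.out.println(keepDiscriminating(sig, informative, act, rep, 0.2)); // prints [0]
    }
}
```

Returning the indices of the surviving genes, rather than deleting rows in place, keeps the original matrix intact so that the same data can be filtered again with different θ and δ values.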
Constructing Gene Regulatory Network
We continue this procedure until every gene has either been
added to the landmark gene set or excluded from the working
microarray data set. At the end of the loop, we obtain a gene
activating/repressing relation graph. This is illustrated in Figure 1,
where solid lines represent activating relationships and dashed lines
represent repressing relationships. Note that δ can serve as a threshold
on Diff to limit the final GRN size: in each loop, we remove the genes
whose Diff value falls below the threshold.
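The relation graph described above could be stored as a list of typed, directed edges. The sketch below is one possible representation, not the report's data structure. It assumes that the incoming lines from the ACT set are activating edges and that the outgoing lines to the REP set are repressing edges; the class name and the gene names g1, g2, and g3 are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the GRN as a list of typed, directed edges.
// ASSUMPTION: edges from ACT into gx are "activate" edges and edges
// from gx to REP are "repress" edges. All names are hypothetical.
class GrnGraph {

    enum Relation { ACTIVATE, REPRESS }

    static final class Edge {
        final String from;
        final String to;
        final Relation relation;

        Edge(String from, String to, Relation relation) {
            this.from = from;
            this.to = to;
            this.relation = relation;
        }

        @Override
        public String toString() {
            // Solid arrow for activation, dashed arrow for repression.
            String arrow = relation == Relation.ACTIVATE ? " --> " : " - -> ";
            return from + arrow + to;
        }
    }

    final List<Edge> edges = new ArrayList<>();

    // Adds gene gx with incoming edges from ACT and outgoing edges to REP.
    void addGene(String gx, List<String> act, List<String> rep) {
        for (String a : act) {
            edges.add(new Edge(a, gx, Relation.ACTIVATE));
        }
        for (String r : rep) {
            edges.add(new Edge(gx, r, Relation.REPRESS));
        }
    }

    public static void main(String[] args) {
        GrnGraph grn = new GrnGraph();
        grn.addGene("g3", Arrays.asList("g1"), Arrays.asList("g2"));
        for (Edge e : grn.edges) {
            System.out.println(e);
        }
    }
}
```

An edge list is enough for drawing the graph; an adjacency map would be a natural alternative if the network had to be queried by gene.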
GRN-SigCalc can be verified by constructing GRNs from training
data sets and testing the activating/repressing relationships against the
testing data sets. When multiple data sets are available, we can split them
into several groups and construct a GRN from each group separately.
CHAPTER 3
CONCLUSION
This project constructs GRNs using microarray data sets
from heterogeneous data sources. Since the microarray data sources are
obtained from different organisms and differ in sample backgrounds,
quality control standards, microarray technologies and so on, integrating
the data is a difficult task. On the other hand, due to the curse of
dimensionality, GRNs constructed from a single microarray experiment are
often unconvincing and lack a robust statistical basis. Although the data from
different sources may vary greatly, the relationships between the
genes are presumed to remain the same.
APPENDICES
A.1 SOURCE CODE
SigcalcFirstServlet.java
import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

/**
 * Reads a microarray data set from an Excel sheet and forwards the
 * values to sigcalcfirst.jsp.
 */
public class SigcalcFirstServlet extends HttpServlet {

    static int len = 0;        // number of samples in the microarray data set
    static int no_of_row = 0;

    /**
     * Processes requests for both HTTP <code>GET</code> and
     * <code>POST</code> methods.
     * @param request servlet request
     * @param response servlet response
     */
    protected void processRequest(HttpServletRequest request,
            HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html;charset=UTF-8");
        PrintWriter out = response.getWriter();
        // Values read from the sheet, row by row; the header row and the
        // first (gene name) column are skipped.
        ArrayList al = new ArrayList();
        try {
            // ASSUMPTION: the original listing omits the lines that open the
            // workbook. The request parameter name "file" is hypothetical.
            String path = request.getParameter("file");
            POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream(path));
            HSSFWorkbook wb = new HSSFWorkbook(fs);
            HSSFSheet sheet = wb.getSheetAt(0);

            int no_of_rows = sheet.getPhysicalNumberOfRows();
            System.out.println("NUMBER OF ROWS" + no_of_rows);
            HSSFRow row;
            HSSFCell cell;
            int rows = sheet.getPhysicalNumberOfRows();
            if (rows > 0) {
                int cols = 0;
                int tmp = 0;
                // Find the number of columns in the data sheet
                for (int i = 0; i < rows; i++) {
                    row = sheet.getRow(i);
                    if (row != null) {
                        tmp = row.getPhysicalNumberOfCells();
                        if (tmp > cols) {
                            cols = tmp;
                        }
                    }
                }
                int k = 0;
                // Let len be the number of samples in the microarray data set
                len = cols - 1;
                System.out.println("len: " + len + " row :" + rows);
                // Retrieve the values row by row
                for (int r = 0; r < rows; r++) {
                    if (r == 1) {
                        System.out.println(" ");
                        System.out.println("-------------------------------------------");
                    }
                    row = sheet.getRow(r);
                    System.out.println(" ");
                    if (row != null) {
                        // Retrieve the value of each cell, column by column
                        for (int c = 0; c < cols; c++) {
                            if (c == 1) {
                                System.out.print(" | \t");
                            }
                            cell = row.getCell((short) c);
                            if (cell != null) {
                                if (cell.getCellType() == HSSFCell.CELL_TYPE_STRING) {
                                    String cellvalue = cell.getStringCellValue();
                                    System.out.print(cellvalue + "\t");
                                    if (r != 0 && c != 0) {
                                        al.add(cellvalue);
                                    }
                                } else if (cell.getCellType() == HSSFCell.CELL_TYPE_NUMERIC) {
                                    double cellvalue1 = cell.getNumericCellValue();
                                    System.out.print(cellvalue1 + "\t");
                                    if (r != 0 && c != 0) {
                                        al.add(Double.valueOf(cellvalue1));
                                    }
                                }
                            } else {
                                System.out.println("*********Null Value:" + "row= "
                                        + r + " Col= " + c);
                            }
                        }
                        k++;
                    } else {
                        // The physical row count skips empty rows, so extend
                        // the loop bound to reach the rows that follow a gap.
                        rows++;
                    }
                }
                System.out.println("\n" + al + "\n" + al.size());
                request.setAttribute("ALLVALUES", al);
                request.setAttribute("col", String.valueOf(len));
            }
            System.out.println(" Page Forwarded ");
            RequestDispatcher rd =
                    getServletContext().getRequestDispatcher("/sigcalcfirst.jsp");
            rd.forward(request, response);
        } catch (Exception e) {
            System.out.print("Sigcalc First Servlet Exception:" + e);
        }
        out.close();
    }

    // Entry points delegating to processRequest; they do not appear in the
    // original listing and are assumed from the Javadoc above.
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        processRequest(request, response);
    }

    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        processRequest(request, response);
    }
}
GeneRegulatoryNetworkServlet.java
import java.awt.*;
import java.awt.geom.Rectangle2D;
import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import java.util.Arrays;
import java.util.ArrayList;
import Acme.JPM.Encoders.GifEncoder;
/**
*
* @author
* @version
*/
public class GeneRegulatoryNetworkServlet extends HttpServlet {
try {
System.out.println("GeneRegulatoryNetworkServlet");
HttpSession ses=req.getSession();
ArrayList lmg=(ArrayList)ses.getAttribute("LANDMARK1");
ArrayList lmg1=(ArrayList)ses.getAttribute("LANDMARK0");
INPUT SCREEN
Loading Microarray Dataset
Reading Gene values from MicroArray Dataset
REFERENCES
[1] Alter, O., Brown, P., and Botstein, D. (2000) Singular value
decomposition for genome-wide expression data processing and
modeling. Proceedings of the National Academy of Sciences of the
United States of America, 97(18), 10101–10106.
[2] Chen, J., Wu, R., Yang, P., Huang, J., Sher, Y., Han, M., Kao, W.,
Lee, P., Chiu, T., Chang, F., Chu, Y., Wu, C., and Peck, K. (1998)
Profiling expression patterns and isolating differentially expressed
genes by cDNA microarray system with colorimetry detection.
Genomics, 51, 313–324.
[3] Eisen, M., Spellman, P., Brown, P., and Botstein, D. (1998) Cluster
analysis and display of genome-wide expression patterns. Proceedings
of the National Academy of Sciences of the United States of America,
95(25), 14863–14868.
[4] Filkov, V. and Skiena, S. (2004) Integrating microarray data by
consensus clustering. International Journal on Artificial
Intelligence Tools, 13(4), 863–880.
[6] Kang, J., Yang, J., Xu, W., and Chopra, P. (2005) Integrating
heterogeneous microarray data sources using correlation signatures.
In Proceedings of Data Integration in the Life Sciences, Second
International Workshop (DILS 2005), 105–120.
[11] Wang, Y., Joshi, T., Zhang, X.-S., Xu, D., and Chen, L. (2006)
Inferring gene regulatory networks from multiple microarray
datasets. Bioinformatics, 22(19), 2413–2420.