Chapter I - Introduction

With the rapid growth in the amount of publicly available genomic data, a new field of Computer Science, Bioinformatics, has emerged. It focuses on the use of computing systems to efficiently derive, store, and analyze the character strings of the genome in order to help solve problems in molecular biology. The term Bioinformatics was coined by Paulien Hogeweg and Ben Hesper in 1970 for the study of informatic processes in biotic systems. The National Center for Biotechnology Information (NCBI, 2001) defines Bioinformatics as the field of science in which Biology, Computer Science and Information Technology merge into a single discipline. Within this field there are three important sub-disciplines: the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains and protein structures; and the development and implementation of tools that enable efficient access to and management of different types of information.

Bioinformatics is the use of information technology in the field of Biotechnology for the storage, warehousing and analysis of complex macromolecular data for the study of the central dogma of molecular biology.

The flood of data from biology, mainly in the form of DNA, RNA and protein sequences, places heavy demands on computers and computational scientists. At the same time, it demands a transformation of the basic ethos of the biological sciences. Data mining techniques can therefore be used to explore hidden patterns in biological data from various human diseases. Unsupervised classification, also known as clustering, is one branch of data mining that can be applied to biological data, and it may contribute to an era of more rapid medical development and drug discovery.
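As an illustration of the clustering idea described above, the following minimal sketch groups a few samples by their expression values using plain k-means. The choice of k-means, the expression values and the starting centroids are assumptions made for illustration only; they are not the method or data of the present study.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from this point to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical expression values of two genes in six samples (made-up numbers).
samples = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8), (5.0, 5.2), (4.8, 5.1), (5.2, 4.9)]

# Starting centroids are fixed here so that the result is reproducible.
centroids, clusters = kmeans(samples, centroids=[samples[0], samples[3]])
```

With these made-up values the samples separate into a low-expression and a high-expression group. On real expression data, which is noisy and high-dimensional, the number of clusters and the distance measure would need to be chosen with care.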

Diabetes Mellitus is an endocrine metabolic disorder, and India is unfortunately considered the diabetes capital of the world. This necessitates that researchers probe the genomics of the disease with the aid of Bioinformatics.

In the past decade, the advent of efficient genome sequencing tools has led to enormous progress in the life sciences. Among the most important innovations, microarray technology makes it possible to quantify the expression of thousands of genes simultaneously. The characteristics that distinguish these data from typical machine-learning or pattern-recognition data include a fair amount of random noise, missing values, a dimension in the range of thousands, and a sample size of only a few dozen. A particular application of microarray technology is undertaken in our study of diabetes, where the goal is to examine gene expression at the transcriptome level in Healthy, Prediabetic and Diabetic subjects, and its effects on Gene Ontology (Molecular Function, Cellular Component and Biological Process) and Metabolic Pathways. The challenge for biologists and computer scientists is to provide solutions that offer automation, quality and efficiency in understanding genomic information in the abnormal glycemic states that lie between healthy individuals and those with Diabetes.
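Missing values, one of the data characteristics noted above, usually have to be handled before any further analysis. The sketch below shows one simple option, replacing each missing value with the mean of the observed values for the same gene. The function name, the use of None to mark a missing value and the numbers are assumptions made for illustration; this is not presented as the procedure used in the present study.

```python
def impute_row_means(matrix):
    """Replace None entries in each gene (row) with the mean of the
    observed values of that gene; a row with no data is filled with 0.0."""
    imputed = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        imputed.append([mean if v is None else v for v in row])
    return imputed


# Hypothetical expression matrix: rows are genes, columns are samples.
expression = [
    [2.0, None, 4.0],
    [1.0, 1.5, None],
]
filled = impute_row_means(expression)
```

Row-mean imputation is only the simplest choice. With a few dozen samples and thousands of genes, neighbour-based or model-based imputation may be preferable.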

Prediabetes and Diabetes:-

The global prevalence of prediabetes has been increasing progressively over the past few decades. As per the IDF (International Diabetes Federation) Diabetes Atlas, the number of cases of IGT (Impaired Glucose Tolerance) worldwide is estimated to be approximately 340 million (2). By 2030 the global prevalence of IGT is estimated to reach 8.4%, which will be approximately 462 million people. It has been established that Prediabetic status is a strong risk factor for overt Diabetes, with or without micro- and macrovascular complications (1,2).

Type 2 Diabetes is a complex disease caused by an interplay between genetic, epigenetic and environmental factors. Diabetes refers to a group of metabolic diseases characterized by hyperglycemia resulting from defects in insulin secretion, insulin action, or both (3). The chronic hyperglycemia of diabetes is associated with long-term damage, dysfunction, and failure of different organs, such as the eyes, kidneys, nerves, heart and blood vessels. Diabetes is currently the fastest growing epidemic and has been ascribed to a collision between genes and the environment.

Normal glucose tolerance is maintained by the following mechanisms:

a) Insulin secretion from the pancreas in response to a carbohydrate load.

b) Suppression of endogenous hepatic glucose production by insulin.

c) Insulin-mediated glucose uptake by peripheral insulin-responsive tissues such as muscle and fat.

In Type 2 Diabetes, glucose uptake by peripheral tissue is decreased due to the development of resistance to insulin. This resistance arises 5 to 10 years or even more before the development of Diabetes Mellitus. This period is considered Prediabetic: the pancreas tries to overcome the resistance by producing more insulin, leading to a hyperinsulinaemic state. When the overproduction of insulin by the pancreas fails to overcome the insulin resistance, so-called clinical Diabetes Mellitus develops; the pancreas is still working overtime at this stage and becomes exhausted. Later, insulin secretion starts declining due to glucose toxicity or β-cell exhaustion. An increase in hepatic glucose production is also seen in Type 2 Diabetes (T2D) but not in Impaired Glucose Tolerance (IGT). This increased hepatic glucose production is reversible with lifestyle modification and oral agents.

T2D has a strong genetic component, although identification of susceptibility genes has been elusive and no single gene defect explains T2D; hence the disease is heterogeneous, multigenic and has a complex etiology (4).

Genome Wide Association Studies (GWAS):-

Genome-wide association studies (GWAS) have facilitated a substantial and rapid rise in the number of confirmed genetic susceptibility variants for Type 2 Diabetes (T2D). Approximately 40 variants have been identified so far, many of which were discovered through GWAS (11). This success has led to widespread hope that the findings will translate into improved clinical care for the increasing number of patients with Diabetes. Potential areas of clinical translation include risk prediction and subsequent disease prevention, pharmacogenetics, and the development of novel therapeutics. However, the genetic loci identified so far account for only a small fraction (approximately 10%) of the overall heritable risk for T2D. Uncovering the missing heritability is essential to the progress of T2D genetic studies and to the translation of genetic information into clinical practice [7].

Next Generation Sequencing (NGS) and Genome Wide Association Studies (GWAS):-

NGS (Next Generation Sequencing) and GWAS (Genome Wide Association Studies) have driven technical progress in identifying genetic variants that confer risk of, or protection against, Type 2 Diabetes Mellitus. NGS shows the amount of a gene that has been expressed and the arrangement of nucleotide bases in the gene fragments that code for protein. It can also reveal pseudogenes, that is, genes or gene copies that have lost the ability to produce a functional protein, possibly due to mutation or inaccurate duplication of the sequence [11]. Pseudogene expression may be associated with Single Nucleotide Polymorphisms (SNPs), which are DNA variants that occur when a single base (A, T, G or C) in the sequence is altered.

Transcriptome Analysis:-

Transcriptome analysis is an important tool for characterizing and understanding the molecular basis of phenotypic variation in biology, including disease. During the past decades microarrays have been the most important and widely used approach for such analyses, but recently high-throughput sequencing of cDNA (RNA-seq) has emerged as a powerful alternative, and the present study is based on this newer form of transcriptome analysis (5). It has already found numerous applications (4). RNA-seq uses next-generation sequencing (NGS) methods to sequence cDNA derived from an RNA sample, and hence produces millions of short reads. These reads are then typically mapped to a reference genome, and the number of reads mapping within a genomic feature of interest (such as a gene or an exon) is used as a measure of the abundance of that feature in the analyzed sample. Arguably the most common use of transcriptome profiling is in the search for Differentially Expressed Genes (DEG), that is, genes that show differences in expression level between conditions or are in other ways associated with given predictors or responses. RNA-seq offers several advantages over microarrays for differential expression analysis, such as an increased dynamic range, a lower background level, and the ability to detect and quantify the expression of previously unknown transcripts and isoforms [7,8,9].
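To make the idea of read counts as an abundance measure concrete, the sketch below scales hypothetical per-gene read counts to counts per million (CPM) and computes a log2 fold change between two conditions. The gene names, the counts and the pseudocount of 1 are assumptions chosen for illustration; a real differential expression analysis would also use replicates and a statistical test, and this is not presented as the pipeline of the present study.

```python
import math


def counts_per_million(counts):
    """Scale the raw read counts of one sample to counts per million mapped reads."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """Per-gene log2(b / a); the pseudocount avoids division by zero."""
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
    }


# Hypothetical read counts per gene (made-up numbers, not study data).
healthy = {"GENE_A": 500, "GENE_B": 300, "GENE_C": 200}
diabetic = {"GENE_A": 250, "GENE_B": 300, "GENE_C": 450}

lfc = log2_fold_change(counts_per_million(healthy), counts_per_million(diabetic))
```

In this toy example GENE_A has a negative fold change (lower in the diabetic sample), GENE_B is unchanged and GENE_C has a positive fold change. A fold change alone does not establish differential expression, since it ignores the variability between samples.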

The global prevalence of Prediabetes and Diabetes has been increasing progressively over the past few decades, with India being a major hub of Diabetes Mellitus worldwide.

The present study was therefore undertaken in Prediabetic and Diabetic subjects to recognize the genes that are expressed and the arrangement of nucleotide bases in the gene fragments that code for protein, and also to recognize the genes that have lost their ability to produce functional proteins. We have humbly tried to assess the purity and concentration of RNA with a Nanodrop Spectrophotometer, and to carry out central dogma study, Gene Ontology and Metabolic Pathway analysis for Differentially Expressed Genes in Healthy, Prediabetic and Diabetic subjects. To the best of my knowledge and search of the literature, only a handful of RNA-Seq (transcriptome) studies on Diabetic and Prediabetic samples are available in the Indian and Western literature.
