BIOINFORMATICS

LECTURE 1
BIOINFORMATICS - The research, development, or application of computational tools and
approaches for expanding the use of biological, medical, behavioral, or health data, including
those to acquire, store, organize, archive, analyze, or visualize such data.
BIOINFORMATICS is conceptualizing biology in terms of molecules (in the sense of physical
chemistry) and applying “informatics techniques” (derived from disciplines such as applied
math, computer science and statistics) to understand and organize the information associated
with these molecules, on a large scale. In short, bioinformatics is a management information
system for molecular biology and has many practical applications.
Sub-disciplines
 The development of new algorithms and statistics
 The analysis and interpretation of various types of data
 The development and implementation of tools
INFORMATION TECHNOLOGY
 Bioinformatics may be defined as a scientific discipline encompassing acquisition,
storage, processing, analysis, interpretation and visualization of biological information.
 It encompasses frameworks, theories, algorithms, techniques and tools from mathematics,
computer science and biology with the aim of understanding the significance of a variety
of biological data.
GENOMICS
 Genomics aims to determine and analyze the complete DNA sequence of an organism, that is,
its genome.
 The DNA encodes genes that can be expressed as ribonucleic acid (RNA) transcripts and
then, in many cases, further translated into protein.
 Functional genomics describes the use of genome‐wide assays to study gene and protein
function.
 For humans and other species, it is now possible to characterize an individual’s genome,
collection of RNA (transcriptome), proteome and even the collections of metabolites and
epigenetic changes, and the catalog of organisms inhabiting the body (the microbiome)
(Topol, 2014).
 Explains how to access biological sequence data, particularly DNA and protein
 Compare two sequences (pairwise alignment)
 Compare multiple sequences (primarily by the Basic Local Alignment Search Tool, BLAST)
 Multiple sequence alignment
 Show how multiply aligned proteins or nucleotides can be visualized in phylogenetic
trees
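The DNA-to-RNA step noted in the genomics list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tool from the lecture: the function names (transcribe, reverse_complement, transcribe_template) are hypothetical, and the input is assumed to be a DNA string containing only A, C, G and T.

```python
# Minimal sketch: DNA -> mRNA, assuming plain A/C/G/T input.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(coding_strand):
    """Return the mRNA matching a DNA coding (sense) strand, read 5'->3'."""
    # The mRNA has the same sequence as the coding strand, with U in place of T.
    return coding_strand.upper().replace("T", "U")


def reverse_complement(dna):
    """Return the reverse complement of a DNA strand."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe_template(template_strand):
    """Return the mRNA produced from a DNA template (antisense) strand."""
    # The template strand is complementary to the coding strand.
    return transcribe(reverse_complement(template_strand))


print(transcribe("ATGGCC"))
print(transcribe_template("GGCCAT"))
```

Both calls describe the same gene from its two strands, so they print the same mRNA.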
The Cell
APPROACHES TO BIOINFORMATICS
Reproducible Research in Bioinformatics

LECTURE 2
DATABASE - a systematic collection of information (data) that is organized so that it can easily
be accessed, managed, and updated.
 BIBLIOGRAPHIC – digital collection of references to published literature
Ex. ISI Web of Knowledge and PubMed
 FULL-TEXT – provides a compilation of documents available for viewing and printing;
some must be purchased
 NUMERIC – data expressed in numbers rather than text/images
 IMAGES – repository of images
Ex. Co’s Digital Flora of the Philippines and JSTOR
BIOLOGICAL DATABASES
 Stores of biological information
 Built on database technology
 Public and private repositories
 Large and organized, holding biological information
 Focus here: nucleotide databases
DATABASES ON RNA, DNA AND PROTEIN – these molecules are the focus because they are involved in
the hereditary information of an organism and of closely related organisms.
Centralized databases store DNA sequences. The 3 major databases used in bioinformatics:
1. GenBank (NCBI)
- Genetic sequence databank
- Terra sequences
- USA
2. ENA (EMBL-EBI)
- European Bioinformatics Institute
- European Nucleotide Archive
- England
3. DDBJ
- DNA Data Bank of Japan
Central Bioinformatics Resources: NCBI and EBI
- Entrez: molecular sequence database system (NCBI)
- Ensembl: data on vertebrates
- Ensembl Genomes: broader in scope, covering large groups of non-vertebrate species
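As a hedged sketch of how a sequence record might be requested from NCBI through Entrez, the snippet below builds a URL for the E-utilities efetch endpoint and parses FASTA-formatted text. It makes no network call. The helper names (build_efetch_url, parse_fasta) are hypothetical, the accession is only a placeholder, and the FASTA text is made-up demo data rather than a real record.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore"):
    """Build (but do not send) a request URL for one record in FASTA format."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def parse_fasta(text):
    """Parse FASTA text into a dict mapping header line -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]          # drop the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Placeholder accession, used here only to show the URL shape.
print(build_efetch_url("NM_000518"))

# Made-up FASTA text standing in for a downloaded record.
demo = ">seq1 demo\nACGT\nTTGA\n>seq2\nGGCC\n"
print(parse_fasta(demo))
```

Keeping URL construction separate from parsing means the parser can be checked offline before any request is sent.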
LECTURE 3
Fundamentals of Genes and Genomes

Genetic information is stored in the cell in the form of biological macromolecules, such as
nucleic acids and proteins.
Why is it important to study genetic information? – it is the precursor of evolutionary change
Criteria for a good genetic material, which DNA meets
 Information – it must contain the information needed to construct the entire organism
 Transmission – it must be passed from parents to offspring
 Replication – it must be copied
 Variation – it must be capable of change
DNA as the Genetic Material
 DNA – the raw material
 Chromosome – a single molecule of DNA associated with histone and non-histone
proteins
 Gene – the functional unit of heredity (smallest unit)
The Structure of DNA
 Primary structure – the linear sequence of nucleotides, whose bases pair to form the helix
 Secondary structure – the stable 3-dimensional double helix
 Tertiary structure – the complex packing of DNA into chromosomes
The Chromosome
How is the information in a gene encoded? – the genetic code consists of the sequence of
nitrogenous bases
Morse code is a method used in telecommunication to encode text characters as standardized
sequences of two different signal durations, called dots and dashes, or dits and dahs.
Genetic Code - the genetic code consists of the sequence of nitrogenous bases (A, C, G, U) in an
mRNA chain. The four bases make up the “letters” of the genetic code.
The letters are combined in groups of three to form code “words,” called codons. Each codon
stands for (encodes) one amino acid, unless it codes for a start or stop signal.
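Reading codons as three-letter words can be sketched as follows. This is a minimal illustration assuming the standard genetic code and an mRNA string containing only A, C, G and U. The names CODON_TABLE and translate are hypothetical, and amino acids are written in one-letter codes with "*" marking a stop codon.

```python
from itertools import product

# Standard genetic code, listed with bases in the order U, C, A, G
# for the first, second and third codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                       # no start signal found
    protein = []
    # Step through the message three bases (one codon) at a time.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":           # stop signal: no amino acid is added
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUUUUAAGG"))  # prints MAF
```

In the example, translation starts at AUG (M), reads GCC (A) and UUU (F), and ends at the stop codon UAA.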
Characteristics of the Genetic Code
 The genetic code is almost universal – exceptions occur, e.g. in mitochondrial DNA
 The genetic code is unambiguous – each codon specifies only one specific amino acid
 The genetic code is redundant – most amino acids are specified by more than one codon
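The unambiguous and redundant characteristics listed above can be checked directly against a codon table. The sketch below assumes the standard genetic code. The names CODON_TABLE, codons_per_residue and codons_for are hypothetical, and "*" stands for a stop signal.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered U, C, A, G at each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Unambiguous: a dict maps each codon to exactly one amino acid.
# Redundant: several codons can map to the same amino acid.
codons_per_residue = Counter(CODON_TABLE.values())


def codons_for(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


print(codons_per_residue["L"], codons_for("L"))   # leucine: six codons
print(codons_per_residue["M"], codons_for("M"))   # methionine: one codon
print(codons_for("*"))                            # the three stop codons
```

The 64 codons cover only 20 amino acids plus stop, which is why redundancy is unavoidable in a triplet code.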
Properties of the Genetic Code

1. The genetic code consists of a sequence of nucleotides in DNA or RNA.
2. The genetic code is a triplet code.
3. The genetic code is degenerate.
4. Isoaccepting tRNAs are tRNAs with different anticodons that accept the same amino
acid; wobble allows the anticodon on one type of tRNA to pair with more than one type
of codon on mRNA.
5. The code is generally nonoverlapping.
6. The reading frame is set by an initiation codon, which is usually AUG.
7. When a reading frame has been set, codons are read as successive groups of three
nucleotides.
8. Any one of three termination codons (UAA, UAG, and UGA) can signal the end of a
protein; no amino acids are encoded by the termination codons.
LECTURE 4
SEQUENCE ALIGNMENT – a way of arranging sequences to identify regions of similarity;
evolutionary relationships can be traced by sequence alignment
3.1 Eye of the tiger
 In 1994 Walter Gehring et al. (Univ. of Basel) turned the gene “eyeless” on in various places
on Drosophila melanogaster
 Result: eyes formed in multiple places
 ‘eyeless’ is a master regulatory gene that controls approximately 2000 other genes
 Switching ‘eyeless’ on induces formation of an eye
HOMEOBOX - A homeobox is a DNA sequence found within genes that are involved in the
regulation of development (morphogenesis) of animals, fungi and plants.
PAX GENE – is responsible for development of the nervous system and formation of the pancreas
Otx2 – formation of cascade
Sequence alignment is the most important task in bioinformatics
 Used to reveal conserved regions: a conserved sequence is highly identical/similar across
the aligned sequences
 Ex. Features shared by eukaryotic and prokaryotic cells include proteins, ribosomes and
DNA.
Sequence alignment is important for:
 prediction of function
 database searching
 gene finding
 sequence divergence
 sequence assembly
3.3 On sequence similarity

 Homology: genes that derive from a common ancestor-gene are called homologs
 Orthologous genes are homologous genes in different organisms
 Paralogous genes are homologous genes in one organism that derive from gene
duplication
 Gene duplication: one gene is duplicated into multiple copies, which are therefore free to
evolve and assume new functions
Causes for sequence (dis)similarity
 mutation: a nucleotide at a certain location is replaced by another nucleotide (e.g.: ATA
→ AGA)
 insertion: at a certain location one new nucleotide is inserted between two existing
nucleotides (e.g.: AA → AGA)
 deletion: at a certain location one existing nucleotide is deleted (e.g.: ACTG → AC-G)
 indel: an insertion or a deletion
3.4 Sequence alignment: global and local

Find the similarity between two (or more) DNA-sequences by finding a good alignment between
them.
Global alignment – the sequences are aligned over their entire length
Local alignment – only the most similar regions of the sequences are aligned
Sequence alignment is an arrangement of two or more sequences, highlighting their similarity.
The sequences are padded with gaps (dashes) so that wherever possible, columns contain
identical characters from the sequences involved
Global alignment - A global alignment between two sequences is an alignment in which all the
characters in both sequences participate in the alignment.
Global alignments are useful mostly for finding closely related sequences. As these sequences
are also easily identified by local alignment methods, global alignment is now somewhat
deprecated as a technique.
Further, several complications of molecular evolution (such as domain shuffling) limit the
usefulness of these methods.
Ex.
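A global alignment can be sketched with the classic dynamic-programming approach usually credited to Needleman and Wunsch. The notes do not name an algorithm, so this choice, the function name global_align, and the scoring values (match +1, mismatch -1, gap -1) are all assumptions made for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Align all of a against all of b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap            # a[:i] against nothing: all gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # pair two characters
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


aligned_a, aligned_b, total = global_align("ACTG", "ACG")
print(aligned_a)   # ACTG
print(aligned_b)   # AC-G
print(total)       # 2
```

Every character of both sequences appears in the output, and the gap in the second line marks the deleted T, matching the ACTG → AC-G example given for deletions.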
Local alignment methods find related regions within sequences - they can consist of a subset of
the characters within each sequence.
For example, positions 20-40 of sequence A might be aligned with positions 50-70 of sequence
B.
This is a more flexible technique than global alignment and has the advantage that related
regions which appear in a different order in the two proteins (which is known as
domain shuffling) can be identified as being related. This is not possible with global alignment
methods.
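For comparison, a local alignment can be sketched with the dynamic-programming approach usually credited to Smith and Waterman. As before, the algorithm choice, the function name local_align, and the scoring values (match +2, mismatch -1, gap -2) are assumptions for illustration and are not taken from the notes.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best-scoring pair of aligned substrings and their score."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Scores never drop below 0, so an alignment may start anywhere.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), best


print(local_align("GGGACGTCCC", "TTACGTAA"))  # ('ACGT', 'ACGT', 8)
```

Only the shared region ACGT is reported; the unrelated flanking characters of both sequences are left out, which is the key difference from a global alignment.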
Sequence alignment is used to study the evolution of sequences, such as protein or DNA
sequences, from a common ancestor.
Mismatches in the alignment correspond to mutations, and gaps correspond to insertions or
deletions.
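Reading an alignment column by column in this way can be sketched as below. The function names (summarize_alignment, percent_identity) are hypothetical. The two input strings are assumed to be already aligned, of equal length, with "-" marking gaps, and percent identity is taken here as matches divided by the number of alignment columns, which is one of several conventions.

```python
def summarize_alignment(top, bottom):
    """Count matches, mismatches and gap columns in an aligned pair."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            counts["gap"] += 1          # insertion or deletion (indel)
        elif x == y:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1     # substitution
    return counts


def percent_identity(top, bottom):
    """Matches as a percentage of all alignment columns (one common convention)."""
    counts = summarize_alignment(top, bottom)
    return 100.0 * counts["match"] / len(top)


print(summarize_alignment("ACTGA", "AC-GT"))
print(percent_identity("ACTGA", "AC-GT"))
```

In this example the third column is an indel and the last column is a substitution, leaving three matching columns out of five.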
Sequence alignment also refers to the process of finding significant alignments in a
database of potentially unrelated sequences.
