
FUNCTIONAL GENOMICS

Name: Aftab Ahmad


Reg: BBS193001
Sec: 01
Assignment: 03
Assignment Title: Gene prediction
Submitted to: Ma'am Sehar
Date: Dec 9, 2022

Table of Contents
Introduction
Problem Statement
Objective
Literature Review
Methodology
Results
Validation
References

➢ Introduction:
With the development of genome sequencing for many organisms, more and more raw
sequences need to be annotated. Finding the location of protein-coding regions by
computational gene prediction is one of the essential problems in bioinformatics. Two
classes of methods are used: similarity-based searches and ab initio prediction.
➢ Problem Statement:
To predict the gene in a Homo sapiens sequence using bioinformatics tools, in order to
annotate the sequence and identify its gene components.

➢ Objective:
• Gene prediction
• Locating the gene in the genome
• Identifying the organism's sequence to retrieve

➢ Literature Review:
This gene encodes a flavoprotein essential for nuclear disassembly in apoptotic cells, and
it is found in the mitochondrial intermembrane space in healthy cells. Induction of
apoptosis results in the translocation of this protein to the nucleus, where it affects
chromosome condensation and fragmentation. In addition, this gene product induces
mitochondria to release the apoptogenic proteins cytochrome c and caspase-9. Mutations
in this gene cause combined oxidative phosphorylation deficiency 6 (COXPD6), a severe
mitochondrial encephalomyopathy, as well as Cowchock syndrome, also known as
X-linked recessive Charcot-Marie-Tooth disease-4 (CMTX-4), a disorder resulting in
neuropathy and axonal and motor-sensory defects with deafness and cognitive disability.
Alternative splicing results in multiple transcript variants. A related pseudogene has been
identified on chromosome 10.

geneid is a program for predicting genes in anonymous genomic sequences, designed with
a hierarchical structure (Yan et al., 2020). Its accuracy compares favorably with that of
other existing ab initio gene prediction tools, and geneid is likely more efficient in terms
of speed and memory usage. geneid can integrate predictions from multiple sources and
reannotate genomic sequences via external GFF files together with a redefinition of
the gene model. Its output can be customized to different levels of detail, including an
exhaustive listing of potential signals and exons, and several output formats, such as GFF
and XML, are available (Gene id tool).
➢ Methodology:
In the first step, splice sites and start and stop codons are predicted and scored along the
sequence using Position Weight Arrays. In the second step, exons are built from these sites.
Exons are scored as the sum of the scores of their defining sites plus the log-likelihood
ratio of a Markov model for coding DNA. Finally, from the set of predicted exons, the gene
structure is assembled by maximizing the sum of the scores of the assembled exons.
geneid also offers some support for integrating predictions from multiple sources via
external GFF files, and the general gene model can be redefined.
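
The assembly step can be illustrated with a minimal Python sketch. It assumes a much simplified model in which a gene is any chain of non-overlapping candidate exons and the best chain is the one with the highest summed score. This is not geneid's actual implementation: reading-frame compatibility, the ordering of exon types (First, Internal, Terminal) and any intron-length limits are ignored, and the names `Exon` and `best_chain` are hypothetical. The candidate exons are copied from the Results section purely for illustration.

```python
from bisect import bisect_left
from typing import List, NamedTuple, Tuple


class Exon(NamedTuple):
    start: int
    end: int
    score: float


def best_chain(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Return the highest-scoring chain of non-overlapping exons.

    Simplified dynamic programme (weighted interval scheduling);
    an assumption for illustration, not geneid's real gene model.
    """
    if not exons:
        return 0.0, []
    ordered = sorted(exons, key=lambda e: e.end)
    ends = [e.end for e in ordered]
    # best[i] holds (score, chain) for the first i exons in end order.
    best: List[Tuple[float, List[Exon]]] = [(0.0, [])]
    for i, exon in enumerate(ordered):
        # k = number of earlier exons that end strictly before this one starts.
        k = bisect_left(ends, exon.start, 0, i)
        take_score = best[k][0] + exon.score
        if take_score > best[i][0]:
            best.append((take_score, best[k][1] + [exon]))
        else:
            best.append(best[i])
    return best[-1]


candidates = [
    Exon(186, 291, 2.78),      # First exon of the reported gene
    Exon(186, 299, 1.11),      # overlapping Single exon
    Exon(9239, 9381, -1.12),   # Internal exon with a negative score
    Exon(16274, 16373, 3.34),  # Internal exon
]
score, chain = best_chain(candidates)
print(round(score, 2), [(e.start, e.end) for e in chain])
```

In this simplified model an exon with a negative score is never selected, whereas the gene structure reported by geneid does include one (score -1.12), presumably because the real gene model imposes constraints that this sketch omits.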

• Retrieve the Homo sapiens sequence (NC_000023.11:c130165841-130129362).
• Download the sequence in FASTA format.
• Open the geneid tool for prediction.
• Set the parameters and upload the FASTA file of Homo sapiens.
• Submit the file and run it; the results are then displayed on screen.
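
As a quick sanity check on the retrieved region, the following Python sketch parses a region identifier of the form used in the Results and computes its length. It assumes the NCBI convention that a leading `c` before the coordinates marks the complement strand; the names `Region` and `parse_region` are hypothetical.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    accession: str
    start: int        # smaller coordinate
    end: int          # larger coordinate
    complement: bool  # True when the coordinates carry a leading "c"

    @property
    def length(self) -> int:
        # Coordinates are inclusive, hence the +1.
        return self.end - self.start + 1


def parse_region(identifier: str) -> Region:
    """Parse identifiers such as 'ACCESSION:c<from>-<to>' (assumed format)."""
    match = re.fullmatch(r"([^:]+):(c?)(\d+)-(\d+)", identifier)
    if match is None:
        raise ValueError(f"unrecognized region identifier: {identifier!r}")
    accession, comp, first, second = match.groups()
    a, b = int(first), int(second)
    return Region(accession, min(a, b), max(a, b), comp == "c")


region = parse_region("NC_000023.11:c130165841-130129362")
print(region.accession, region.complement, region.length)  # NC_000023.11 True 36480
```

The computed length of 36480 bases agrees with the coordinate range [0,36479] and the End feature at position 36480 in the geneid output.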

➢ Results:
The tool gives the following output:
• Starts(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
• Acceptors(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
• Donors(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
• Stops(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
• Firsts(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
• Internals(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
• Terminals(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
• ORFs(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]

# Starts(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 Start 42 44 0.09 + . # GGCTTCTCTGTCCAATGCCC
NC_000023.11:c130165841-130129362 geneid_v1.2 Start 77 79 0.09 + . # GAGGAGTCTGCGTAATGTGC
NC_000023.11:c130165841-130129362 geneid_v1.2 Start 186 188 4.64 + . # TAGCGGTCGCCGAAATGTTC
NC_000023.11:c130165841-130129362 geneid_v1.2 Start 467 469 3.73 + . # GAATTGGCTCAGCCATGCCG
NC_000023.11:c130165841-130129362 geneid_v1.2 Start 611 613 -1.62 + . # CCCTTTTTAGAGCAATGAGG
NC_000023.11:c130165841-130129362 geneid_v1.2 Start 693 695 -2.12 + . # AGTTTCTTGTGTAAATGAGG
NC_000023.11:c130165841-130129362 geneid_v1.2 Start 729 731 -

# Acceptors(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 Acceptor 1 2 -0.35 + . # ***********************GTTC
NC_000023.11:c130165841-130129362 geneid_v1.2 Acceptor 21 22 2.22 + . # ***GTTCCCCTTCCCCGGCTCTAGCAG
NC_000023.11:c130165841-130129362 geneid_v1.2 Acceptor 24 25 1.01 + . # GTTCCCCTTCCCCGGCTCTAGCAGGCC
NC_000023.11:c130165841-130129362 geneid_v1.2 Acceptor 55 56 -1.98 + . # TCTCTGTCCAATGCCCACCCGGAGCTG
NC_000023.11:c130165841-130129362 geneid_v1.2 Acceptor 90 91 -2.57 + . # AGTCTGCGTAATGTGCGTGTGAAGAGA
NC_000023.11:c130165841-130129362 geneid_v1.2 Acceptor 92 93 -5.08 + . # TCTGCGTAATGTGCGTGTGAAGAGACT
NC_000023.11:c130165841-130129362 geneid_v1.2 Acceptor 145 146 -5.04 + . # TTTGACCCGTCGGTCGTGCGTGAGAGG

# Donors(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 Donor 67 68 -4.50 + . # GGAGTCTGC
NC_000023.11:c130165841-130129362 geneid_v1.2 Donor 73 74 -1.94 + . # TGCGTAATG
NC_000023.11:c130165841-130129362 geneid_v1.2 Donor 78 79 0.25 + . # AATGTGCGT
NC_000023.11:c130165841-130129362 geneid_v1.2 Donor 82 83 -2.51 + . # TGCGTGTGA
NC_000023.11:c130165841-130129362 geneid_v1.2 Donor 84 85 -2.11 + . # CGTGTGAAG
NC_000023.11:c130165841-130129362 geneid_v1.2 Donor 118 119 -1.26 + . # ACGGTGTTT

# Stops(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 Stop 19 21 0.00 + . # CTAG
NC_000023.11:c130165841-130129362 geneid_v1.2 Stop 75 77 0.00 + . # GTAA
NC_000023.11:c130165841-130129362 geneid_v1.2 Stop 86 88 0.00 + . # GTGA
NC_000023.11:c130165841-130129362 geneid_v1.2 Stop 124 126 0.00 + . # TTGA
NC_000023.11:c130165841-130129362 geneid_v1.2 Stop 142 144 0.00 + . # GTGA

# Firsts(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 First 42 67 -7.51 + 0
NC_000023.11:c130165841-130129362 geneid_v1.2 First 42 73 -5.96 + 0
NC_000023.11:c130165841-130129362 geneid_v1.2 First 77 82 -6.50 + 0

# Internals(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 Internal 2 67 -8.40 + 0
NC_000023.11:c130165841-130129362 geneid_v1.2 Internal 2 67 -8.91 + 1
NC_000023.11:c130165841-130129362 geneid_v1.2 Internal 2 73 -6.94 + 0
NC_000023.11:c130165841-130129362 geneid_v1.2 Internal 2 73 -7.36 + 1
NC_000023.11:c130165841-130129362 geneid_v1.2 Internal 2 78 -5.57 + 0

# Terminals(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 Terminal 2 21 -5.99 + 2
NC_000023.11:c130165841-130129362 geneid_v1.2 Terminal 2 77 -6.18 + 1
88 -4.20 + 1

# Singles(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 Single 186 299 1.11 + 0
NC_000023.11:c130165841-130129362 geneid_v1.2 Single 611 679 -5.70 + 0
NC_000023.11:c130165841-130129362 geneid_v1.2 Single 693 758 -5.69 + 0
NC_000023.11:c130165841-130129362 geneid_v1.2 Single 840 923 -7.76 + 0

NC_000023.11:c130165841-130129362 geneid_v1.2 First 36439 36449 -8.79 + 0
36467 -8.05 - 1
NC_000023.11:c130165841-130129362 geneid_v1.2 Internal 36447 36470 -6.28 - 0
NC_000023.11:c130165841-130129362 geneid_v1.2 Terminal 36451 36458 -5.52 + 2
NC_000023.11:c130165841-130129362 geneid_v1.2 End 36480 36480 0.00 + 0
-

# ORFs(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 ORF 19 126 -8.04 + 0
NC_000023.11:c130165841-130129362 geneid_v1.2 ORF 75 299 -9.19 + 0
NC_000023.11:c130165841-130129362 geneid_v1.2 ORF 220 342 -6.53 + 0
NC_000023.11:c130165841-130129362 geneid_v1.2 ORF 293 400 -9.37 + 0

# Optimal Gene Structure. 1 genes. Score = 33.97
# Gene 1 (Forward). 14 exons. 552 aa. Score = 33.97
NC_000023.11:c130165841-130129362 geneid_v1.2 First 186 291 2.78 + 0 NC_000023.11:c130165841-130129362_1
NC_000023.11:c130165841-130129362 geneid_v1.2 Internal 9239 9381 -1.12 + 2 NC_000023.11:c130165841-130129362_1
NC_000023.11:c130165841-130129362 geneid_v1.2 Internal 16274 16373 3.34 + 0 NC_000023.11:c130165841-130129362_1
NC_000023.11:c130165841-130129362 geneid_v1.2 Internal 17966 18090 3.06 + 2 NC_000023.11:c130165841-130129362_1
NC_000023.11:c130165841-130129362 geneid_v1.2 Internal 18219 18349 0.88 + 0 NC_000023.11:c130165841-130129362_1
NC_000023.11:c130165841-130129362 geneid_v1.2 Internal 20273 20363 0.74 + 1 NC_000023.11:c130165841-130129362_1
NC_000023.11:c130165841-130129362 geneid_v1.2 Internal 25225 25309 -0.05 + 0 NC_000023.11:c13016584
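
The exon records above can be post-processed with a short script. The Python sketch below assumes that each record is whitespace-separated in the order sequence name, source, feature, start, end, score, strand, frame and an optional group, with any trailing `#` comment ignored. That layout matches the records shown in this report but is an assumption, not taken from the geneid documentation, and `Feature` and `parse_record` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: float
    strand: str
    frame: str
    group: Optional[str]

    @property
    def length(self) -> int:
        # Coordinates are inclusive, hence the +1.
        return self.end - self.start + 1


def parse_record(line: str) -> Optional[Feature]:
    """Parse one record; return None for comments, blanks and truncated lines."""
    fields = line.split("#", 1)[0].split()
    if len(fields) < 8:
        return None
    try:
        return Feature(
            fields[0], fields[1], fields[2],
            int(fields[3]), int(fields[4]), float(fields[5]),
            fields[6], fields[7],
            fields[8] if len(fields) > 8 else None,
        )
    except ValueError:
        return None


seq = "NC_000023.11:c130165841-130129362"
lines = [
    "# Gene 1 (Forward). 14 exons. 552 aa. Score = 33.97",
    f"{seq} geneid_v1.2 First 186 291 2.78 + 0 {seq}_1",
    f"{seq} geneid_v1.2 Internal 9239 9381 -1.12 + 2 {seq}_1",
    f"{seq} geneid_v1.2 Internal 16274 16373 3.34 + 0 {seq}_1",
]
exons = [f for f in map(parse_record, lines) if f is not None]
print(sum(f.length for f in exons))  # 349 bases in these three exons
```

Only the first three exons of Gene 1 are used here, so the total is illustrative and does not represent the full 14-exon coding length.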

➢ Validation:
geneid is a program used to predict genes, exons (first, internal, terminal and single),
splice sites, start and stop codons, open reading frames and other signals along a
sequence, together with their scores.

➢ References:
• Yan, C., Gong, L., Chen, L., Xu, M., Abou-Hamdan, H., Tang, M., Désaubry, L., &
Song, Z. (2020). PHB2 (prohibitin 2) promotes PINK1-PRKN/Parkin-dependent
mitophagy by the PARL-PGAM5-PINK1 axis. Autophagy, 16(3), 419–434.
https://doi.org/10.1080/15548627.2019.1628520
• Gene id tool. https://genome.crg.es/geneid.html
