Annotation

FUNCTIONAL GENOMICS
Name: Aftab Ahmad

Reg: BBS193001
Sec: 01
Assignment: 03
Assignment Title: Gene prediction
Submitted to: Ma'am Sehar
Date: Dec 9, 2022
Table of Contents
Introduction
Problem Statement
Objective
Literature Review
Methodology
Results
Validation
References
➢ Introduction:
As the genomes of more and more organisms are sequenced, a growing volume of raw sequence
needs to be annotated. Computational gene prediction, which locates the protein-coding
regions in a sequence, is one of the essential problems in bioinformatics. Two classes of
methods are used: similarity-based searches and ab initio prediction.
➢ Problem Statement:
Predict the gene in a Homo sapiens genomic sequence using bioinformatics tools, in order
to annotate the sequence and identify the components of the gene.
➢ Objective:
• Gene prediction
• Locating the gene in the genome
• Identifying and retrieving the sequence of the organism
➢ Literature Review:
The gene analysed in this report encodes a flavoprotein that is essential for nuclear
disassembly in apoptotic cells and is found in the mitochondrial intermembrane space in
healthy cells. Induction of apoptosis results in the translocation of this protein to the
nucleus, where it affects chromosome condensation and fragmentation. In addition, the gene
product induces mitochondria to release the apoptogenic proteins cytochrome c and
caspase-9. Mutations in this gene cause combined oxidative phosphorylation deficiency 6
(COXPD6), a severe mitochondrial encephalomyopathy, as well as Cowchock syndrome, also
known as X-linked recessive Charcot-Marie-Tooth disease-4 (CMTX-4), a disorder resulting
in neuropathy and axonal and motor-sensory defects with deafness and cognitive disability.
Alternative splicing results in multiple transcript variants. A related pseudogene has been
identified on chromosome 10.
geneid is a program that predicts genes in anonymous genomic sequences and is designed
with a hierarchical structure (Yan et al., 2020). Its accuracy compares favorably with that
of other existing ab initio gene prediction tools, and it is very efficient in terms of
speed and memory usage. geneid also supports integrating predictions from multiple sources
and reannotating genomic sequences, via external GFF files together with redefinition of
the gene model. Its output can be customized to different levels of detail, including an
exhaustive listing of potential signals and exons, and several output formats such as GFF
and XML are available (geneid tool).
➢ Methodology:
geneid predicts genes in three steps. In the first step, splice sites and start and stop
codons are predicted and scored along the sequence using Position Weight Arrays. In the
second step, exons are built from these sites; each exon is scored as the sum of the scores
of its defining sites plus the log-likelihood ratio of a Markov model for coding DNA.
Finally, the gene structure is assembled from the set of predicted exons so as to maximize
the sum of the scores of the assembled exons. geneid also supports integrating predictions
from multiple sources via external GFF files, and the general gene model can be redefined.
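The scoring scheme described above can be illustrated with a minimal Python sketch. It is not geneid's implementation: it uses a plain position weight matrix (every position independent) in place of a Position Weight Array, a first-order Markov chain for the coding model, and made-up toy probabilities. The names `column`, `site_score`, `coding_llr` and `exon_score` are hypothetical helpers introduced only for this illustration.

```python
import math

# Assumed uniform background base frequencies (toy values, not geneid parameters).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def column(favoured, p=0.97):
    """Toy matrix column: probability p for one base, the rest shared equally."""
    rest = (1.0 - p) / 3
    return {base: (p if base == favoured else rest) for base in "ACGT"}


START_PWM = [column("A"), column("T"), column("G")]  # toy start-codon model
DONOR_PWM = [column("G"), column("T")]               # toy donor-site (GT) model


def site_score(site, pwm):
    """Log-odds score of a candidate site against the background."""
    if len(site) != len(pwm):
        raise ValueError("site and matrix must have the same length")
    return sum(math.log2(col[b] / BACKGROUND[b]) for b, col in zip(site, pwm))


# Toy first-order Markov chains: P(next base | previous base).
CODING = {prev: {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2} for prev in "ACGT"}
NONCODING = {prev: dict(BACKGROUND) for prev in "ACGT"}


def coding_llr(seq):
    """Log-likelihood ratio of the coding chain versus the non-coding chain."""
    return sum(
        math.log2(CODING[prev][cur] / NONCODING[prev][cur])
        for prev, cur in zip(seq, seq[1:])
    )


def exon_score(left_site, left_pwm, right_site, right_pwm, body):
    """Exon score = scores of the two defining sites + coding log-likelihood ratio."""
    return (
        site_score(left_site, left_pwm)
        + site_score(right_site, right_pwm)
        + coding_llr(body)
    )


# A toy "first" exon: start codon on the left, donor site on the right.
score = exon_score("ATG", START_PWM, "GT", DONOR_PWM, "ATGGCGGCC")
print(round(score, 2))
```

A candidate whose sites match the matrices and whose body resembles coding sequence gets a positive score; mismatching sites or a non-coding-like body pull the score down, which is why many of the candidate exons listed in the Results have negative scores.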
The analysis was carried out as follows:
• The sequence of Homo sapiens (NC_000023.11:c130165841-130129362) was retrieved.
• The sequence was downloaded in FASTA format.
• The geneid tool was opened for prediction.
• The parameters were set and the FASTA file of Homo sapiens was uploaded.
• The file was submitted and run; the results were then displayed on screen.
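Before a FASTA file is uploaded, it can be useful to confirm that it contains the expected record. The following is a minimal sketch in plain Python; `read_fasta` and `looks_like_dna` are hypothetical helper names, and the example text is a toy excerpt, not the real sequence.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def looks_like_dna(seq):
    """True if the sequence is non-empty and uses only A, C, G, T or N."""
    return bool(seq) and set(seq) <= set("ACGTN")


# Toy excerpt for illustration only; the bases below are not the real sequence.
example = """>NC_000023.11:c130165841-130129362 toy excerpt
ACGTACGTNN
acgt
"""

records = read_fasta(example)
header, seq = records[0]
print(header.split()[0], len(seq), looks_like_dna(seq))
```

For the real download, the same check would be applied to the contents of the file, and the reported length compared with the length implied by the region identifier.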
➢ Results:
The tool reports the following:
• Starts(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
• Acceptors(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
• Donors(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
• Stops(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
• Firsts(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
• Internals(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
• Terminals(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
• ORFs(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
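The interval [0,36479] can be cross-checked against the region identifier itself. In NCBI-style region identifiers, a "c" before the coordinates marks the complement strand, so the region runs from 130165841 down to 130129362. The sketch below (with `parse_region` as a hypothetical helper name) computes the region length, which should agree with the 0-based interval [0,36479] and with the End record at position 36480 in the output.

```python
import re


def parse_region(identifier):
    """Split 'ACCESSION:cA-B' or 'ACCESSION:A-B' into accession, strand and span."""
    match = re.fullmatch(r"(?P<acc>[^:]+):(?P<comp>c?)(?P<a>\d+)-(?P<b>\d+)", identifier)
    if match is None:
        raise ValueError(f"unrecognised region identifier: {identifier!r}")
    a, b = int(match["a"]), int(match["b"])
    return {
        "accession": match["acc"],
        "complement": match["comp"] == "c",
        "start": min(a, b),
        "end": max(a, b),
        "length": abs(a - b) + 1,  # both end coordinates are included
    }


region = parse_region("NC_000023.11:c130165841-130129362")
print(region["accession"], region["complement"], region["length"])
# length: 130165841 - 130129362 + 1 = 36480
```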
# Starts(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 Start 42 44 0.09 + . # GGCTTCTCTGTCCAATGCCC
… 0.09 + . # GAGGAGTCTGCGTAATGTGC
… 4.64 + . # TAGCGGTCGCCGAAATGTTC
… 3.73 + . # GAATTGGCTCAGCCATGCCG
… -1.62 + . # CCCTTTTTAGAGCAATGAGG
… -2.12 + . # AGTTTCTTGTGTAAATGAGG
# Acceptors(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 Acceptor 1 2 -0.35 + . # ***********************GTTC
… 22 2.22 + . # ***GTTCCCCTTCCCCGGCTCTAGCAG
… 25 1.01 + . # GTTCCCCTTCCCCGGCTCTAGCAGGCC
… 56 -1.98 + . # TCTCTGTCCAATGCCCACCCGGAGCTG
… 91 -2.57 + . # AGTCTGCGTAATGTGCGTGTGAAGAGA
… 93 -5.08 + . # TCTGCGTAATGTGCGTGTGAAGAGACT
… 146 -5.04 + . # TTTGACCCGTCGGTCGTGCGTGAGAGG
# Donors(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 Donor 67 68 -4.50 + . # GGAGTCTGC
… -1.94 + . # TGCGTAATG
… 0.25 + . # AATGTGCGT
… -2.51 + . # TGCGTGTGA
… -2.11 + . # CGTGTGAAG
… -1.26 + . # ACGGTGTTT
# Stops(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 Stop 19 21 0.00 + . # CTAG
NC_000023.11:c130165841-130129362 geneid_v1.2 Stop 75 77 0.00 + . # GTAA
NC_000023.11:c130165841-130129362 geneid_v1.2 Stop 86 88 0.00 + . # GTGA
NC_000023.11:c130165841-130129362 geneid_v1.2 Stop 124 126 0.00 + . # TTGA
NC_000023.11:c130165841-130129362 geneid_v1.2 Stop 142 144 0.00 + . # GTGA
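Records of this kind are easy to process programmatically. The following minimal sketch assumes each record is written on a single line with whitespace-separated fields (sequence name, source, feature, start, end, score, strand, frame) followed by a `#` and the sequence context, which is the layout the Stop records show. `Site` and `parse_site` are hypothetical names, and the check that the last three context bases form a stop codon is an observation about these five records, not a documented guarantee of the tool.

```python
from typing import NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


class Site(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: float
    strand: str
    frame: str
    context: str


def parse_site(line):
    """Parse one signal record with a trailing '# context' field."""
    left, _, context = line.partition("#")
    seqname, source, feature, start, end, score, strand, frame = left.split()
    return Site(seqname, source, feature, int(start), int(end),
                float(score), strand, frame, context.strip())


SEQ = "NC_000023.11:c130165841-130129362"
lines = [
    f"{SEQ} geneid_v1.2 Stop 19 21 0.00 + . # CTAG",
    f"{SEQ} geneid_v1.2 Stop 75 77 0.00 + . # GTAA",
    f"{SEQ} geneid_v1.2 Stop 86 88 0.00 + . # GTGA",
    f"{SEQ} geneid_v1.2 Stop 124 126 0.00 + . # TTGA",
    f"{SEQ} geneid_v1.2 Stop 142 144 0.00 + . # GTGA",
]

sites = [parse_site(line) for line in lines]
for site in sites:
    codon = site.context[-3:]  # last three bases of the reported context
    print(site.start, site.end, codon, codon in STOP_CODONS)
```

Each of the five records spans exactly three bases and ends in TAG, TAA or TGA, which is what a stop-codon signal should look like.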
# Firsts(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 First 42 67 -7.51 + 0
… -5.96 + 0
… -6.50 + 0
# Internals(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 Internal 2 67 -8.40 + 0
… 67 -8.91 + 1
… 73 -6.94 + 0
… 73 -7.36 + 1
… 78 -5.57 + 0
# Terminals(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 Terminal 2 21 -5.99 + 2
… 77 -6.18 + 1
… 88 -4.20 + 1
# Singles(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 Single 186 299 1.11 + 0
… -5.70 + 0
… -5.69 + 0
… -7.76 + 0
… -8.79 + 0
… 36467 -8.05 - 1
… 36470 -6.28 - 0
… 36458 -5.52 + 2
NC_000023.11:c130165841-130129362 geneid_v1.2 End 36480 36480 0.00 + 0
# ORFs(+) predicted in sequence NC_000023.11:c130165841-130129362: [0,36479]
NC_000023.11:c130165841-130129362 geneid_v1.2 ORF 19 126 -8.04 + 0
NC_000023.11:c130165841-130129362 geneid_v1.2 ORF 75 299 -9.19 + 0
NC_000023.11:c130165841-130129362 geneid_v1.2 ORF 220 342 -6.53 + 0
NC_000023.11:c130165841-130129362 geneid_v1.2 ORF 293 400 -9.37 + 0
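As a quick sanity check on the ORF records, the length of each ORF can be computed from its coordinates. The short sketch below assumes the coordinates are 1-based and inclusive (an assumption, although it is consistent with the three-base Stop records); `orf_length` is a hypothetical helper name.

```python
# (start, end) pairs copied from the four ORF records above.
orfs = [(19, 126), (75, 299), (220, 342), (293, 400)]


def orf_length(start, end):
    """Length in nucleotides, assuming 1-based inclusive coordinates."""
    return end - start + 1


summary = []
for start, end in orfs:
    length = orf_length(start, end)
    codons, remainder = divmod(length, 3)
    summary.append((start, end, length, codons, remainder))
    print(f"ORF {start}-{end}: {length} nt = {codons} codons, remainder {remainder}")
```

Under that assumption the four ORFs are 108, 225, 123 and 108 nucleotides long, and each length is a whole number of codons.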
# Optimal Gene Structure. 1 genes. Score = 33.97
# Gene 1 (Forward). 14 exons. 552 aa. Score = 33.97
… 2.78 + 0 NC_000023.11:c130165841-130129362_1
… 9381 -1.12 + 2 NC_000023.11:c130165841-130129362_1
… 16373 3.34 + 0 NC_000023.11:c130165841-130129362_1
… 18090 3.06 + 2 NC_000023.11:c130165841-130129362_1
… 18349 0.88 + 0 NC_000023.11:c130165841-130129362_1
… 20363 0.74 + 1 NC_000023.11:c130165841-130129362_1
… 25309 -0.05 + 0 NC_000023.11:c13016584
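The idea of assembling a gene by maximizing the sum of exon scores can be sketched as a small dynamic programme over candidate exons. This is a simplified illustration, not geneid's algorithm: it only requires that chained exons do not overlap and ignores the constraints a real gene model would impose, such as exon-type order and reading-frame compatibility. The exon coordinates and scores are made-up toy values, and `best_chain` is a hypothetical name.

```python
def best_chain(exons):
    """Return (score, chain) for the highest-scoring chain of non-overlapping exons.

    Each exon is a (start, end, score) tuple.
    """
    exons = sorted(exons, key=lambda exon: exon[1])  # process by end position
    best = []  # best[i] = (score, chain) of the best chain that ends with exons[i]
    for i, (start, end, score) in enumerate(exons):
        top_score, top_chain = 0.0, []
        for j in range(i):
            # exons[j] may precede exons[i] only if it ends before exons[i] starts.
            if exons[j][1] < start and best[j][0] > top_score:
                top_score, top_chain = best[j]
        best.append((top_score + score, top_chain + [exons[i]]))
    return max(best, key=lambda item: item[0]) if best else (0.0, [])


# Toy candidate exons (start, end, score); hypothetical values for illustration.
candidates = [
    (1, 100, 2.5),
    (50, 150, 3.0),
    (120, 200, 1.5),
    (160, 260, -1.0),
    (210, 300, 2.0),
]

total, chain = best_chain(candidates)
print(total, chain)
```

In this toy example the best chain skips the single highest-scoring exon (50 to 150) because three compatible lower-scoring exons add up to more, which is the same trade-off behind a gene score that is the sum of its exon scores, including some slightly negative ones.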
➢ Validation:
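geneid is a program used to predict and score genes, exons (first, internal, terminal and
single), splice sites, start and stop codons, open reading frames and other signals along
a sequence. For the sequence analysed here, it reported one gene on the forward strand with
14 exons, encoding 552 amino acids, with a score of 33.97.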
➢ References:
• Yan, C., Gong, L., Chen, L., Xu, M., Abou-Hamdan, H., Tang, M., Désaubry, L., &
Song, Z. (2020). PHB2 (prohibitin 2) promotes PINK1-PRKN/Parkin-dependent
mitophagy by the PARL-PGAM5-PINK1 axis. Autophagy, 16(3), 419–434.
https://doi.org/10.1080/15548627.2019.1628520
• geneid tool. https://genome.crg.es/geneid.html