ASSIGN 4_GR5_S22324

VIETNAM NATIONAL UNIVERSITY
INTERNATIONAL UNIVERSITY – SCHOOL OF BIOTECHNOLOGY
BIOINFORMATICS
Instructor: Nguyễn Minh Thành
Date of submission: 15/4/2024
ASSIGNMENT 4: GENE PREDICTION
Group 5 _ Group members:
Seq.  Full name              Student ID    % contribution (total = 100%)
1     Nguyễn Thị Lâm Anh     BTBTIU21037   25
2     Đinh Ngọc Vân Châu     BTBTIU21042   25
3     Nguyễn Ngọc Bảo Hân    BTBTIU21164   25
4     Trần Vĩnh Bảo Ngọc     BTBTWE21113   25
Total score: /100

Question 1: In the NCBI database, retrieve the nucleotide sequence NG011676. Report:
a. Gene name, database name.

- Gene name: Homo sapiens growth hormone 1 (GH1).
- Database name: Nucleotide
b. Number of exons and their positions (the start and the end of each exon).
- There are 5 exons:
+ First: from 5000 to 5072
+ Second: from 5333 to 5493
+ Third: from 5703 to 5822
+ Fourth: from 5915 to 6079
+ Fifth: from 6333 to 6636
c. The start and the end of coding region (CDS)
- Coding region starts at 5063 and ends at 6530

d. Accession number of protein from this gene and the length of polypeptide.
- Accession number: NP_000506
- Length of polypeptide: 217 aa
Date accessed: 13/4/2024
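As a consistency check on the answers to Question 1, the coding length can be recomputed by clipping each exon to the CDS interval and summing the pieces. The sketch below uses only the coordinates reported above; the helper name `coding_segments` is ours, coordinates are assumed to be 1-based and inclusive, and the last codon is assumed to be a stop codon.

```python
# Exon and CDS coordinates as reported in Question 1 (1-based, inclusive).
EXONS = [(5000, 5072), (5333, 5493), (5703, 5822), (5915, 6079), (6333, 6636)]
CDS = (5063, 6530)


def coding_segments(exons, cds):
    """Clip each exon to the CDS interval; drop exons with no coding part."""
    cds_start, cds_end = cds
    segments = []
    for start, end in exons:
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo <= hi:
            segments.append((lo, hi))
    return segments


segments = coding_segments(EXONS, CDS)
coding_nt = sum(end - start + 1 for start, end in segments)
# Assumption: the final codon is a stop codon and is not counted as a residue.
protein_aa = coding_nt // 3 - 1

print(segments)
print(coding_nt, protein_aa)
```

With these coordinates the coding segments add up to 654 nt (218 codons), which agrees with the 217 aa reported for NP_000506.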
Question 2: Use NG011676 to run GenScan. Report the results (copy & paste the outputs
generated by GenScan: Predicted genes/exons & predicted peptide sequence(s)) and compare
the results with information of this gene from Question 1 (numbers of exons & their positions,
predicted peptide length).
Gene: NG011676              NCBI database        GenScan
Gene name, database name    Same                 Same
Number of exons             5                    5
Position                    1st: 5000 to 5072    1st: 5113 to 5122
                            2nd: 5333 to 5493    2nd: 5383 to 5543
                            3rd: 5703 to 5822    3rd: 5753 to 5872
                            4th: 5915 to 6079    4th: 5965 to 6129
                            5th: 6333 to 6636    5th: 6383 to 6580
Length of polypeptide       217 aa               217 aa
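=> GenScan predicts the correct number of exons (5) and the correct polypeptide length (217 aa).
However, the predicted exon positions do not match the NCBI annotation: every GenScan boundary
is 50 nt higher than the corresponding NCBI coding boundary (for example, 5383 to 5543 versus
5333 to 5493), and the first and last predicted exons cover only the coding part of the NCBI
exons (the CDS runs from 5063 to 6530), not the untranslated regions.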
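One way to make the position comparison concrete is to subtract the NCBI boundaries from the GenScan boundaries exon by exon. This is a minimal sketch using the coordinates in the table above; `NCBI_CODING` is the Question 1 exon list clipped to the CDS (5063 to 6530), which is our assumption about what the GenScan exons should be compared against.

```python
# NCBI exons clipped to the CDS 5063..6530 (assumed comparison basis).
NCBI_CODING = [(5063, 5072), (5333, 5493), (5703, 5822), (5915, 6079), (6333, 6530)]
# Exon coordinates reported for GenScan in the comparison table.
GENSCAN = [(5113, 5122), (5383, 5543), (5753, 5872), (5965, 6129), (6383, 6580)]

# (start difference, end difference) for each pair of exons.
offsets = [
    (g_start - n_start, g_end - n_end)
    for (n_start, n_end), (g_start, g_end) in zip(NCBI_CODING, GENSCAN)
]
# True if every predicted exon has the same length as its NCBI counterpart.
lengths_match = all(
    n_end - n_start == g_end - g_start
    for (n_start, n_end), (g_start, g_end) in zip(NCBI_CODING, GENSCAN)
)

print(offsets)
print(lengths_match)
```

Every pair differs by the same 50 nt at both ends, so the predicted exons have the same lengths as the NCBI coding segments and only their coordinates are shifted.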
Question 3: Use NG011676 to run FGENESH with selecting of Homo sapiens as organism
specific gene-finding parameters. Report the results (copy & paste the output generated by
FGENESH: Predicted genes/exons & predicted protein) and compare the results with
information of this gene from Question 1 (numbers of exons & their positions, predicted protein
length).
Gene: NG011676              NCBI database        FGENESH
Gene name, database name    Same                 Same
Number of exons             5                    4
Position                    1st: 5000 to 5072    1st: 5063 to 5072
                            2nd: 5333 to 5493    2nd: 5333 to 5493
                            3rd: 5703 to 5822    3rd: 5915 to 6079
                            4th: 5915 to 6079    4th: 6333 to 6530
                            5th: 6333 to 6636
Length of polypeptide       217 aa               177 aa
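=> FGENESH predicts only 4 exons: it misses the third NCBI exon (5703 to 5822, 120 nt, i.e. 40
codons), so the predicted polypeptide is 177 aa instead of 217 aa. The four predicted exons match
the coding boundaries of NCBI exons 1, 2, 4 and 5 (the first and last are limited to the CDS,
5063 to 6530).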
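The difference between the two protein lengths can be traced to a specific exon. The sketch below lists the NCBI coding segments that are absent from the FGENESH prediction and recomputes the predicted length; the coordinates come from the table above, `NCBI_CODING` is the Question 1 exon list clipped to the CDS, and the last codon is assumed to be a stop codon.

```python
# NCBI exons clipped to the CDS 5063..6530 (assumed comparison basis).
NCBI_CODING = [(5063, 5072), (5333, 5493), (5703, 5822), (5915, 6079), (6333, 6530)]
# Exon coordinates reported for FGENESH in the comparison table.
FGENESH = [(5063, 5072), (5333, 5493), (5915, 6079), (6333, 6530)]


def total_nt(segments):
    """Sum the lengths of 1-based inclusive segments."""
    return sum(end - start + 1 for start, end in segments)


# NCBI coding segments that FGENESH did not predict.
missing = [seg for seg in NCBI_CODING if seg not in FGENESH]
missing_codons = total_nt(missing) // 3
# Assumption: the final codon is a stop codon and is not counted as a residue.
fgenesh_aa = total_nt(FGENESH) // 3 - 1

print(missing, missing_codons, fgenesh_aa)
```

The only missing segment is 5703 to 5822 (40 codons), and 217 - 40 = 177 matches the length FGENESH reports.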
Question 4: Run FGENESB to predict potential genes in an unknown sequence (BI_Ass 4_sequence A),
BACTERIAL generic as a closest organism. Report:
a. The length of sequence

- 94830 base pairs
b. Number of predicted genes, number of transcription units & operons
- Number of predicted genes: 112
- Number of transcription units: 49
- Number of operons: 20
Navigate to the BLAST homepage and select protein BLAST (BLASTP). Enter the polypeptide
translated from gene_12 of transcription unit 2 (operon 2), as annotated by FGENESB, into the
query box, choose Non-redundant protein sequences (nr) and click the BLAST button. Report:
c. The start and end of CDS of gene_12, the length of predicted protein
CDS of gene_12: starts at nucleotide 7938 and ends at nucleotide 8192 of the sequence (255 nt); the predicted protein is 84 aa long.
d. Results of the best hit from BLASTP: bit score, accession number, length of
polypeptide, GenBank division name, protein name and organism name
Updated date: 18/01/2024
Date accessed: 15/04/2024
● Bit score: 169 bits
● Accession number: WP_000862946
● Length of polypeptide: 84 aa
● GenBank division name: BCT
● Protein name: Hypothetical protein
● Organism name: Bacillus anthracis
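The CDS coordinates and the protein length reported for gene_12 can be cross-checked with a small helper. This is a sketch only: `protein_length` is a name we introduce, and it assumes 1-based inclusive coordinates, an unspliced bacterial CDS, and a terminal stop codon that is not counted as a residue.

```python
def protein_length(cds_start, cds_end):
    """Return the number of residues encoded by an unspliced CDS."""
    nt = cds_end - cds_start + 1
    if nt % 3 != 0:
        raise ValueError(f"CDS length {nt} is not a multiple of 3")
    # Assumption: the last codon is a stop codon.
    return nt // 3 - 1


gene_12_aa = protein_length(7938, 8192)
print(gene_12_aa)
```

For gene_12 the span is 255 nt, i.e. 85 codons including the stop, which is consistent with the 84 aa reported by both FGENESB and the best BLASTP hit.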
Question 5: Run ORF Finder (https://www.ncbi.nlm.nih.gov/orffinder/) to predict potential
genes for an unknown sequence BI_Ass 4_sequence B. Answer:
1. Identify the ORF which is larger than 280 bp and smaller than 300 bp and report the
following information of this ORF:
Question                                                  Answer
a. Label                                                  ORF5
b. Length of ORF                                          288 nt
c. Length of polypeptide translated from this frame       95 aa
d. Write the first five amino acids                       MGPTM
e. Write the nucleotide sequence of the coding strand     3’-TTACCTGCAGTCGAT-5’
   that corresponds to the first five amino acids
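An answer to item e can be verified by translating the 15 nucleotides with the standard genetic code and comparing the result with the peptide from item d. The sketch below is a minimal illustration: the functions `translate` and `reverse_complement` are our own helpers, and the example sequence `ATGGGTCCAACTATG` is hypothetical (one of many sequences that encode MGPTM), not taken from sequence B.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_5_to_3):
    """Translate a coding-strand sequence written 5' to 3' ('*' marks a stop)."""
    seq = coding_5_to_3.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def reverse_complement(seq):
    """Return the reverse complement, e.g. to convert a template strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical coding sequence, chosen only to illustrate the check.
example = "ATGGGTCCAACTATG"
print(translate(example))
```

A string written 3’ to 5’ has to be reversed before it is passed to `translate` (or reverse-complemented if it is the template strand); the sequence reported in item e can be checked against MGPTM in the same way.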
Perform a protein BLAST (BLASTP) for the ORF above, choose Non-redundant protein
sequences (nr) database. Answer the following information for the best hit:
Question                     Answer
2. About best hit alignment:
a. Bit score                 197 bits
b. Identity (ratio)          95/95
c. Similarity (ratio)        95/95
d. Gaps (ratio)              0/95
3. About the best hit sequence:
a. Accession number          AAF29534.1
b. Length of polypeptide     135 aa
c. Protein name              Crustacean hyperglycemic hormone
d. Organism name             Macrobrachium rosenbergii
e. Common name               Giant freshwater prawn
f. Protein functions         Crustacean hyperglycemic hormone controls blood sugar level; has a
                             secretagogue action over amylase released from the midgut; may act
                             as a stress hormone and may be involved in the control of MF
                             secretion, molting, and reproduction
Date accessed: 15/4/2024
