
VIETNAM NATIONAL UNIVERSITY

INTERNATIONAL UNIVERSITY – SCHOOL OF BIOTECHNOLOGY

BIOINFORMATICS

Instructor: Nguyễn Minh Thành

Date of submission: 15/4/2024

ASSIGNMENT 4: GENE PREDICTION

Group 5 _ Group members:

Seq.  Full name              Student ID    % contribution (total = 100%)
1     Nguyễn Thị Lâm Anh     BTBTIU21037   25
2     Đinh Ngọc Vân Châu     BTBTIU21042   25
3     Nguyễn Ngọc Bảo Hân    BTBTIU21164   25
4     Trần Vĩnh Bảo Ngọc     BTBTWE21113   25

Total score: /100


Question 1: In the NCBI database, retrieve the nucleotide sequence NG011676. Report:

a. Gene name, database name.


- Gene name: Homo sapiens growth hormone 1 (GH1).
- Database name: Nucleotide
b. Number of exons and their positions (the start and the end of each exon).
- There are 5 exons:
+ First: from 5000 to 5072
+ Second: from 5333 to 5493
+ Third: from 5703 to 5822
+ Fourth: from 5915 to 6079
+ Fifth: from 6333 to 6636
c. The start and the end of coding region (CDS)

- Coding region starts at 5063 and ends at 6530


d. Accession number of protein from this gene and the length of polypeptide.

Accession number: NP_000506

The length of polypeptide: 217 aa

Access date: 13/4/2024
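The answers to (b), (c) and (d) can be cross-checked against each other. The sketch below is a minimal consistency check, assuming 1-based inclusive coordinates and a CDS that ends with a stop codon; the function names are illustrative and not part of any NCBI tool.

```python
# Exon coordinates (1-based, inclusive) as reported in answer (b).
EXONS = [(5000, 5072), (5333, 5493), (5703, 5822), (5915, 6079), (6333, 6636)]
# CDS start and end as reported in answer (c).
CDS_START, CDS_END = 5063, 6530


def coding_segments(exons, cds_start, cds_end):
    """Clip each exon to the CDS interval; drop exons with no coding part."""
    segments = []
    for start, end in exons:
        clipped_start = max(start, cds_start)
        clipped_end = min(end, cds_end)
        if clipped_start <= clipped_end:
            segments.append((clipped_start, clipped_end))
    return segments


def coding_length(segments):
    """Total number of coding nucleotides across all segments."""
    return sum(end - start + 1 for start, end in segments)


def protein_length(coding_nt):
    """Number of amino acids, assuming the final codon is a stop codon."""
    if coding_nt % 3 != 0:
        raise ValueError("coding length is not a multiple of 3")
    return coding_nt // 3 - 1


segments = coding_segments(EXONS, CDS_START, CDS_END)
total_nt = coding_length(segments)
print(segments)
print(total_nt, protein_length(total_nt))
```

With the coordinates above, the coding part adds up to 654 nt, i.e. 218 codons including the stop codon, which agrees with the reported polypeptide length of 217 aa.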

Question 2: Use NG011676 to run GenScan. Report the results (copy & paste the outputs
generated by GenScan: Predicted genes/exons & predicted peptide sequence(s)) and compare
the results with information of this gene from Question 1 (numbers of exons & their positions,
predicted peptide length).

Gene: NG011676              NCBI database        GenScan
Gene name, database name    Same                 Same
Number of exons             5                    5
Position                    1st: 5000 to 5072    1st: 5113 to 5122
                            2nd: 5333 to 5493    2nd: 5383 to 5543
                            3rd: 5703 to 5822    3rd: 5753 to 5872
                            4th: 5915 to 6079    4th: 5965 to 6129
                            5th: 6333 to 6636    5th: 6383 to 6580
Length of polypeptide       217 aa               217 aa

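=> GenScan predicts the same number of exons (5) and the same polypeptide length (217 aa) as
the NCBI record. However, the predicted exon positions do not match the NCBI annotation: the
boundaries of exons 2 to 4 are each shifted 50 nt downstream, and the first and last predicted
exons (5113 to 5122 and 6383 to 6580) are shorter than the annotated ones.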
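The position comparison can be made explicit by subtracting the NCBI coordinates from the GenScan ones, exon by exon. This is only a sketch using the numbers from the table above; `boundary_offsets` is an illustrative helper, not a GenScan feature.

```python
# Exon coordinates copied from the comparison table (1-based, inclusive).
NCBI_EXONS = [(5000, 5072), (5333, 5493), (5703, 5822), (5915, 6079), (6333, 6636)]
GENSCAN_EXONS = [(5113, 5122), (5383, 5543), (5753, 5872), (5965, 6129), (6383, 6580)]


def boundary_offsets(reference, predicted):
    """Return (start difference, end difference) for each exon pair, in order."""
    return [
        (pred_start - ref_start, pred_end - ref_end)
        for (ref_start, ref_end), (pred_start, pred_end) in zip(reference, predicted)
    ]


offsets = boundary_offsets(NCBI_EXONS, GENSCAN_EXONS)
for number, (d_start, d_end) in enumerate(offsets, start=1):
    print(f"exon {number}: start {d_start:+d} nt, end {d_end:+d} nt")
```

Every boundary except the two outermost ones differs by +50 nt. The outer boundaries (5113 and 6580) are each 50 nt beyond the CDS start (5063) and CDS end (6530) from Question 1, which fits a prediction that covers the coding part of the first and last exons only. The report does not show where the constant 50 nt shift comes from.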

Question 3: Use NG011676 to run FGENESH with selecting of Homo sapiens as organism
specific gene-finding parameters. Report the results (copy & paste the output generated by
FGENESH: Predicted genes/exons & predicted protein) and compare the results with
information of this gene from Question 1 (numbers of exons & their positions, predicted protein
length).
Gene: NG011676              NCBI database        FGENESH
Gene name, database name    Same                 Same
Number of exons             5                    4
Position                    1st: 5000 to 5072    1st: 5063 to 5072
                            2nd: 5333 to 5493    2nd: 5333 to 5493
                            3rd: 5703 to 5822    3rd: 5915 to 6079
                            4th: 5915 to 6079    4th: 6333 to 6530
                            5th: 6333 to 6636
Length of polypeptide       217 aa               177 aa

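=> FGENESH predicts only 4 exons: it misses the third NCBI exon (5703 to 5822, 120 nt = 40
codons), so the predicted polypeptide is 40 aa shorter than the NCBI one (177 aa instead of
217 aa). The four predicted exons agree with the NCBI positions within the coding region: the
first starts at the CDS start (5063) and the last ends at the CDS end (6530).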
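A short sketch of how the missing exon and the resulting difference in protein length can be derived from the two coordinate lists. It assumes 1-based inclusive coordinates; `missed_exons` is an illustrative helper, not part of FGENESH.

```python
# Exon coordinates copied from the comparison table (1-based, inclusive).
NCBI_EXONS = [(5000, 5072), (5333, 5493), (5703, 5822), (5915, 6079), (6333, 6636)]
FGENESH_EXONS = [(5063, 5072), (5333, 5493), (5915, 6079), (6333, 6530)]


def overlaps(a, b):
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def missed_exons(reference, predicted):
    """Reference exons that no predicted exon overlaps."""
    return [ref for ref in reference if not any(overlaps(ref, p) for p in predicted)]


missed = missed_exons(NCBI_EXONS, FGENESH_EXONS)
missing_nt = sum(end - start + 1 for start, end in missed)
missing_aa = missing_nt // 3

print(missed)                  # annotated exons with no predicted counterpart
print(missing_nt, missing_aa)  # nucleotides and codons lost
print(217 - missing_aa)        # expected predicted length if only this exon is lost
```

The only annotated exon without a predicted counterpart is 5703 to 5822. Its 120 nt correspond to 40 codons, and 217 - 40 = 177 aa matches the FGENESH protein length.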

Question 4: Run FGENESB to predict potential genes in an unknown sequence (BI_Ass 4_sequence A),
with BACTERIAL generic as the closest organism. Report:

a. The length of sequence
- 94830 base pairs
b. Number of predicted genes, number of transcription units & operons
- Number of predicted genes: 112
- Number of transcription units: 49
- Number of operons: 20

Navigate to the BLAST homepage and select protein BLAST (BLASTP). Enter the
polypeptide translated from gene_12 of transcription unit 2 (operon 2), as annotated by
FGENESB, into the query box, choose Non-redundant protein sequences (nr) and hit the
BLAST button. Report:

c. The start and end of CDS of gene_12, the length of predicted protein

CDS of gene_12: starts at nucleotide 7938 and ends at nucleotide 8192 of the sequence; the
predicted protein is 84 aa long.
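The reported CDS coordinates and protein length of gene_12 can be checked against each other with a few lines. This is a sketch that assumes 1-based inclusive coordinates and that the CDS span includes the stop codon; `cds_protein_length` is an illustrative name.

```python
def cds_protein_length(start, end):
    """Amino-acid count for a contiguous CDS, assuming the last codon is a stop."""
    length_nt = end - start + 1  # inclusive coordinates
    if length_nt % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return length_nt // 3 - 1


gene_12_nt = 8192 - 7938 + 1
gene_12_aa = cds_protein_length(7938, 8192)
print(gene_12_nt, gene_12_aa)
```

The span covers 255 nt, i.e. 85 codons including the stop codon, which is consistent with the 84 aa reported above.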
d. Results of the best hit from BLASTP: bit score, accession number, length of
polypeptide, GenBank division name, protein name and organism name

Update date: 18/01/2024

Access date: 15/04/2024

● Bit score: 169 bits
● Accession number: WP_000862946
● Length of polypeptide: 84 aa
● GenBank division name: BCT (
● Protein name: Hypothetical protein
● Organism name: Bacillus anthracis

Question 5: Run ORF Finder (https://www.ncbi.nlm.nih.gov/orffinder/) to predict potential
genes for an unknown sequence BI_Ass 4_sequence B. Answer:
1. Identify the ORF which is larger than 280 bp and smaller than 300 bp and report the
following information of this ORF:

Question                                                Answer
a. Label                                                ORF5
b. Length of ORF                                        288 nt
c. Length of polypeptide translated from this frame     95 aa
d. Write the first five amino acids                     MGPTM
e. Write the nucleotide sequence of the coding strand   3’-TTACCTGCAGTCGAT-5’
   that corresponds to the first five amino acids
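An answer to (e) can be checked by translating the coding strand, read 5' to 3', with the standard genetic code and comparing the result with the peptide from (d). The sketch below assumes the standard codon table; the example sequence is hypothetical (one of many sequences that encode MGPTM) and is not taken from BI_Ass 4_sequence B.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def to_5_to_3(written_3_to_5):
    """Reverse a strand written 3'->5' so that it reads 5'->3'."""
    return written_3_to_5[::-1]


def translate(coding_5_to_3):
    """Translate a coding strand read 5'->3'; '*' marks a stop codon."""
    seq = coding_5_to_3.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def encodes(coding_5_to_3, peptide):
    """True if the sequence translates exactly to the given peptide."""
    return translate(coding_5_to_3) == peptide


# Hypothetical 15-nt sequence, used only to illustrate the check.
candidate = "ATGGGTCCTACTATG"
print(translate(candidate))
print(encodes(candidate, "MGPTM"))
```

A sequence written 3' to 5' has to be passed through `to_5_to_3` first, because translation proceeds from the 5' end of the coding strand.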

Perform a protein BLAST (BLASTP) for the ORF above, choose Non-redundant protein
sequences (nr) database. Answer the following information for the best hit:
Question                        Answer
2. About best hit alignment:
a. Bit score                    197 bits
b. Identity (ratio)             95/95
c. Similarity (ratio)           95/95
d. Gaps (ratio)                 0/95
3. About the best hit sequence:
a. Accession number             AAF29534.1
b. Length of polypeptide        135 aa
c. Protein name                 Crustacean hyperglycemic hormone
d. Organism name                Macrobrachium rosenbergii
e. Common name                  Giant freshwater prawn
f. Protein functions            Crustacean hyperglycemic hormone controls blood sugar
                                level; has a secretagogue action over amylase released
                                from the midgut; may act as a stress hormone and may be
                                involved in the control of methyl farnesoate (MF)
                                secretion, molting, and reproduction

Access date: 15/4/2024
