BLAST exercise

A. Different flavors of BLAST


A research group identified a gene from patients with disturbed sleeping patterns:

Nucleotide sequence:

gggtgaacag ccgcacggga gtaggtacgc acctgacctc gctggcactg ccgggcaagg cagagggtgt ggcgtcgctc accagccagt


gcagctacag cagcaccatc gtccatgtgg gagacaagaa gccgcagccg gagttagaga tggtggaaga tgctgcgagt gggccagaat

1. Perform a Blastn search in NCBI BLAST. Use the "Nucleotide Collection". What is the most likely hit?
Identify the single nucleotide polymorphism(s) (SNPs) that this patient carries.

The most likely hit is Homo sapiens period circadian regulator 2 (PER2), mRNA, with accession number
NM_022817.3. There are 12387 single nucleotide polymorphisms (SNPs) that the patient carries.

Figure 1.1 Most likely hit result from NCBI blast
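The positions where a query differs from its best hit can be read off the pairwise alignment. A minimal sketch of that comparison follows, assuming a gap-free alignment of equal length. The function name `find_mismatches` and the two short sequences are illustrative only; they are not the PER2 reference or the patient sequence.

```python
def find_mismatches(query, subject, subject_start=1):
    """List (subject_position, subject_base, query_base) for every mismatch.

    Assumes the two sequences are already aligned without gaps.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    mismatches = []
    for offset, (q, s) in enumerate(zip(query.lower(), subject.lower())):
        if q != s:
            mismatches.append((subject_start + offset, s, q))
    return mismatches


# Toy data (hypothetical, NOT taken from the BLAST result)
toy_subject = "atggcgtacc"
toy_query = "atggcatacc"
snps = find_mismatches(toy_query, toy_subject, subject_start=101)
print(snps)
```

Each tuple gives the coordinate on the subject, the reference base, and the base seen in the query, which is the information needed to report a candidate SNP.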

2. Do these mutations cause a difference on the protein sequence that the patient expresses? Can you
find this out using a different BLAST?

These mutations can cause a difference in the protein sequence that the patient expresses, and
this can be checked using BLASTn (Figure 1.2) and tBLASTx (Figure 1.3). It can also be checked by using
the NCBI Nucleotide and Gene databases to look up the SNPs that the patient carries. The SNP page lists
all the mRNA sequences that can be mutated and can therefore express a different protein sequence.

Figure 1.2 SNPs in the Homo sapiens sequence as seen through BLASTn

Figure 1.3 SNPs in the Homo sapiens sequence as seen through tBLASTx
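Whether a nucleotide substitution changes the protein depends on the codon it falls in. The sketch below translates a reference and a variant coding sequence with the standard genetic code and lists the amino acid positions that differ. The names `translate` and `protein_changes` and the nine-base sequences are hypothetical examples, and the reading frame is assumed to start at the first base.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding sequence in frame 1, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def protein_changes(ref_dna, var_dna):
    """Return (codon_number, ref_residue, variant_residue) where the proteins differ."""
    pairs = zip(translate(ref_dna), translate(var_dna))
    return [(i + 1, r, v) for i, (r, v) in enumerate(pairs) if r != v]


# Toy sequences (hypothetical)
reference = "ATGAGTGGA"   # M S G
synonymous = "ATGAGCGGA"  # AGT -> AGC, still serine
missense = "ATGGGTGGA"    # AGT -> GGT, serine -> glycine

print(protein_changes(reference, synonymous))
print(protein_changes(reference, missense))
```

A translated search such as tBLASTx performs this kind of comparison across all six reading frames, which is why it can reveal amino acid changes that a nucleotide alignment alone does not show.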

3. What is the difference between BLASTN and MEGABLAST results? To see this focus on distant
homologs found by both searches (you can get the taxonomy/organism/lineage report by clicking on
"Taxonomy Report" link on the top of the results page). Are you able to find a homolog in, for
example, Xenopus in both cases? If not, why?

MEGABLAST results are sequences that are highly similar to the query sequence, whereas BLASTN
results also include sequences that are only somewhat similar. Homologs can be found in both cases for
Drosophila. BLASTN gives high-sensitivity results, although it can also show nonspecific hits and
improper alignments, whereas MEGABLAST gives low-sensitivity results. A Xenopus homolog was not
found in both searches because of the different sensitivities of BLASTN and MEGABLAST.
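One way to picture the sensitivity difference is the seed length: MEGABLAST uses a default word size of 28, BLASTN a default of 11, and a hit can only start from an exact match at least that long. The sketch below is a simplification that looks at a single gap-free diagonal. The subject is the first 60 nucleotides of the exercise query; the diverged copy (one substitution every 15 bases) and the helper names are assumptions made for illustration.

```python
def longest_identical_run(a, b):
    """Length of the longest stretch of identical positions in two aligned sequences."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def can_seed(a, b, word_size):
    """True if the alignment contains an exact match of at least word_size bases."""
    return longest_identical_run(a, b) >= word_size


# First 60 nt of the exercise query
subject = "gggtgaacagccgcacgggagtaggtacgcacctgacctcgctggcactgccgggcaagg"

# Hypothetical distant homolog: substitute every 15th base
swap = {"a": "c", "c": "a", "g": "t", "t": "g"}
diverged = "".join(swap[b] if i % 15 == 14 else b for i, b in enumerate(subject))

print(longest_identical_run(diverged, subject))
print(can_seed(diverged, subject, word_size=11))  # BLASTN default word size
print(can_seed(diverged, subject, word_size=28))  # MEGABLAST default word size
```

With mismatches spaced this closely, no exact 28-base word survives, so a MEGABLAST-style search would never start extending this alignment, while an 11-base word still fits between the substitutions.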

B. Function Prediction
The genome of Astyanax mexicanus is not yet fully annotated. Many proteins are predicted but the
annotation of their function is far from being complete.

1. Predict the function of the following sequence based on their homologs (using NCBI BLAST).
>Astyanax_mexicanus_protein

RDSSMVKEEIKAFLANRRISQAVVAQVTGISQSRISHWLLQQGSDLSEQKKRAFYRWYQLEKTPGATLNMRPAPLA
LEEIEWRQTPPPISTAPGSFRLRRGSRFTWRKECLAVMESYFNDNQYPDEAKREEIANACNAVIQKPGKKLSDLERVT
SLKVYNWFANRRKEIKRRANIEATILESHGIDVQSPGGHSNSDDIDASDYTE

This sequence corresponds to the protein HMBOX1, which regulates the intracellular free zinc
level. HMBOX1 is a homeobox-containing protein that can directly bind telomeric double-stranded DNA
and associate with PML nuclear bodies.

Homeobox genes have an essential role in specifying cell identity and positioning during
embryonic development, including the specification, patterning, and differentiation of ectodermal
appendages. They form a large group of similar genes that direct the formation of body structures
during early embryonic development.

The human homeobox gene group contains approximately 235 functional genes and 65
pseudogenes. Homeobox genes contain a particular DNA sequence that provides instructions for making
a stretch of amino acids called the homeodomain. Proteins that contain a homeodomain usually serve
as transcription factors that bind to and control the activity of other genes.

2. In which species did you find the best hits? Is this expected?

The best hit was Astyanax mexicanus because its percent identity is 100% and its E-value
is the lowest. This result was expected, as the protein sequence comes from Astyanax mexicanus.

Figure 2.1 The result of the Astyanax mexicanus protein homolog using Blastp
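The best hit is the one with the lowest E-value, with percent identity as a natural tie-breaker. A small sketch of that ranking follows; the hit records, organism labels, and numbers are placeholders invented for illustration and are not values from the Blastp result.

```python
def best_hit(hits):
    """Pick the hit with the lowest E-value; break ties by highest percent identity."""
    return min(hits, key=lambda h: (h["evalue"], -h["percent_identity"]))


# Placeholder records (hypothetical values, not real BLAST output)
hits = [
    {"organism": "species A", "percent_identity": 100.0, "evalue": 1e-120},
    {"organism": "species B", "percent_identity": 92.5, "evalue": 1e-105},
    {"organism": "species C", "percent_identity": 71.0, "evalue": 1e-60},
]

top = best_hit(hits)
print(top["organism"], top["percent_identity"], top["evalue"])
```

The E-value estimates how many alignments of at least that score would be expected by chance, so a smaller value means a more significant hit.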

C. Ubiquitin

Ubiquitin is a regulatory protein that is ubiquitously expressed in eukaryotes. Ubiquitination (or
ubiquitylation) refers to the post-translational modification of a protein by the covalent attachment
(via an isopeptide bond) of one or more ubiquitin monomers. The most prominent function of ubiquitin
is labeling proteins for proteasomal degradation. Besides this function, ubiquitination also controls the
stability, function, and intracellular localization of a wide variety of proteins.

1. After having read about the function of the ubiquitin, do you think it is a protein found only in
vertebrates?

No. Ubiquitin can be found in all eukaryotic cells, and its sequence is exceptionally well
conserved from protozoans to vertebrates. Because it is expressed in all eukaryotes, it occurs in both
vertebrates and invertebrates.

2. Use BLAST to find out how conserved ubiquitin is. As a start point use this human ubiquitin
sequence:

>human ubiquitin
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG

Ubiquitin is a highly conserved protein expressed in eukaryotes that plays a major role in
protein degradation by the 26S proteasome. Ubiquitination refers to the post-translational modification
of a protein by the covalent attachment (via an isopeptide bond) of one or more ubiquitin monomers.
The most important function of ubiquitin is labeling proteins for proteasomal degradation; it also
controls the intracellular localization, function, and stability of a wide variety of proteins.

According to the “Conserved Domains” track in the “Graphical Summary” of the NCBI Delta-BLAST
search result, the yellow bar aligned with most of the query sequence, and only a small part was cut
short. The bar extended from the methionine (M), which is the first amino acid from the left, to the
arginine (R) (4th amino acid from the right). The bar represents the ubiquitin superfamily domain,
against which the human ubiquitin sequence was compared.

Figure 3.1. Delta-BLAST result of Ubiquitin

3. Using BLAST find out whether or not ubiquitin is expressed only by eukaryotes.

Ubiquitin can be found in all eukaryotic cells, and its sequence is exceptionally well
conserved from protozoans to vertebrates. This can be seen through BLAST: the lineages shown in the
“Summary” part of the ubiquitin-producing gene were all eukaryotic organisms.

Figure 3.2. The Summary of Ubiquitin-producing gene
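The same check can be made on a list of hit lineages by looking at the top-level group of each one. The sketch below assumes lineages are available as semicolon-separated strings; that format, the function names, and the example strings are assumptions for illustration and are not copied from the BLAST output.

```python
from collections import Counter


def top_level_groups(lineages):
    """Count how many lineages fall under each top-level taxonomic group."""
    return Counter(lineage.split(";")[0].strip() for lineage in lineages)


def only_eukaryotes(lineages):
    """True if every lineage in a non-empty list starts with Eukaryota."""
    return set(top_level_groups(lineages)) == {"Eukaryota"}


# Illustrative lineage strings (hypothetical input, not real search results)
example_lineages = [
    "Eukaryota; Metazoa; Chordata",
    "Eukaryota; Fungi; Ascomycota",
    "Eukaryota; Viridiplantae; Streptophyta",
]

print(top_level_groups(example_lineages))
print(only_eukaryotes(example_lineages))
```

If any hit came from a bacterial or archaeal lineage, the counter would show a second top-level group and the check would return False.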
