
LIFS1901 2022 Spring Individual Project

Goals

On completion of this project, you are expected to be able to

• make use of Nucleotide-BLAST programs at NCBI
  o to search for sequence entries in the genetic sequence database that are similar to a target
    sequence
  o to make pairwise genetic sequence comparisons
• apply the genetic knowledge learned in this course to evaluate how genetic sequence changes may
  affect the expression of a gene

Introduction

Since December 2019, the infectious disease COVID-19 has been wreaking havoc in human populations
across the world. As of 5 May 2022, over 500 million people have been infected, of whom over 6
million have died. Locally, it has spread rapidly through the community since the end of January
2022, infecting more than 1.2 million people and killing more than 9000 in about 2 months. This disease
is caused by a new human coronavirus named SARS-CoV-2. In an attempt to understand and control this
virus and the disease it causes, scientists, medical personnel and epidemiologists around the world
have been isolating viral samples from infected people and obtaining the viral genetic sequences from
them. Analysis of these genetic sequences has indicated that SARS-CoV-2 apparently evolved from a bat
coronavirus and that it continues to evolve by acquiring new sequence changes. The single-stranded RNA
genome of SARS-CoV-2 consists of about 30000 nucleotides. Currently, genomic sequences of about
4,000,000 SARS-CoV-2 isolates have been deposited into the NCBI genetic sequence database. For
convenience in analysis, the sequences in the database are always presented in DNA format, i.e.
with T in place of U.

The spike (S) protein of SARS-CoV-2 has been the focus of attention of biotechnologists working at the
forefront of the fight against the virus. It protrudes from the viral particle and is responsible for binding
to a surface protein of the host cell (the receptor protein) to initiate the infection process. With its
surface location on the viral particle and its critical role in infection, the S protein is the favorite target
for COVID-19 vaccine development aimed at disease prevention. The S protein is a large protein made up
of 1273 amino acids (thus encoded by 3822 nucleotides in the viral genome, stop codon included). Since
the first (wild-type) S gene sequence was obtained, altered (mutant) S gene sequences have been
appearing wherever the incidence of infection is high. Virus isolates with some mutant S genes are found
to spread more effectively than the wild-type virus. Furthermore, certain COVID-19 vaccines
targeting the wild-type S protein have been found to be less effective in preventing infection by some S
mutant viruses. As the pandemic continues, it is important to monitor the evolution of the S gene
sequence and update our prevention and control measures accordingly.
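
The nucleotide count quoted above follows from three nucleotides per codon plus one stop codon. A minimal sketch of that arithmetic is given below; the function name `cds_length` is a hypothetical helper introduced only for illustration.

```python
NUCLEOTIDES_PER_CODON = 3


def cds_length(n_amino_acids, include_stop=True):
    """Return the number of nucleotides that encode a protein.

    Each amino acid is encoded by one codon of three nucleotides.
    A stop codon, if counted, adds one more codon.
    """
    n_codons = n_amino_acids + (1 if include_stop else 0)
    return n_codons * NUCLEOTIDES_PER_CODON


# S protein: 1273 amino acids, stop codon included
print(cds_length(1273))         # 3822
# Same protein, stop codon excluded
print(cds_length(1273, False))  # 3819
```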

This project is designed to give you an opportunity to perform basic genetic sequence analysis and
database searches, and to witness the occurrence of genetic variation within a rapidly expanding viral
population. The instructor has uploaded to Canvas the wild-type and 10 mutant SARS-CoV-2 S gene
sequences. You will work on 2 of the mutant sequences, assigned to you according to your student ID
number. Follow the instructions provided in the Procedure section below to carry out the project work.

Procedure

Note: A Nucleotide-BLAST search is started by going to the following website:

https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome

The search will be demonstrated in the lecture on 6 May 2022. The video recording of this lecture will
be uploaded to Canvas. If you need to review how the searches are done, please view the corresponding
lecture video. You can enter the nucleotide sequence into Nucleotide-BLAST by copying it from the
uploaded sequence file and then pasting it into the entry box on the Nucleotide-BLAST page.

1) [4 marks] Take the mutant S-gene sequence whose ID matches the last digit of your HKUST
   Student ID.
   a) Enter this mutant S-gene sequence into the regular Nucleotide-BLAST program and run the
      Betacoronavirus database search with Max Target Sequences set to 1000. Report the number of
      database entries (to 1 significant figure only, e.g. 3, 40, 500, ≥1000) that contain a sequence
      exactly identical to it.
   b) Using “Two-Sequence” Nucleotide-BLAST, determine the base changes in this mutant sequence
      with respect to the wild-type sequence. For each base change, indicate
      • base change specifics, which include
        o the position of the changed base
        o which base in the wild-type sequence changed to which base in the mutant sequence
        Example: Base 1501 A changed to T
      • codon change specifics, which include
        o the position of the changed codon
        o which codon in the wild-type sequence changed to which codon in the mutant sequence
        o which amino acid in the wild-type protein changed to which amino acid in the
          mutant protein
        o the nature of the mutation with respect to amino acid coding
        Example: Codon 501 AAT changed to TAT (amino acid Asn changed to Tyr, missense)
2) [4 marks] Take the mutant S-gene sequence whose ID matches the last-but-one digit of your
   HKUST Student ID. If the last-but-one digit is the same as the last digit, take the last-but-two digit
   instead, and so on. Repeat Step 1 above for this mutant S-gene sequence.
3) [2 marks] Suppose that the infection caused by one of the two mutant viruses analyzed in Steps 1
   and 2 above cannot be prevented by the COVID-19 vaccines developed based on the wild-type S protein
   of SARS-CoV-2. Deduce which of the two mutant viruses is more likely to cause unpreventable
   infection and explain your deduction thoroughly. If you think that the information about the two
   mutant viruses obtained above does not enable you to determine which of them is more likely to cause
   unpreventable infection, explain thoroughly why you think so.
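
The bookkeeping asked for in Steps 1b and 2 (base position to codon position, codon change, amino acid change, nature of mutation) can be sketched in a few lines of Python. This is only an illustrative aid under stated assumptions, not a replacement for the Two-Sequence BLAST comparison: the function `describe_substitution` and its arguments are hypothetical names, the standard genetic code is assumed, only single-base substitutions are handled, and a change that removes a stop codon is not given its own label.

```python
# Standard genetic code written in DNA letters (T in place of U).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# One-letter to three-letter amino acid names, as used in the report format.
THREE_LETTER = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY*",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr "
    "Val Trp Tyr Stop".split(),
))


def describe_substitution(position, wt_codon, new_base):
    """Describe a single-base substitution in a coding sequence.

    position -- 1-based position of the changed base within the gene
    wt_codon -- the wild-type codon that contains this base
    new_base -- the base found in the mutant sequence
    Returns (codon number, wild-type codon, mutant codon,
             wild-type amino acid, mutant amino acid, nature of mutation).
    """
    codon_number = (position - 1) // 3 + 1
    offset = (position - 1) % 3  # 0, 1 or 2 within the codon
    mutant_codon = wt_codon[:offset] + new_base + wt_codon[offset + 1:]
    wt_aa = CODON_TABLE[wt_codon]
    mutant_aa = CODON_TABLE[mutant_codon]
    if mutant_aa == wt_aa:
        nature = "silent"
    elif mutant_aa == "*":
        nature = "nonsense"
    else:
        nature = "missense"
    return (codon_number, wt_codon, mutant_codon,
            THREE_LETTER[wt_aa], THREE_LETTER[mutant_aa], nature)


# The example from Step 1b: Base 1501 A changed to T
print(describe_substitution(1501, "AAT", "T"))
# (501, 'AAT', 'TAT', 'Asn', 'Tyr', 'missense')
```

The codon number is obtained by integer division because bases 1 to 3 form codon 1, bases 4 to 6 form codon 2, and so on.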
Submit your report online by uploading it as “Individual Project” under Assignments on Canvas. The
report is due on 15 May 2022 (Sun) at 11:59 pm.
Have a BLAST!

Instructor: Dr. Eugene Hung 5 May 2022

A sample report is shown below.


LIFS1901 2022 Spring Individual Project Report
Student Name: Hung, Siu Chun
Student ID: 12345nmm
1) Mutant m (delta)
   a) ≥1000
   b) Base 56 C changed to G: codon 19 ACA changed to AGA (amino acid Thr changed to Arg, missense)
      Base 230 A changed to C: codon 77 AAG changed to ACG (amino acid Lys changed to Thr, missense)
      Bases 467-472 AGTTTA deleted: codon 156 GAG changed to GGA (amino acid Glu changed to Gly, missense) and codons 157 & 158 deleted (amino acids Phe & Arg deleted)
      Base 1355 T changed to G: codon 452 CTG changed to CGG (amino acid Leu changed to Arg, missense)
      Base 1841 A changed to G: codon 614 GAT changed to GGT (amino acid Asp changed to Gly, missense)
      Base 2042 C changed to G: codon 681 CCT changed to CGT (amino acid Pro changed to Arg, missense)
      Base 2848 G changed to A: codon 950 GAT changed to AAT (amino acid Asp changed to Asn, missense)
2) Mutant n (zeta)
   a) 100
   b) Base 1185 C changed to A: codon 395 GTC changed to GTA (amino acid Val unchanged, silent)
      Base 1450 G changed to A: codon 484 GAA changed to AAA (amino acid Glu changed to Lys, missense)
      Base 1841 A changed to G: codon 614 GAT changed to GGT (amino acid Asp changed to Gly, missense)
      Base 3366 G changed to T: codon 1122 GTG changed to GTT (amino acid Val unchanged, silent)
      Base 3526 G changed to T: codon 1176 GTT changed to TTT (amino acid Val changed to Phe, missense)
3) ……
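
The deletion entry in the sample report (bases 467-472) can be checked with a short sketch. The nine-base window used here is an assumption inferred only from the figures in the sample report (codon 156 GAG, deleted bases AGTTTA, resulting codon GGA); it is not copied from the uploaded sequence file, and `delete_bases` is a hypothetical helper name.

```python
def delete_bases(window, window_start, first, last):
    """Delete gene positions first..last (1-based, inclusive) from a window.

    window       -- a stretch of the gene sequence
    window_start -- gene position of the first base in the window
    Returns (deleted bases, remaining bases).
    """
    i = first - window_start      # index of the first deleted base
    j = last - window_start + 1   # index just past the last deleted base
    return window[i:j], window[:i] + window[j:]


# Bases 466-474 (codons 156-158), inferred from the sample report only.
wild_type_window = "GAGTTTAGA"
deleted, remaining = delete_bases(wild_type_window, 466, 467, 472)

print(deleted)                 # AGTTTA
print(remaining)               # GGA
print(len(deleted) % 3 == 0)   # True
```

Because the number of deleted bases is a multiple of three, the reading frame downstream of the deletion is preserved, which is why the report lists a single changed codon plus two deleted codons rather than a frameshift.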
