BIOINFORMATICS LAB Report

INTRODUCTION
I analyzed a few genes of the human genome using the Ensembl.org bioinformatics tools and found several
variants in these sequences. The variants are:
7 117530985 117530985 G/A
7 117531038 117531038 T/C
7 117531068 117531068 T/C
3 25609262 25609262 G/A
3 25624786 25624786 G/A
3 25634002 25634002 G/A/T
 The first number (highlighted in yellow) is the chromosome number in the human genome.
 The second and third numbers (highlighted in blue) are the start and end positions of the variant on that chromosome; they are identical because each variant is a single-nucleotide change.
 The last field (highlighted in gray) is the variation itself: the reference allele followed by the alternate allele(s).
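The four fields above follow the default whitespace-separated input format of the Ensembl Variant Effect Predictor (chromosome, start, end, then REF/ALT alleles; an optional strand column may follow). As a minimal Python sketch, such lines can be split into typed fields as below; `Variant` and `parse_vep_default` are hypothetical helper names, not part of VEP.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    start: int
    end: int
    ref: str
    alts: tuple[str, ...]


def parse_vep_default(line: str) -> Variant:
    """Split 'chrom start end REF/ALT[/ALT...]' into fields (extra columns ignored)."""
    chrom, start, end, alleles = line.split()[:4]
    ref, *alts = alleles.split("/")
    return Variant(chrom, int(start), int(end), ref, tuple(alts))


for row in ["7 117530985 117530985 G/A", "3 25634002 25634002 G/A/T"]:
    v = parse_vep_default(row)
    # start == end marks a single-nucleotide substitution
    kind = "single-base" if v.start == v.end else "multi-base"
    print(v.chrom, v.start, v.ref, "->", "/".join(v.alts), kind)
```

A multi-allelic entry such as G/A/T yields two alternate alleles, each of which VEP annotates separately.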
ANALYSIS RESULT
I. Percentage of synonymous and nonsynonymous mutations
Across the variants in these genes, 27% of the predicted changes are synonymous (silent)
and 73% are missense.
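Percentages like these are simple proportions of the consequence terms reported for the variants. A minimal sketch of that arithmetic follows; `consequence_percentages` is a hypothetical helper, and the input list is an invented placeholder because the per-transcript consequence list behind the 27% and 73% figures is not reproduced in this report.

```python
from collections import Counter


def consequence_percentages(consequences: list[str]) -> dict[str, int]:
    """Percentage of each consequence term, rounded to the nearest whole percent."""
    counts = Counter(consequences)
    total = sum(counts.values())
    return {term: round(100 * n / total) for term, n in counts.items()}


# Invented placeholder input, NOT the data behind this report's 27% / 73%.
example = ["synonymous_variant"] + ["missense_variant"] * 3
print(consequence_percentages(example))
```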
II. Existing and Novel variants
Five of the six variants are existing (previously reported) variants, and only one is novel.
The existing variants are {7 117530985 117530985 G/A, 7 117531038 117531038 T/C,
7 117531068 117531068 T/C, 3 25624786 25624786 G/A, 3 25634002 25634002 G/A/T},
and the novel variant is {3 25609262 25609262 G/A}.
III. Identification of genes and their products
 Variant {3 25609262 25609262 G/A} lies in the TOP2B gene, which encodes DNA topoisomerase 2-beta.
 Variant {3 25624786 25624786 G/A} lies in the TOP2B gene, which encodes DNA topoisomerase 2-beta.
 Variant {3 25634002 25634002 G/A/T} lies in the TOP2B gene, which encodes DNA topoisomerase 2-beta.
 Variant {7 117530985 117530985 G/A} lies in the CFTR gene, which encodes the cystic fibrosis transmembrane conductance regulator.
 Variant {7 117531038 117531038 T/C} lies in the CFTR gene, which encodes the cystic fibrosis transmembrane conductance regulator.
 Variant {7 117531068 117531068 T/C} lies in the CFTR gene, which encodes the cystic fibrosis transmembrane conductance regulator.
IV. Predicted amino acid changes
 In variant {3 25609262 25609262 G/A}, the G>A substitution gives aspartic acid (D).
 In variant {3 25624786 25624786 G/A}, the codon encodes arginine (R) before the mutation;
after the G>A substitution it encodes tryptophan (W).
 In variant {3 25634002 25634002 G/A/T}, the codon encodes arginine (R) before the mutation.
Because TOP2B is read from the reverse strand, the G>A substitution (C>T on the coding strand)
gives cysteine (C), and the G>T substitution (C>A on the coding strand) gives serine (S).
 In variant {7 117530985 117530985 G/A}, the G>A substitution gives alanine (A).
 In variant {7 117531038 117531038 T/C}, the codon encodes leucine (L) before the mutation;
after the T>C substitution it encodes proline (P).
 In variant {7 117531068 117531068 T/C}, the codon encodes isoleucine (I) before the mutation;
after the T>C substitution it encodes threonine (T).
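Each prediction above comes from substituting one base in a codon and looking up the new codon in the standard genetic code. The Python sketch below performs that lookup. The helper names are hypothetical, the example codons are illustrative rather than the actual codons at these positions, and it assumes (as the arginine-to-cysteine change implies) that TOP2B is read from the reverse strand while CFTR is read from the forward strand, so TOP2B alleles are complemented before substitution.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for codons in TCAG x TCAG x TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def coding_allele(genomic_allele: str, reverse_strand: bool) -> str:
    """Express a forward-strand (genomic) allele on the gene's coding strand."""
    allele = genomic_allele.upper()
    return allele.translate(COMPLEMENT) if reverse_strand else allele


def substitute(codon: str, position: int, new_base: str) -> tuple[str, str]:
    """Return (original, mutated) amino acids for a base change at codon position 1-3."""
    codon = codon.upper()
    mutated = codon[:position - 1] + new_base.upper() + codon[position:]
    return CODON_TABLE[codon], CODON_TABLE[mutated]


# Illustrative codons only; the real codons must be read from the transcript.
print(substitute("CTG", 2, coding_allele("C", reverse_strand=False)))  # CFTR-style T>C
print(substitute("CGC", 1, coding_allele("A", reverse_strand=True)))   # TOP2B-style G>A
print(substitute("CGC", 1, coding_allele("T", reverse_strand=True)))   # TOP2B-style G>T
```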
V. Identification of benign (non-pathogenic) and deleterious (pathogenic) variants based on SIFT and PolyPhen scores
VARIANT                    SIFT   POLYPHEN   NON-PATHOGENIC   PATHOGENIC   ASSESSMENT
3 25609262 25609262 G/A    -      -          -                -            -
3 25624786 25624786 G/A    0      1/0.999    -                Yes          100%, probably highly pathogenic
3 25634002 25634002 G/A/T  0.01   0.923      -                Yes          Probably damaging
3 25634002 25634002 G/A/T  0.19   0.154      Yes              -            100% non-pathogenic
7 117530985 117530985 G/A  -      -          -                -            -
7 117531038 117531038 T/C  0      0.991      -                Yes          100% pathogenic
7 117531038 117531038 T/C  0.08   0.991      Not sure         Not sure     50% pathogenic, 50% non-pathogenic
7 117531068 117531068 T/C  0.34   0.375      Yes              -            100% non-pathogenic
7 117531068 117531068 T/C  0.22   0.969      Not sure         Not sure     50% pathogenic, 50% non-pathogenic
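The calls in this table can be reproduced from the two scores. This sketch assumes the commonly used cut-offs (SIFT below 0.05 is deleterious; PolyPhen above 0.908 is probably damaging, 0.446 to 0.908 possibly damaging, lower benign) and treats disagreement between the two tools as "not sure", mirroring the table. The function names are hypothetical, and the first row uses 0.999 from the "1/0.999" PolyPhen entry.

```python
def sift_label(score: float) -> str:
    # Assumed cut-off: SIFT scores below 0.05 are called deleterious.
    return "deleterious" if score < 0.05 else "tolerated"


def polyphen_label(score: float) -> str:
    # Assumed PolyPhen-2 cut-offs: >0.908 probably, 0.446-0.908 possibly damaging.
    if score > 0.908:
        return "probably damaging"
    if score >= 0.446:
        return "possibly damaging"
    return "benign"


def combined_call(sift: float, polyphen: float) -> str:
    """'pathogenic' or 'non-pathogenic' when both tools agree, otherwise 'not sure'."""
    sift_harmful = sift_label(sift) == "deleterious"
    polyphen_harmful = polyphen_label(polyphen) != "benign"
    if sift_harmful and polyphen_harmful:
        return "pathogenic"
    if not sift_harmful and not polyphen_harmful:
        return "non-pathogenic"
    return "not sure"


# Score pairs copied from the table above (rows with missing scores omitted).
rows = [
    ("3 25624786 G/A", 0.0, 0.999),
    ("3 25634002 G/A/T", 0.01, 0.923),
    ("3 25634002 G/A/T", 0.19, 0.154),
    ("7 117531038 T/C", 0.0, 0.991),
    ("7 117531038 T/C", 0.08, 0.991),
    ("7 117531068 T/C", 0.34, 0.375),
    ("7 117531068 T/C", 0.22, 0.969),
]
for variant, sift, polyphen in rows:
    print(f"{variant}: {sift_label(sift)} / {polyphen_label(polyphen)} -> {combined_call(sift, polyphen)}")
```

Counting "possibly damaging" as harmful is a design choice of this sketch; a stricter rule would accept only "probably damaging".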
INTRODUCTION
I analyzed this sequence using the MutationTaster tool (mutationtaster.org) to determine whether the
mutation makes the product of this sequence pathogenic. The sequence is given below.
TGTCTTGGCAAGTT [C/A] TTTAGCTACACCA.
i. Is the variant pathogenic?
After the variation, the amino acid sequence changes, protein features are affected and the splice
site changes. Therefore, this variant is predicted to be pathogenic.
ii. Identify the amino acid change caused by the variation.
Before the mutation (TGTCTTGGCAAGTT C TTTAGCTACACCA), the TTC codon codes for phenylalanine (F).
After the mutation (TGTCTTGGCAAGTT A TTTAGCTACACCA), the TTA codon codes for leucine (L).
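The codon change can be checked by translating the flanking sequence with each allele. This sketch assumes the sequence is given on the coding strand and that the reading frame starts at its first base, which is what places the variant at the third position of the TTC codon described above; `translate` is a hypothetical helper, not the MutationTaster implementation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for codons in TCAG x TCAG x TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate the complete codons of seq, starting at 0-based offset `frame`."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


left, right = "TGTCTTGGCAAGTT", "TTTAGCTACACCA"
reference = translate(left + "C" + right)
mutant = translate(left + "A" + right)
# (1-based codon number, reference amino acid, mutant amino acid) for each difference
changes = [(i + 1, r, m) for i, (r, m) in enumerate(zip(reference, mutant)) if r != m]
print(reference, mutant, changes)
```

Only the fifth codon differs, which matches the phenylalanine-to-leucine change reported above.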
TOOLS.
The MYH9 gene was analyzed using the Ensembl.org tools.
ANALYSIS RESULT.
i. Identify the chromosome on which this gene is located and whether it is on the forward or
reverse strand.
The gene is located on chromosome 22, on the reverse strand.
ii. Identify which species is most closely related and which is most distantly related.
The chimpanzee (2 homologues) is the species most closely related to humans, and the elephant
shark is the most distantly related.
iii. Transcripts reported for this gene
Eleven (11) transcripts are reported for this gene.
iv. How many transcripts are protein coding?
Three (3) of the transcripts in this gene are protein coding.
v. Identify the longest protein-coding transcript and its length in amino acids.
The longest protein-coding transcript is MYH9-201 (ENST00000216181.10), which encodes a protein
of 1960 amino acids.
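The counts in this section were read from the Ensembl website. A similar summary could be scripted against the Ensembl REST API's `lookup/symbol` endpoint with `expand=1`, which nests transcripts (with their exons and translations) inside the gene record. In this sketch the JSON keys (`Transcript`, `biotype`, `Exon`, `Translation`, `length`, `seq_region_name`, `strand`) are assumptions about the response layout that should be checked against the API documentation, and `fetch_gene` and `summarize` are hypothetical helpers.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def fetch_gene(symbol: str, species: str = "homo_sapiens") -> dict:
    """Fetch a gene record with nested transcripts (requires network access)."""
    url = f"{SERVER}/lookup/symbol/{species}/{symbol}?expand=1"
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(gene: dict) -> dict:
    """Count transcripts and report the longest protein-coding one.

    Assumed layout: gene["Transcript"] is a list of dicts with "id", "biotype",
    "Exon" (a list) and, for coding transcripts, "Translation" with "length" (aa).
    """
    transcripts = gene.get("Transcript", [])
    coding = [t for t in transcripts if t.get("biotype") == "protein_coding"]
    longest = max(coding, key=lambda t: t.get("Translation", {}).get("length", 0), default=None)
    summary = {
        "chromosome": gene.get("seq_region_name"),
        "strand": {1: "forward", -1: "reverse"}.get(gene.get("strand"), "unknown"),
        "transcripts": len(transcripts),
        "protein_coding": len(coding),
        "longest": None,
    }
    if longest is not None:
        length = longest.get("Translation", {}).get("length")
        summary["longest"] = (longest["id"], length, len(longest.get("Exon", [])))
    return summary
```

Calling `summarize(fetch_gene("MYH9"))` needs network access, and the counts it returns reflect the current Ensembl release, so they may differ from the figures recorded above.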
vi. Some phenotypes associated with this gene.
 Alveolar rhabdomyosarcoma
 Anaplastic astrocytoma
 Bile duct carcinoma
 Bladder carcinoma
 Breast carcinoma
 Colon carcinoma
 Cutaneous melanoma
 Dedifferentiated chondrosarcoma
 Diffuse large B-cell lymphoma
 Epstein syndrome
 Esophageal carcinoma
 Fechtner syndrome, etc.
PROTEIN CODING TRANSCRIPTS

i. Total number of exons in each protein-coding transcript
Transcript ENST00000216181.10 has 41 exons, transcript ENST00000401701.1 has 6 exons, and
transcript ENST00000456729.1 has 3 exons.
ii. Amino acid sequence of each transcript
a. ENST00000216181.10 (41 exons)
MAQQAADKYLYVDKNFINNPLAQADWAAKKLVWVPSDKSGFEPASLKEEVGEEAIVELVE
NGKKVKVNKDDIQKMNPPKFSKVEDMAELTCLNEASVLHNLKERYYSGLIYTYSGLFCVV
INPYKNLPIYSEEIVEMYKGKKRHEMPPHIYAITDTAYRSMMQDREDQSILCTGESGAGK
TENTKKVIQYLAYVASSHKSKKDQGELERQLLQANPILEAFGNAKTVKNDNSSRFGKFIR
INFDVNGYIVGANIETYLLEKSRAIRQAKEERTFHIFYYLLSGAGEHLKTDLLLEPYNKY
RFLSNGHVTIPGQQDKDMFQETMEAMRIMGIPEEEQMGLLRVISGVLQLGNIVFKKERNT
DQASMPDNTAAQKVSHLLGINVTDFTRGILTPRIKVGRDYVQKAQTKEQADFAIEALAKA
TYERMFRWLVLRINKALDKTKRQGASFIGILDIAGFEIFDLNSFEQLCINYTNEKLQQLF
NHTMFILEQEEYQREGIEWNFIDFGLDLQPCIDLIEKPAGPPGILALLDEECWFPKATDK
SFVEKVMQEQGTHPKFQKPKQLKDKADFCIIHYAGKVDYKADEWLMKNMDPLNDNIATLL
HQSSDKFVSELWKDVDRIIGLDQVAGMSETALPGAFKTRKGMFRTVGQLYKEQLAKLMAT
LRNTNPNFVRCIIPNHEKKAGKLDPHLVLDQLRCNGVLEGIRICRQGFPNRVVFQEFRQR
YEILTPNSIPKGFMDGKQACVLMIKALELDSNLYRIGQSKVFFRAGVLAHLEEERDLKIT
DVIIGFQACCRGYLARKAFAKRQQQLTAMKVLQRNCAAYLKLRNWQWWRLFTKVKPLLQV
SRQEEEMMAKEEELVKVREKQLAAENRLTEMETLQSQLMAEKLQLQEQLQAETELCAEAE
ELRARLTAKKQELEEICHDLEARVEEEEERCQHLQAEKKKMQQNIQELEEQLEEEESARQ
KLQLEKVTTEAKLKKLEEEQIILEDQNCKLAKEKKLLEDRIAEFTTNLTEEEEKSKSLAK
LKNKHEAMITDLEERLRREEKQRQELEKTRRKLEGDSTDLSDQIAELQAQIAELKMQLAK
KEEELQAALARVEEEAAQKNMALKKIRELESQISELQEDLESERASRNKAEKQKRDLGEE
LEALKTELEDTLDSTAAQQELRSKREQEVNILKKTLEEEAKTHEAQIQEMRQKHSQAVEE
LAEQLEQTKRVKANLEKAKQTLENERGELANEVKVLLQGKGDSEHKRKKVEAQLQELQVK
FNEGERVRTELADKVTKLQVELDNVTGLLSQSDSKSSKLTKDFSALESQLQDTQELLQEE
NRQKLSLSTKLKQVEDEKNSFREQLEEEEEAKHNLEKQIATLHAQVADMKKKMEDSVGCL
ETAEEVKRKLQKDLEGLSQRHEEKVAAYDKLEKTKTRLQQELDDLLVDLDHQRQSACNLE
KKQKKFDQLLAEEKTISAKYAEERDRAEAEAREKETKALSLARALEEAMEQKAELERLNK
QFRTEMEDLMSSKDDVGKSVHELEKSKRALEQQVEEMKTQLEELEDELQATEDAKLRLEV
NLQAMKAQFERDLQGRDEQSEEKKKQLVRQVREMEAELEDERKQRSMAVAARKKLEMDLK
DLEAHIDSANKNRDEAIKQLRKLQAQMKDCMRELDDTRASREEILAQAKENEKKLKSMEA
EMIQLQEELAAAERAKRQAQQERDELADEIANSSGKGALALEEKRRLEARIAQLEEELEE
EQGNTELINDRLKKANLQIDQINTDLNLERSHAQKNENARQQLERQNKELKVKLQEMEGT
VKSKYKASITALEAKIAQLEEQLDNETKERQAACKQVRRTEKKLKDVLLQVDDERRNAEQ
YKDQADKASTRLKQLKRQLEEAEEEAQRANASRRKLQRELEDATETADAMNREVSSLKNK
LRRGDLPFVVPRRMARKGAGDGSDEEVDGKADGAEAKPAE
b. ENST00000401701.1 (6 exons)
MAQQAADKYLYVDKNFINNPLAQADWAAKKLVWVPSDKSGFEPASLKEEVGEEAIVELVENGKKVKVNKDDIQKM
NPPKFSKVEDMAELTCLNEASVLHNLKERYYSGLIYTYSGLFCVVINPYKNLPIYSEEIVEMYKGKKRHEMPPHI
YAITDTAYRSMMQDREDQSILCTGESGAGKTENTKKVIQYLAYVASSHKSKKDQEAYFKNRAVRLGVL
c. ENST00000456729.1 (3 exons)
MAQQAADKYLYVDKNFINNPLAQADWAAKKLVWVPSDKSGFEPASLKEEVGEEAIVELVENGKKVKVNKDDIQKM
NPPKFSKVEDMAELTCLNEASVLHNLKE
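The 6-exon and 3-exon proteins begin with the same residues as the 41-exon protein. A quick way to check this is to measure the common prefix of two sequences. This sketch uses Python's `os.path.commonprefix`, which compares strings character by character, on the first two lines of sequence a and all of sequence c copied from above; `shared_prefix_length` is a hypothetical helper.

```python
import os


def shared_prefix_length(a: str, b: str) -> int:
    """Number of leading residues shared by two protein sequences."""
    return len(os.path.commonprefix([a, b]))


# First two lines of sequence a (41 exons) and all of sequence c (3 exons), copied from above.
a_start = (
    "MAQQAADKYLYVDKNFINNPLAQADWAAKKLVWVPSDKSGFEPASLKEEVGEEAIVELVE"
    "NGKKVKVNKDDIQKMNPPKFSKVEDMAELTCLNEASVLHNLKERYYSGLIYTYSGLFCVV"
)
c_full = (
    "MAQQAADKYLYVDKNFINNPLAQADWAAKKLVWVPSDKSGFEPASLKEEVGEEAIVELVENGKKVKVNKDDIQKM"
    "NPPKFSKVEDMAELTCLNEASVLHNLKE"
)
shared = shared_prefix_length(a_start, c_full)
print(shared, len(c_full), shared == len(c_full))
```

Here the shared prefix covers the whole 3-exon protein, so that isoform is an N-terminal fragment of the 41-exon protein. Applied to sequence b, the same check shows where the 6-exon isoform diverges near its C-terminus.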
i. Which gene is associated with phenylketonuria?
The gene associated with phenylketonuria is PAH (phenylalanine hydroxylase).
ii. How many protein-coding transcripts does this gene have?
This gene has seven (7) protein-coding transcripts.
iii. Identify the chromosome on which this gene is located and whether it is on the forward or
reverse strand.
The gene is located on chromosome 12, on the reverse strand.
iv. How many orthologues and paralogues does this gene have?
198 orthologues and 3 paralogues are reported for this gene.
v. What phenotypes are associated with this gene?
 Classic phenylketonuria
 Maternal phenylketonuria
 Mild hyperphenylalaninemia
 Mild phenylketonuria
 Non-phenylketonuria hyperphenylalaninemia
 Phenylketonuria
 Tetrahydrobiopterin-responsive hyperphenylalaninemia/phenylketonuria
vi. Are any protein-coding variants reported in this gene?
Yes.
vii. How many cited variants are in the 1000 Genomes dataset?
The 1000 Genomes dataset contains 25027 cited variants for this gene.
TOOLS
The SNP rs11725853 was analyzed using the Ensembl.org tools.
i. In which gene has this SNP been reported?
This SNP is reported in the GLRA3 gene.
ii. Which phenotype is associated with it?
Urinary albumin excretion.
iii. How many alternate alleles does this SNP have?
This SNP has two (2) alternate alleles.
iv. Which allele is the risk allele for the identified phenotype?
The alternate alleles (G>C and G>A) are the risk alleles.
v. Which population has the highest frequency of the risk allele?
The South Asian population has the highest frequency of the risk allele, with allele frequencies
G = 0.51, C = 0.22 and A = 0.27.
vi. Which paper describes the relationship between rs11725853 and urinary albumin excretion?
The association is reported in the GWAS Catalog (the NHGRI-EBI Catalog of published genome-wide
association studies).
INTRODUCTION
In this pedigree, I determined which type of diabetes occurs in individual 1 as a result of mutation.
The sequences of all individuals are given below.
>1
cagccgcagcctttgtgaaccaacacctgtgcggctcacacctggtggaagctctctacc
tagtgtgcggggaacgaggcttcttctacacacccaagacctgccgggaggcagaggacc
>2
tagtgtgcggggaacgaggcttcttctacacacccaagacccgccgggaggcagaggacc
>3
>4
>5
>6
>7
tagtgtgcggagaacgaggcttcttctacacacccaagacccgccgggaggcagaggacc
>8
I analyzed these sequences using the UniProt Align tool (http://www.uniprot.org/align/).
ANALYSIS RESULT.
i. Do any of these individuals carry a mutation?
Yes
ii. If yes, which individuals?
Mutations occur in individuals 1 and 3. A silent mutation occurs in individual 7.
iii. What is the mutated allele?
>1
1 cagccgcagcctttgtgaaccaacacctgtgcggctcacacctggtggaagctctctacc 60
61 tagtgtgcggggaacgaggcttcttctacacacccaagacctgccgggaggcagaggacc 120
In individual 1, the mutated allele is t, at position 102.
>3
1 cagccgcagcctttgtgaaccaacacctgtgcggctcacacctggtggaagctctctacc 60
61 tagtgtgcggggaacgaggcttcttctacacacccaagacctgccgggaggcagaggacc 120
In individual 3, the mutated allele is t, at position 102.
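Because the sequences are already aligned and of equal length, the mutated positions can be found by a direct position-by-position comparison. This sketch takes individual 2 as the normal reference (as the BLASTX step in this report does) and compares only the second 60-nt line, since that is the only line listed for individuals 2 and 7; `diff_positions` is a hypothetical helper.

```python
def diff_positions(reference: str, other: str, offset: int = 0) -> list[tuple[int, str, str]]:
    """1-based positions (shifted by `offset`) where two aligned sequences differ."""
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i + 1 + offset, r, o)
        for i, (r, o) in enumerate(zip(reference.lower(), other.lower()))
        if r != o
    ]


# Second 60-nt line (positions 61-120) of individuals 2, 1 and 7, copied from above.
ind2 = "tagtgtgcggggaacgaggcttcttctacacacccaagacccgccgggaggcagaggacc"
ind1 = "tagtgtgcggggaacgaggcttcttctacacacccaagacctgccgggaggcagaggacc"
ind7 = "tagtgtgcggagaacgaggcttcttctacacacccaagacccgccgggaggcagaggacc"
print(diff_positions(ind2, ind1, offset=60))  # [(102, 'c', 't')]
print(diff_positions(ind2, ind7, offset=60))  # [(71, 'g', 'a')]
```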
TRANSLATION TOOL (BLASTX): ONE MUTATED SEQUENCE AND ONE NORMAL SEQUENCE TRANSLATED INTO
PROTEIN
i. What amino acid change results from the c>t substitution?
>2
Before the mutation, the codon CGC codes for arginine (R).
>1
After the mutation, C is replaced by T, giving the codon TGC, which codes for cysteine (C).
ii. What type of mutation is this? (i.e. synonymous, non-synonymous, neutral non-synonymous,
nonsense)
It is a non-synonymous (missense) mutation.
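The classification can be made mechanically by comparing the amino acids encoded by the normal and mutated codons. This sketch assumes a reading frame starting at the third base (0-based offset 2), the only frame in which the codon containing position 102 is CGC as reported above, and assumes individual 2's first line matches individual 1's because only its second line is listed. `codon_at` and `mutation_type` are hypothetical helpers.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for codons in TCAG x TCAG x TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_at(seq: str, position: int, frame: int) -> str:
    """Codon containing 1-based `position` when translation starts at 0-based index `frame`."""
    start = frame + ((position - 1 - frame) // 3) * 3
    return seq[start:start + 3].upper()


def mutation_type(normal_codon: str, mutated_codon: str) -> str:
    """Classify a codon change by comparing the encoded amino acids."""
    before, after = CODON_TABLE[normal_codon], CODON_TABLE[mutated_codon]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    return "non-synonymous (missense)"


line1 = "cagccgcagcctttgtgaaccaacacctgtgcggctcacacctggtggaagctctctacc"
# Individual 2 (normal); its first line is assumed identical to individual 1's.
normal = line1 + "tagtgtgcggggaacgaggcttcttctacacacccaagacccgccgggaggcagaggacc"
ind1 = line1 + "tagtgtgcggggaacgaggcttcttctacacacccaagacctgccgggaggcagaggacc"
ind7 = line1 + "tagtgtgcggagaacgaggcttcttctacacacccaagacccgccgggaggcagaggacc"

FRAME = 2  # assumed reading frame (0-based start index)
for name, seq, pos in [("individual 1", ind1, 102), ("individual 7", ind7, 71)]:
    ref_codon, alt_codon = codon_at(normal, pos, FRAME), codon_at(seq, pos, FRAME)
    print(name, pos, ref_codon, "->", alt_codon, mutation_type(ref_codon, alt_codon))
```

With this frame, individual 7's change at position 71 turns GGG into GGA, both glycine, which is consistent with the silent mutation reported for that individual.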
