
About Gene Prediction:

A DNA sequence is composed of four different nucleotides (or bases): A (Adenine), T (Thymine),
C (Cytosine) and G (Guanine). By mapping the alphabetic sequence of a DNA strand into a set of digital signals,
techniques based on Digital Signal Processing (DSP) can be applied to analyze the DNA sequence.

In a genomic (DNA) sequence, the regions that code for proteins are called genes. Genes are further
split into two types of regions: coding regions called exons and non-coding regions called introns. Exons are
responsible for protein coding, and identifying them accurately helps in disease diagnosis and other medical
applications in real time.

Precise prediction of protein-coding regions in a DNA sequence is a vital problem in the field of bioinformatics.
A period-3 (three-base periodicity) property has been observed in the coding regions, and it serves as the basis
for the exon prediction techniques.
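As a rough illustration of the three-base periodicity, the sketch below measures the spectral power at frequency 1/3 cycle per base, summed over the 0/1 indicator sequences of the four bases. The function name `period3_power` and the lack of normalization are illustrative choices, not taken from the referenced papers.

```python
import cmath

BASES = "ACGT"

def period3_power(seq):
    """Sum over the four bases of |X_b|^2, where X_b is the DFT coefficient
    of the 0/1 indicator sequence of base b at 1/3 cycle per base.
    Illustrative helper; the value is not normalized by sequence length."""
    seq = seq.upper()
    total = 0.0
    for base in BASES:
        # Only positions holding this base contribute to the sum.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, s in enumerate(seq)
            if s == base
        )
        total += abs(coeff) ** 2
    return total

# A strictly three-periodic toy sequence versus a two-periodic one.
strong = period3_power("ATG" * 20)
weak = period3_power("AT" * 30)
print(strong > weak)
```

On real sequences the measure is normally evaluated over a sliding window, so that it can be plotted against base position; the window length would be another parameter to choose.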

Input Sequences:

Genomic sequences in FASTA format (.fasta), downloaded from the NCBI genome database.
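A minimal sketch of reading such input, assuming the usual FASTA layout (a header line starting with ">" followed by one or more sequence lines). The function name `parse_fasta` and the example records are hypothetical; a dedicated library could be used instead.

```python
def parse_fasta(lines):
    """Return {header: sequence} from an iterable of FASTA text lines.
    Sequence lines are joined and upper-cased; blank lines are skipped."""
    records = {}
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records

# In-memory example with made-up records.
example = [">seq1 demo", "ATGC", "ggta", "", ">seq2", "TTAA"]
print(parse_fasta(example))
```

For a downloaded file, an open file handle can be passed directly, since it iterates line by line.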

Steps:

i) First, the Voss representation maps the alphabetic DNA sequence into a digital series.
ii) Second, an adaptive filtering scheme for genomic signal processing, based on the periodic behavior of biological
sequences, is proposed; it can analyze and predict the biological functional regions of interest.
iii) Third, the adaptive filtering approach is applied to identify the exons in a DNA sequence according to the period-3 property of
protein-coding regions. The prediction curves of the exons are obtained with the Least Mean Square (LMS) algorithm.
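The three steps above can be sketched roughly as follows. The Voss mapping and the LMS update rule are standard. The reference signal (a cosine with a period of three bases), the filter order, the step size `mu`, and the way the four outputs are combined into one curve are assumptions made for illustration only; the reference signal defined in the papers should be substituted.

```python
import math

def voss(seq):
    """Voss representation: one 0/1 indicator sequence per base."""
    seq = seq.upper()
    return {b: [1.0 if s == b else 0.0 for s in seq] for b in "ACGT"}

def lms(x, d, order=8, mu=0.01):
    """Standard LMS adaptive filter.
    y(n) = w . x_n,  e(n) = d(n) - y(n),  w <- w + 2*mu*e(n)*x_n
    Returns the output y, the error e and the final weights w."""
    w = [0.0] * order
    y, e = [], []
    for n in range(len(x)):
        # Most recent `order` input samples, zero-padded at the start.
        xn = [x[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        yn = sum(wk * xk for wk, xk in zip(w, xn))
        en = d[n] - yn
        w = [wk + 2 * mu * en * xk for wk, xk in zip(w, xn)]
        y.append(yn)
        e.append(en)
    return y, e, w

def prediction_curve(seq, order=8, mu=0.01):
    """Sum of squared LMS outputs over the four Voss sequences.
    ASSUMPTION: the reference is cos(2*pi*n/3), not the papers' signal."""
    n = len(seq)
    ref = [math.cos(2 * math.pi * i / 3) for i in range(n)]
    curve = [0.0] * n
    for x in voss(seq).values():
        y, _, _ = lms(x, ref, order, mu)
        curve = [c + yi * yi for c, yi in zip(curve, y)]
    return curve

curve = prediction_curve("ATG" * 10)
print(len(curve))
```

Plotting `curve` against base position gives a prediction curve of the kind requested; thresholding it to obtain exon locations is a further design choice not shown here.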

Requirement:

i) Please refer to Papers 1 and 2 for information about gene prediction using LMS and other algorithms.

Please implement code for gene prediction using the LMS algorithm and produce all the applicable plots (refer to the
papers for the graphs). Use the reference signal for the adaptive filter as described in the papers. Refer to Comparison
Table 2 in Paper 2 for the predicted locations of coding regions using LMS.

ii) Please refer to the other papers for the performance measures and plots (highlighted in yellow). All important text and
the equations/expressions for calculating the performance measures in the attached papers have been highlighted for easy reference.

Performance Measures:

Sensitivity, Specificity, Accuracy or Precision, Approximate Correlation (AC), Global Accuracy or
Correlation Coefficient (CC), False Positive, SNR (Psnr), Miss Rate (Mr), Wrong Rate (Wr)

Reference: Please refer to the highlighted expressions/equations for the above performance measures in all the
papers (wherever applicable).
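As a starting point, several of these measures can be computed from base-level counts of true/false positives and negatives. The sketch below uses the commonly cited textbook definitions, which are an assumption here; the highlighted expressions in the papers take precedence wherever they differ (some gene prediction papers, for example, define specificity as TP/(TP+FP)). SNR, Miss Rate and Wrong Rate are not covered. The function names are illustrative.

```python
import math

def confusion_counts(actual, predicted):
    """Base-level counts from two equal-length 0/1 label lists
    (1 = coding, 0 = non-coding). Returns (tp, tn, fp, fn)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return tp, tn, fp, fn

def ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")

def measures(tp, tn, fp, fn):
    """Common definitions (assumed, not copied from the papers)."""
    sn = ratio(tp, tp + fn)                    # sensitivity
    sp = ratio(tn, tn + fp)                    # specificity
    pr = ratio(tp, tp + fp)                    # precision
    acc = ratio(tp + tn, tp + tn + fp + fn)    # accuracy
    cc = ratio(tp * tn - fp * fn,
               math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn)))
    # Approximate correlation: average of four conditional
    # probabilities, rescaled to the range [-1, 1].
    ac = 0.5 * (sn + pr + sp + ratio(tn, tn + fn)) - 1.0
    return {"sensitivity": sn, "specificity": sp, "precision": pr,
            "accuracy": acc, "cc": cc, "ac": ac}

actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
print(measures(*confusion_counts(actual, predicted)))
```

The `predicted` labels would come from thresholding the LMS prediction curve, and the `actual` labels from the annotated exon locations of the sequence.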

Please derive values for as many of the above measures as possible using LMS.

Plots to be considered using LMS and the reference papers:

Plot1: Power Spectral Density Plot --- Output Y(n) Vs Relative base location

Reference: Please refer to Paper 1 for this plot.

Plot1: Power Spectral Density Plot --- Output Y(n) Vs Relative base location

Reference: Please refer to Paper 2 for this plot.


Plot2: True Positive Vs False Negative

Reference: Please refer to Paper 3 for this plot.

Plot2: True Positive Vs False Negative (from Paper 4)

Reference: Please refer to Paper 4 for this plot.


Plot3: SNR Vs Relative Position in a sequence

Reference: Please refer to Paper 3 for these plots.


Plot4: Sensitivity Vs Specificity Vs Precision

Reference: Please refer to Paper 5 for this plot.

Plot5: Sensitivity Vs Specificity Vs Precision

Reference: Please refer to Paper 6 for these two plots.


Plot6: Magnitude Vs Relative Base Locations

Reference: Please refer to Paper 7 for this plot.


Plot7: Eigen Value Vs Model Order

Reference: Please refer to Paper 7 for these plots.


Plot8: Frequency Vs AUC Distribution

Reference: Please refer to Paper 8 for these plots.


Plot8: AUC Vs Order of features

Reference: Please refer to Paper 8 for these plots.
