
Njah Hurbert ICTU20233878
Piabezih McBright ICTU20234411
Leudjou Maxime ICTU20234264
Njifon Eric ICTU20234391
Otang Desmond ICTU20234300
Nyanoh Mark ICTU20233842
LEVEL 1 GROUP4

Dynamic programming

Sequence alignment

Sequence alignment is a key technique in bioinformatics. It compares sequences
by finding the alignment that optimizes a score, where the scoring scheme
accounts for matches, mismatches, and gaps.

Introduction:

Sequence alignment is essential for understanding evolutionary relationships,
predicting function, and identifying conserved motifs. Dynamic programming
offers a systematic way to perform these alignments.

1. Global Alignment: This type aligns sequences end-to-end, optimizing the
alignment over the entire length of the sequences. It is suitable when the sequences
are of similar length and are expected to be highly similar throughout. An example
of an algorithm used for global alignment is the Needleman-Wunsch algorithm.

2. Local Alignment: This type finds the most similar regions within the sequences,
which is useful for sequences of different lengths or those containing only similar
subsequences. The Smith-Waterman algorithm is commonly used for local
alignment.

Key Algorithms:

1. Needleman-Wunsch (Global Alignment): This algorithm is used for global
alignment, meaning it aligns the entire length of two sequences. It builds a
matrix where each cell holds the best score for aligning the two prefixes that
end at that cell, considering matches, mismatches, and gaps.
Example: Suppose we want to align the sequences "GATTACA" and
"GCATGCU".

1. Initialization:

- Create a matrix with dimensions based on the sequence lengths plus one for
initial gaps.

- Initialize the first row and column with cumulative gap penalties (0, -1, -2,
and so on, assuming a gap penalty of -1).

2. Matrix Filling: Fill in the matrix using a scoring scheme (e.g., match = +1,
mismatch = -1, gap = -1).

3. Traceback: Start from the bottom-right cell and trace back to the top-left to get
the alignment.
Example Matrix Filling:

      -   G   A   T   T   A   C   A
  -   0  -1  -2  -3  -4  -5  -6  -7
  G  -1
  C  -2
  A  -3
  T  -4
  G  -5
  C  -6
  U  -7

The matrix is filled based on the recurrence relation:

$$F(i, j) = \max\bigl\{\, F(i-1, j-1) + S(A_i, B_j),\; F(i, j-1) + d,\; F(i-1, j) + d \,\bigr\}$$

where S(A_i, B_j) is the match or mismatch score and d is the gap penalty (-1 here).
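The initialization and filling steps can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function name nw_matrix and its parameter names are our own, and it assumes the simple scheme used in this example (match = +1, mismatch = -1, gap = -1).

```python
def nw_matrix(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the Needleman-Wunsch score matrix.

    Columns follow sequence a, rows follow sequence b (as in the table above).
    The name and signature are illustrative assumptions.
    """
    rows, cols = len(b) + 1, len(a) + 1
    F = [[0] * cols for _ in range(rows)]

    # First row and column: a prefix aligned against nothing costs one gap per character.
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        F[i][0] = i * gap

    # Every other cell takes the best of the three moves in the recurrence.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if b[i - 1] == a[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # align b[i-1] with a[j-1]
                F[i][j - 1] + gap,    # a[j-1] against a gap in b
                F[i - 1][j] + gap,    # b[i-1] against a gap in a
            )
    return F


F = nw_matrix("GATTACA", "GCATGCU")
for row in F:
    print(" ".join(f"{v:3d}" for v in row))
```

The bottom-right cell of the printed matrix is the score of the best global alignment under the chosen scheme.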

With match = +1, mismatch = -1, and gap = -1, one optimal alignment (score 0) is:

G-ATTACA
| ||  |
GCATG-CU
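To make the traceback step concrete, here is a minimal Python sketch of the whole procedure, again assuming match = +1, mismatch = -1, gap = -1. The function name needleman_wunsch and the rule for breaking ties (diagonal first, then up, then left) are our own choices. Several alignments can share the best score, so a different tie rule may print a different but equally good alignment.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (aligned_a, aligned_b, score) for a global alignment of a and b."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Traceback: walk from the bottom-right cell back to the top-left cell,
    # at each step choosing a move that explains the current cell's score.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), F[n][m]


aligned_a, aligned_b, score = needleman_wunsch("GATTACA", "GCATGCU")
print(aligned_a)
print(aligned_b)
print("score:", score)
```

The aligned strings are built backwards during the traceback, which is why they are reversed before being returned.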

2. Smith-Waterman Algorithm (Local Alignment): This algorithm is for local
alignment, which identifies the most similar regions within sequences. Unlike
Needleman-Wunsch, it allows for partial sequence alignment.

Example:

Align the same sequences "GATTACA" and "GCATGCU".

1. Initialization:

- Initialize a matrix similar to Needleman-Wunsch, but set the first row and
column to zero.

2. Matrix Filling:

- Fill in the matrix using a scoring scheme (e.g., match = +1, mismatch = -1,
gap = -1).

- The difference here is that negative scores are not allowed: any cell whose
value would be negative is set to zero.

3. Traceback: Start from the cell with the highest score and trace back to the first
zero encountered.

Example Matrix Filling:

      -   G   A   T   T   A   C   A
  -   0   0   0   0   0   0   0   0
  G   0
  C   0
  A   0
  T   0
  G   0
  C   0
  U   0

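The matrix is filled with the same recurrence as in Needleman-Wunsch, with zero
added as a fourth option so that no cell has a negative score:

$$F(i, j) = \max\bigl\{\, 0,\; F(i-1, j-1) + S(A_i, B_j),\; F(i, j-1) + d,\; F(i-1, j) + d \,\bigr\}$$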

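With match = +1, mismatch = -1, and gap = -1, the highest cell score for these
two sequences is 2, so the best local alignment is a two-character exact match,
for example:

AT
||
AT

("CA" aligned with "CA" scores the same.) The resulting aligned segments are the
highest-scoring pair of subsequences.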
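The local alignment procedure can be sketched the same way. This is a minimal illustration under the same assumed scheme (match = +1, mismatch = -1, gap = -1). The function name smith_waterman is our own, and when several cells share the highest score the sketch simply keeps the first one it meets.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, segment_a, segment_b) for one best local alignment."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay at zero
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Same three moves as the global case, plus 0 to forbid negative cells.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback: start at the highest-scoring cell and stop at the first zero.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, seg_a, seg_b = smith_waterman("GATTACA", "GCATGCU")
print(score)
print(seg_a)
print(seg_b)
```

Compared with the global version, only three things change: the borders are zero, zero is added to the maximum, and the traceback starts at the best cell instead of the bottom-right corner.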

Steps in Dynamic Programming Alignment:

1. Initialization: Create a matrix with dimensions based on the lengths of the
sequences. Initialize the first row and column based on gap penalties (cumulative
gap penalties for global alignment, zeros for local alignment).

2. Matrix Filling: Populate the matrix using recurrence relations that consider
match, mismatch, and gap scores.

3. Traceback: Follow the path from the optimal score in the matrix to reconstruct
the alignment.

Scoring Systems:

- Match: A positive score for identical characters.

- Mismatch: A negative score for non-identical characters.

- Gap Penalties: Negative scores for introducing gaps to account for insertions or
deletions.
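As a small illustration of how such a scoring system is applied, the sketch below adds up the score of an alignment that has already been written out with "-" for gaps. The function name score_alignment and the default values (+1, -1, -1) are assumptions for this example only.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-1):
    """Sum the column scores of two aligned strings of equal length."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap        # insertion or deletion
        elif x == y:
            total += match      # identical characters
        else:
            total += mismatch   # non-identical characters
    return total


print(score_alignment("G-ATTACA", "GCATG-CU"))  # 4 matches, 2 mismatches, 2 gaps -> 0
print(score_alignment("GA-TTACA", "G-CATGCU"))  # 3 matches, 3 mismatches, 2 gaps -> -2
```

Scoring two candidate alignments of the same sequences in this way shows directly which one the algorithm would prefer.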

Applications:

- Homology Detection: Identifying evolutionary relationships.

- Function Prediction: Inferring the function of unknown sequences based on
similarity to known sequences.

- Genomics: Annotating genomes by aligning them with reference sequences.

Challenges:

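- Computational Complexity: Filling the matrix takes time and memory proportional
to the product of the two sequence lengths, so dynamic programming algorithms can
be slow for long sequences.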

- Scoring Scheme Sensitivity: Results can vary greatly with different scoring
parameters.
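To give a feel for the cost: two sequences of 10,000 characters each need a matrix of about 100 million cells. When only the score is needed, the memory can be reduced by keeping just two rows at a time, as in the sketch below. This is an illustrative variant, not part of the algorithms as described above. The name nw_score_two_rows is our own, and because the full matrix is thrown away, this version cannot perform the traceback.

```python
def nw_score_two_rows(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score using only two rows of the matrix."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0 of the full matrix
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)          # first column of row i
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
        prev = curr                              # row i becomes the previous row
    return prev[-1]


print(nw_score_two_rows("GATTACA", "GCATGCU"))
# The effect of the scoring parameters can be checked by changing them:
print(nw_score_two_rows("GATTACA", "GCATGCU", match=2, mismatch=-1, gap=-2))
```

The running time is unchanged, since every cell is still computed once; only the memory use drops from the full matrix to two rows.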

Conclusion:

Dynamic programming sequence alignment is a robust tool for comparing
biological sequences, offering precise alignments that are critical for various
bioinformatics applications.
