
Exercise: Making a dotplot

unix % dottup
DNA sequence dot plot
Input sequence: tembl:xl23808
Second sequence: tembl:xlrhodop
Word size [4]: 10
Graph type [x11]:

A window will pop up on your screen that should look something like this:


[Figure: dottup.ps, the dottup dotplot of tembl:xl23808 against tembl:xlrhodop]

The diagonal lines mark regions where the two sequences match well. You can see that there are five clear diagonals. Remember that we are comparing a genomic sequence with its cDNA: the five diagonals correspond to the five exons of the gene, and the breaks between them to the introns, which are absent from the cDNA. If you look up the original tembl entry for the genomic sequence using SRS, you will see that the annotation also lists five exons, so the dotplot agrees with the annotated gene structure.
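To see why exons show up as separate diagonals, it can help to build a tiny word-match dotplot by hand. The following is a minimal Python sketch of the general idea (exact word matches collected as dots and grouped by diagonal). It is not how dottup itself is implemented, and the function names and the toy sequences are made up for illustration.

```python
from collections import defaultdict


def word_matches(seq_a, seq_b, word_size):
    """Return (i, j) pairs where the word starting at seq_a[i] equals the word at seq_b[j]."""
    # Index every word of seq_b by its start positions.
    positions = defaultdict(list)
    for j in range(len(seq_b) - word_size + 1):
        positions[seq_b[j:j + word_size]].append(j)
    # Look up every word of seq_a in that index; each hit is one dot.
    hits = []
    for i in range(len(seq_a) - word_size + 1):
        for j in positions.get(seq_a[i:i + word_size], []):
            hits.append((i, j))
    return hits


def diagonals(hits):
    """Group dots by diagonal offset (j - i); dots on one diagonal share an offset."""
    by_offset = defaultdict(list)
    for i, j in hits:
        by_offset[j - i].append((i, j))
    return dict(by_offset)


# Hypothetical toy data: two 10-base "exons" separated by an 8-base "intron".
exon1, intron, exon2 = "ATGGCACTGA", "TTTTTTTT", "CCGAGGATAC"
genomic = exon1 + intron + exon2
cdna = exon1 + exon2

hits = word_matches(genomic, cdna, word_size=4)
for offset, dots in sorted(diagonals(hits).items(), reverse=True):
    print(f"offset {offset}: {len(dots)} dots, starting at {dots[0]}")
```

With this toy input the dots fall on two diagonals, one per exon, and the second diagonal is shifted by the length of the intron. Raising `word_size` removes short chance matches at the cost of missing short or imperfect regions of similarity.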

But the dotplot doesn't give us any detailed sequence information. For this, we need to use different programs. The algorithms we will be using are more rigorous than those used for searching databases, so even if you have retrieved a sequence from a database using something like BLAST, it is well worth performing a careful pairwise alignment afterwards.

The basic idea behind the sequence alignment programs is to align the two sequences in the way that produces the highest score. A scoring matrix is used to add points to the score for each match and subtract them for each mismatch. The matrices used for nucleic acid alignments tend to involve fairly simple match/mismatch scoring schemes, while the matrices commonly used for protein alignments are more complex, with scores designed to reflect the similarity between different amino acids rather than simply scoring identities.

Over time, mutations accumulate in sequences. The scoring matrices cope with substitutions, but insertions and deletions require extra parameters that allow gaps to be introduced into the alignment. There are penalties both for creating a gap and for extending an existing one. The default gap penalties in alignment programs have been found empirically to work well on test sequences, but you should experiment with different values for your own data.
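The scoring scheme described above can be sketched in a few lines of Python. This only scores an alignment that has already been written out with "-" for gaps; it does not search for the best alignment. The function name, the numeric values and the convention that the first gap position costs the opening penalty and each further position costs the extension penalty are assumptions made for illustration, so check the documentation of the program you actually use.

```python
def score_alignment(row_a, row_b, match=5, mismatch=-4,
                    gap_open=10.0, gap_extend=0.5):
    """Score two aligned rows of equal length, using '-' as the gap character."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0.0
    prev_gap = None  # row that held a gap in the previous column: "a", "b" or None
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            # Continuing a gap in the same row is cheap; starting one is expensive.
            score -= gap_extend if side == prev_gap else gap_open
            prev_gap = side
        else:
            score += match if a == b else mismatch
            prev_gap = None
    return score


# Two hypothetical ways of aligning ACGTTACG with ACGACG.
one_long_gap = score_alignment("ACGTTACG", "ACG--ACG")
two_short_gaps = score_alignment("ACGTTACG", "ACG-A-CG")
print(one_long_gap, two_short_gaps)
```

With these penalties the alignment containing a single two-base gap scores higher than the one with two separate one-base gaps, because opening a gap costs far more than extending one. Changing `gap_open` and `gap_extend` and rescoring is a quick way to see how sensitive the preferred alignment is to the gap penalties.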


EMBnet
2005-01-22