The simplest measure of evolutionary distance, used by many protein
scientists, is the so-called uncorrected evolutionary distance or
``p-distance'': the proportion of aligned sites at which two sequences
differ. This is often expressed as ``percent change'' or as
``percent identity (%ID),'' which is (100 - percent change). You
can compute percent change in EMBOSS with a program called ``infoalign''.
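To make the definition concrete, here is a minimal Python sketch of the p-distance, percent change, and percent identity for two already-aligned sequences. The function names and the toy sequences are hypothetical, and the gap handling (skipping any column that contains a gap) is an assumption made for illustration; infoalign may count gapped columns differently, so small discrepancies with its output are possible.

```python
def p_distance(seq_a, seq_b, gap_chars="-."):
    """Fraction of compared alignment columns at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: ignore any column where either sequence has a gap.
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def percent_change(seq_a, seq_b):
    """The p-distance expressed as a percentage."""
    return 100.0 * p_distance(seq_a, seq_b)


def percent_identity(seq_a, seq_b):
    """Percent identity is 100 minus percent change."""
    return 100.0 - percent_change(seq_a, seq_b)


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from the tutorial data.
    first = "ACGTACGTAC"
    second = "ACGTTCGTAA"
    print("percent change:", percent_change(first, second))
    print("percent identity:", percent_identity(first, second))
```

Because the p-distance only counts sites that differ now, it cannot exceed 1 (100% change), no matter how many substitutions actually occurred.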
- Compute % Change of the nucleic acid sequences. Try the
following command:
% infoalign nuc0002
It will prompt you for an output filename; just accept the default by
pressing return. Check the output with:
% cat nuc0002.infoalign
You'll see several statistics for each sequence,
which make sense when you remember that the sequence length is
1000. The % Change is in the last column of the output. How does it
compare to the expected distance you wrote down in your table?
Make a new column in the table and write down the observed value.
Now use the history function of the shell to compute this for all
of the nucleic acid alignments. After you make all the output
files, use cat to look at them and fill in your % Change values
in your table.
- Repeat for amino acid sequences. Do the same as above
for the amino acid sequences. The only difference is that the two
lines in the output of infoalign are no longer identical.
Just use the bottom line to compute your % Changes for
the amino acid sequences. Fill these into a fourth column of your table.
- Question 2: Where do you start seeing signs of
saturation in the % Change statistics? Where do the values differ
most from the true evolutionary
distance? Does saturation happen sooner in proteins or DNA? Please
write an explanation of why you think this is the case.
- Real data. Take the multiple alignment you created
yesterday and compute the % Change between sequences 1 and 2,
OPS2_DROME and OPS2_DROPS, which are orthologs from two
closely related species of flies, Drosophila melanogaster and
D. pseudoobscura. You do this by specifying the first
sequence in the alignment as a ``reference'' sequence against which
the other sequences are compared. Otherwise, infoalign uses the
``consensus'' of the sequences (we'll discuss this later).
% infoalign -refseq 1 ops2_drome.aln
Use the default output name by
pressing return. Check the output with:
% cat ops2_drome.infoalign
What is the % Change between the first two
sequences? Put this into a new last row of your table.
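If you would rather not read each output file by eye when filling in your table, the steps above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the script relies on the statement above that % Change is the last column, and it assumes header or comment lines start with ``#''. Check the header line of your own output first, since the column layout may differ between EMBOSS versions.

```python
from pathlib import Path


def last_column_values(text):
    """Return the numeric last field of every data line in the given text."""
    values = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and header/comment lines (assumed to start with '#').
        if not fields or line.lstrip().startswith("#"):
            continue
        try:
            values.append(float(fields[-1]))
        except ValueError:
            # The last field is not a number, so this is not a data row we can use.
            continue
    return values


def collect_percent_change(directory="."):
    """Map each *.infoalign file in a directory to its last-column values."""
    results = {}
    for path in sorted(Path(directory).glob("*.infoalign")):
        results[path.name] = last_column_values(path.read_text())
    return results


if __name__ == "__main__":
    # Print one line per output file found in the current directory.
    for name, values in collect_percent_change().items():
        print(name, values)
```

Run it from the directory that holds your .infoalign files; each printed line lists the values for one alignment, in the order the sequences appear in the file.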
Now we'll take a look at the Poisson correction.
David Ardell
2005-01-26