Calculating % Change


The simplest measure of evolutionary distance, used by many protein scientists, is the so-called uncorrected evolutionary distance or "p-distance." This is often expressed as "percent change" or as "percent identity (%ID)," which is (100 - percent change). You can compute percent change in EMBOSS with a program called "infoalign".
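To make the definition concrete, here is a minimal Python sketch of the percent change between two aligned sequences. It is not how infoalign is implemented: the function name percent_change is made up for illustration, and skipping columns that contain a gap is an assumption that may not match how infoalign treats gaps.

```python
def percent_change(seq_a: str, seq_b: str, gap_chars: str = "-.") -> float:
    """Return the uncorrected distance (p-distance) as a percentage.

    Hypothetical helper for illustration only. Assumption: columns where
    either sequence has a gap character are skipped entirely.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0   # columns where both sequences have a residue
    differing = 0  # compared columns where the residues differ
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            differing += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * differing / compared


# Two toy aligned sequences that differ at 2 of 10 positions.
change = percent_change("ACGTACGTAC", "ACGTTCGTAA")
identity = 100.0 - change  # percent identity (%ID)
print(f"% Change = {change:.1f}, %ID = {identity:.1f}")
```

For the toy pair above, 2 of the 10 columns differ, so the percent change is 20 and the %ID is 80.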

Compute % Change of the nucleic acid sequences. Try the following command:

% infoalign nuc0002

It will prompt you for an output filename. Just accept the default by pressing return. Check the output with:

% cat nuc0002.infoalign

You'll see a number of statistics, which make sense when you remember that the sequence length is 1000. The % Change is in the last column of the output. How does it compare to the expected distance you wrote down in your table? Make a new column in the table and write down the observed value. Now use the history function of the shell to compute this for all of the nucleic acid alignments. After you make all the output files, use cat to look at them and fill in the % Change values in your table.

Repeat for amino acid sequences. Do the same as above for the amino acid sequences. The only difference is that the two lines in the output of infoalign are no longer identical. Use the bottom line to get your % Change values for the amino acid sequences. Enter these in a fourth column of your table.

Question 2: Where do you start seeing signs of saturation in the % Change statistics? Where do the values differ most from the true evolutionary distance? Does saturation happen sooner in proteins or in DNA? Please write an explanation of why you think this is.

Real data. Take the multiple alignment you created yesterday and compute the % Change between sequences 1 and 2, OPS2_DROME and OPS2_DROPS, which are orthologs from two closely related species of flies, Drosophila melanogaster and D. pseudoobscura. You do this by specifying the first sequence in the alignment as a "reference" sequence against which the other one is compared. Otherwise infoalign uses the "consensus" of the sequences (we'll discuss this later).

% infoalign -refseq 1 ops2_drome.aln

Accept the default output name by pressing return. Check the output with:

% cat ops2_drome.infoalign

What is the % Change of the first two sequences? Put this into a new last row on your table.

Now we'll take a look at the Poisson correction.


David Ardell 2005-01-26