Phylip is a terrific package for doing different phylogenetic
tasks. Its documentation is also a wonderful learning resource for
making and working with trees. It is highly recommended that you look
there when you have questions about how things work. The documentation
is available directly on the net.
One problem with the Phylip programs is that they are menu-driven at
the command line. They like their data to be in files called
infile, and they put their output into files called outfile and
treefile. This means that you have to rename files a lot when you
use Phylip. But you'll learn some tricks to make this easier. The
programs also come with many options, often with sensible defaults, and the
documentation really helps you understand what you are doing.
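One such trick is to wrap the renaming in a small shell function. The sketch below is only an illustration: phylip_step is a made-up name, not part of Phylip, and it assumes the program reads infile and writes outfile in the current directory, as described above. The program's menu still has to be answered by hand.

```shell
# Hypothetical helper, not part of Phylip.
# Usage: phylip_step PROGRAM INPUT OUTPUT
phylip_step() {
    # Stage the input under the fixed name the program expects.
    cp "$2" infile || return 1
    # Remove any stale outfile left over from an earlier run.
    rm -f outfile
    # Run the program; its menu is answered interactively as usual.
    "$1" || return 1
    # Keep the result under a descriptive name of your choosing.
    mv outfile "$3"
}
```

With this defined, something like `phylip_step seqboot 16S_manual.plp seqboot.out` would leave the result in seqboot.out (an arbitrary name chosen here) instead of a generic outfile.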
The EMBOSS versions of the Phylip programs make some of this
file naming easier. Unfortunately, they don't always work as well
as the original Phylip ones, so we are going to install some of the originals too.
- Download the source code and documentation for phylip from
the Phylip webpage.
- Compile a few of the programs. You don't need to install
all of the programs, just a few: seqboot, protdist, neighbor and
consense. Change directory into the src directory inside the
unpacked phylip directory phylip-3.6. Then say:
% make seqboot
% make protdist
% make neighbor
% make consense
You're going to use these programs in order to make a bootstrapped
neighbor-joining tree. The first one, seqboot, bootstraps the data: it makes a
large number of "resampled" alignments from your alignment (in this
case 16S_manual.plp). The second one, protdist, makes a distance matrix from
each alignment. neighbor does the neighbor-joining, which you
can read more about here. And consense puts all the trees together
and counts the number of times each group of taxa appears together. In
this way you get a statistical feeling for the confidence with which
each grouping or clade is supported by the data.
Move these 4 executables into the data directory where 16S_manual.plp
is.
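To make the resampling idea concrete, here is a toy sketch of what bootstrapping an alignment means: columns are drawn at random, with replacement, until the new alignment is as long as the original. This is not how seqboot is implemented, and it does not read Phylip files. The name resample_columns is made up, and the input is assumed to be a simplified file with one name and one aligned sequence per line.

```shell
# Toy illustration only; use seqboot for real data.
# Input on stdin: "name sequence" per line, all sequences the same length.
# Usage: resample_columns SEED < aligned.txt
resample_columns() {
    awk -v seed="$1" '
        { name[NR] = $1; seq[NR] = $2 }
        END {
            srand(seed)
            len = length(seq[1])
            # Draw len column positions at random, with replacement.
            for (i = 1; i <= len; i++) col[i] = int(rand() * len) + 1
            # Every sequence is rebuilt from the same drawn columns,
            # so the columns of the alignment stay intact.
            for (r = 1; r <= NR; r++) {
                out = ""
                for (i = 1; i <= len; i++) out = out substr(seq[r], col[i], 1)
                print name[r], out
            }
        }'
}
```

Running it several times with different seeds gives different alignments of the same size, in which some original columns appear more than once and others not at all. That is the variation the later steps turn into support values.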
- Run the programs in the order above. For each one,
starting from 16S_manual.plp, you have to rename (mv) or copy each input
file to the name infile, except for consense, which
is looking for a file called treefile. So you can say, for instance,
% cp 16S_manual.plp infile
% seqboot
Please, in order to understand what you are doing, after every run,
examine the output of the program:
% ls -lrt
% less outfile
What does the seqboot output look like?
Then get ready for the next run:
% mv outfile infile
% protdist
and so on. When you run seqboot, use the default of 100
replicates. You have to tell protdist and neighbor to
analyze multiple datasets (100).
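A quick sanity check before running protdist is to count how many datasets a file contains. The sketch below assumes that each dataset begins with the usual Phylip header line holding just two integers (the number of sequences and the number of sites) and that no sequence line looks like that. The name count_datasets is a made-up helper, not a Phylip program.

```shell
# Hypothetical helper: count Phylip-style header lines in a file.
# Usage: count_datasets FILE
count_datasets() {
    # A header line is assumed to be two integers and nothing else.
    grep -c -E '^[[:space:]]*[0-9]+[[:space:]]+[0-9]+[[:space:]]*$' "$1"
}
```

If seqboot was run with the default of 100 replicates, `count_datasets infile` on its renamed output should print 100 under these assumptions.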
If you have been successful, after running consense look at
the outfile. You should see numbers that range
from 1 to 100. Each one counts how many of the 100 bootstrap trees
contain that grouping, and so tells you how much confidence you have
in that part of the tree. You can't publish a paper in a serious
journal without having made numbers like these!
So now you are ready for primetime.
David Ardell
2005-01-28