
Bootstrapping a Neighbor-Joining tree

Phylip is a terrific package for doing different phylogenetic tasks. Its documentation is also a wonderful learning resource for making and working with trees. It is highly recommended that you look there when you have questions about how things work. The documentation is available directly on the net.

One problem with the Phylip programs is that they are menu-driven at the command line. They expect their input in a file called infile, and they put their output into files called outfile and treefile. This means that you have to rename files a lot when you use Phylip, but you'll learn some tricks to make this easier. The programs also come with many options, often with sensible defaults, and the documentation really helps you understand what you are doing.

The EMBOSS versions of the Phylip programs make some of this file naming easier. Unfortunately they don't always work as well as the original Phylip ones, so we are going to install some of the originals too.

Download the source code and documentation for phylip from the Phylip webpage.
Compile a few of the programs. You don't need to install all of the programs, just a few: seqboot, protdist, neighbor and consense. Change directory into the src directory inside the unpacked Phylip directory phylip-3.6. Then say:

% make seqboot
% make protdist
% make neighbor
% make consense

You're going to use these programs, in order, to make a bootstrapped neighbor-joining tree. The first one, seqboot, bootstraps the data: it makes a large number of "resampled" alignments from your alignment (in this case 16S_manual.plp). The second one, protdist, makes a distance matrix from each alignment. neighbor does the neighbor-joining, which you can read more about here. And consense combines the resulting trees and counts the number of times each group of taxa appears together. In this way you get a statistical feeling for how confidently the data support each grouping, or clade.
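To make the idea of a "resampled" alignment concrete, here is a toy sketch of a single bootstrap replicate. It is not how seqboot is implemented, only an illustration of the principle: columns are drawn at random, with replacement, until the replicate is as long as the original. The function name resample_columns, the "name sequence" input layout, the three example taxa and the seed are all made up for this example.

```shell
# Toy illustration of one bootstrap replicate (an assumption-laden sketch,
# not seqboot itself). Input lines look like "name sequence".
# usage: resample_columns SEED < alignment
resample_columns() {
    awk -v seed="$1" '
        { name[NR] = $1; seq[NR] = $2 }
        END {
            srand(seed)
            len = length(seq[1])
            # Choose len column positions at random, with replacement.
            for (j = 1; j <= len; j++)
                col[j] = int(rand() * len) + 1
            # Apply the same chosen columns to every sequence.
            for (i = 1; i <= NR; i++) {
                out = ""
                for (j = 1; j <= len; j++)
                    out = out substr(seq[i], col[j], 1)
                print name[i], out
            }
        }'
}

# Example: one replicate of a three-taxon, six-column alignment.
printf 'taxonA ACGTAC\ntaxonB ACGTTC\ntaxonC AGGTTC\n' | resample_columns 42
```

Some columns will appear more than once in the replicate and others not at all. Repeating this 100 times and building a tree from each replicate is what lets consense count how often a grouping recurs.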

Move these 4 executables into the data directory where 16S_manual.plp is.
Run the programs in the order above. For each one, starting from 16S_manual.plp, you have to rename (or copy) the input file to infile, except for consense, which looks for a file called treefile. So you can say, for instance,

% cp 16S_manual.plp infile
% seqboot

Please, in order to understand what you are doing, after every run, examine the output of the program:

% ls -lrt
% less outfile

What does the seqboot output look like?
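One quick sanity check, once you have looked at the file yourself, is to count how many data sets it contains. The sketch below assumes that each replicate begins with the usual Phylip header line holding just two integers (number of sequences, number of sites); count_datasets is a made-up helper name, not a Phylip command.

```shell
# Hypothetical helper: count lines that consist only of two integers,
# which is assumed here to be the header that starts each data set.
# usage: count_datasets FILE
count_datasets() {
    grep -c '^[[:space:]]*[0-9][0-9]*[[:space:]][[:space:]]*[0-9][0-9]*[[:space:]]*$' "$1"
}

# After running seqboot you could say:
#   count_datasets outfile
```

With the default settings you would expect the count to match the number of replicates you asked for.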

Then get ready for the next run:

% mv outfile infile

% protdist

and so on. When you run seqboot, use the default of 100 replicates. You have to tell protdist and neighbor to analyze multiple datasets (100).
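The copy, run, rename cycle above can be wrapped in a small shell function so that each step also leaves behind a named copy of its output. This is only a sketch: phylip_step is a hypothetical helper, not part of Phylip, and the file names in the usage comments (such as boot.out) are invented for illustration. The program's menu still appears and you answer it as usual.

```shell
# Hypothetical helper: stage an input file under the name a program expects,
# run the program, and keep its outfile under a name of your choosing.
# usage: phylip_step PROGRAM INPUT EXPECTED_NAME SAVED_OUTPUT
phylip_step() {
    prog=$1; input=$2; expected=$3; saved=$4
    cp "$input" "$expected" || return 1   # e.g. cp 16S_manual.plp infile
    rm -f outfile                         # clear any stale outfile from an earlier run
    "$prog" || return 1                   # interactive menu: answer it by hand
    mv outfile "$saved"                   # keep this step's output under its own name
}

# Possible use for the first two steps (file names are illustrative):
#   phylip_step ./seqboot  16S_manual.plp infile boot.out
#   phylip_step ./protdist boot.out       infile dist.out
```

Keeping each intermediate file under its own name means you can go back and inspect it with less later, and rerun a single step without starting over.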

If you have been successful, look at outfile after running consense. You should see numbers that range from 1 to 100. Each one is the number of bootstrap replicates in which that group of taxa appeared together, so it tells you how much confidence you have in that part of the tree. You can't publish a paper in a serious journal without having made numbers like these!

So now you are ready for primetime.


David Ardell 2005-01-28