Next: Calculating Evolutionary Distance, Part
Up: Evolving Some Sequences on
Previous: Installing Seq-Gen
OK. Here you are going to use seq-gen to evolve your own protein and
amino acid data. You will compare the results of protein and amino
acid evolution evolving on the same evolutionary distances.
In order to evolve sequences for you, the first thing
seq-gen needs to do is make up an ancestor sequence. Seq-gen does this
by using certain assumptions that by default are determined by the
evolutionary model that you tell it to use. We'll explain this as we
go along.
After seq-gen makes an ancestor, it starts to evolve a number of
sequences according to a pattern and by an amount that you tell it
to. The way you tell it to do this is by giving it a ``tree'' to
evolve sequences by on the input.
For this exercise we are going to make seq-gen evolve two sequences
for equal amounts of evolutionary distance from a single ancestor,
which is the same as most of the examples we saw in the lecture.
The way we tell seq-gen to do this is by writing a very simple
``tree input.''
- Start a text-editor. Use pico from the
command-line to edit a file called tree.
% pico tree
- Write out the tree. Enter the following into the window:
(seq1:1,seq2:1);
Save your work and exit pico. You have
made a tree-file with two sequences, ``seq1'' and ``seq2'' which
will have each evolved by an equal evolutionary distance of 100% (each will
have been hit by as many substitutions on average as the
sequences are long). This distance of 100% is indicated by the
``1'' after each sequence-name and the colon character
:. These numbers are also called the ``branch-lengths.''
- Question 1: If each sequence is a distance of 1 from the
ancestor, how far away are the sequences from each other? Write
your answer on a fresh piece of paper or on a new document in
TextEdit. Please do that for the other questions below as well.
- Evolve amino acid sequences at a short distance. Now you are
going to input the tree-file tree to seq-gen with different
branch-lengths than the ``1'' you have in the file. You won't have
to change tree every time to change the branch-lengths
though. Instead, you can scale the tree by passing seq-gen a
number with option -s, with which seq-gen will multiply
every branch-length in the tree-file. So, if you execute
seq-gen -s 0.01, it is as if you put the following into
tree:
(seq1:0.01,seq2:0.01);
To tell seq-gen to evolve amino acid
sequences we'll specify an amino acid substitution model with the
model option -m. For now let's use the ``JTT'' model. Putting
this all together, try now running seq-gen for the first time, using
a branch-length of 0.002:
% ./seq-gen -mJTT -s 0.002 tree
You should see two very similar protein sequences pop out onto the
screen. Above the output there is a bunch of extra information about
how seq-gen did it's thing. How long are the sequences? The output
says that each sequence is 1000 amino acids long. We'll stick with
this sequence length for the rest of the exercise.
- Save your work to file so you can use it later. In UNIX, any
command that outputs to the screen can also very easily be made to
output to a file with a name that you give it. Usually only the
useful part is put in the file, and little messages like the extra
output aren't saved into the file, by the design of the
programmer. This is useful for us because we'll analyze our evolved
sequences with some other programs. Try saving your work into a file
called aa0002 with the following:
% ./seq-gen -mJTT -s 0.002 tree > aa0002
Now take a look at aa0002 to confirm it worked.
% cat aa0002
- Repeat this for other evolutionary distances. Try repeating
what you did above for a longer evolutionary distance of 0.005. It's
very easy to repeat commands in UNIX with slight modifications,
because the shell saves a history of your commands. You access this
list with <UP> the ``up-arrow'' key. Try it! Press it twice
and your last seq-gen command should appear. You can also go back
down through the history with the ``down-arrow'' key. Use the left
and right arrow-keys to bring the cursor just to the right of the
last "2'' and use the backspace key to erase it, then press ``5'' to
change it to a ``5.'' Do the same with the other ``2''. Now it
should look like:
% ./seq-gen -mJTT -s 0.005 tree > aa0005
Execute the command by pressing
<RETURN>.
Repeat this for the
following further branch lengths: 0.01,0.05,0.1,0.5,1, and
2. Name your output appropriately, for instance:
% ./seq-gen -mJTT -s 0.01 tree > aa001
I said before that the seq-gen makes
up an ancestor sequence according to what you tell it to do, and
to the assumptions. For protein sequences its default is to assume
that the ancestor has an amino acid composition equal to the
composition of the alignments that were used to make the JTT
matrix. We will see later that you can get this information from
the matrix itself.
- Repeat this for nucleic acid distances. Use history editing to
repeat what you did before but make nucleic acid sequences this
time. You will use the Jukes-Cantor model we heard about in
lecture. We can get this by specifying -mHKY. Seq-gen assumes
by default an ancestral sequence where each base has a frequency of
25%.
% ./seq-gen -mHKY -s 0.002 tree > nuc0002
And check your work with:
% cat nuc0002
Finally, start writing down a little table. Start each row with the
branch lengths you've used here. Also, use your answer to Question 1
to fill in a second column with the expected evolutionary distance
you should have in each file between seq1 and seq2.
Congrats. You are done with this section!
Next: Calculating Evolutionary Distance, Part
Up: Evolving Some Sequences on
Previous: Installing Seq-Gen
David Ardell
2005-01-26