Evolving some sequences

Next: Calculating Evolutionary Distance, Part Up: Evolving Some Sequences on Previous: Installing Seq-Gen

Evolving some sequences

OK. Here you are going to use seq-gen to evolve your own protein and amino acid data. You will compare the results of protein and amino acid evolution evolving on the same evolutionary distances.

In order to evolve sequences for you, the first thing seq-gen needs to do is make up an ancestor sequence. Seq-gen does this by using certain assumptions that by default are determined by the evolutionary model that you tell it to use. We'll explain this as we go along.

After seq-gen makes an ancestor, it starts to evolve a number of sequences according to a pattern and by an amount that you tell it to. The way you tell it to do this is by giving it a ``tree'' to evolve sequences by on the input.

For this exercise we are going to make seq-gen evolve two sequences for equal amounts of evolutionary distance from a single ancestor, which is the same as most of the examples we saw in the lecture.

The way we tell seq-gen to do this is by writing a very simple ``tree input.''

Start a text-editor. Use pico from the command-line to edit a file called tree.

% pico tree
Write out the tree. Enter the following into the window:

(seq1:1,seq2:1);

Save your work and exit pico. You have made a tree-file with two sequences, ``seq1'' and ``seq2'' which will have each evolved by an equal evolutionary distance of 100% (each will have been hit by as many substitutions on average as the sequences are long). This distance of 100% is indicated by the ``1'' after each sequence-name and the colon character :. These numbers are also called the ``branch-lengths.''
Question 1: If each sequence is a distance of 1 from the ancestor, how far away are the sequences from each other? Write your answer on a fresh piece of paper or on a new document in TextEdit. Please do that for the other questions below as well.
Evolve amino acid sequences at a short distance. Now you are going to input the tree-file tree to seq-gen with different branch-lengths than the ``1'' you have in the file. You won't have to change tree every time to change the branch-lengths though. Instead, you can scale the tree by passing seq-gen a number with option -s, with which seq-gen will multiply every branch-length in the tree-file. So, if you execute seq-gen -s 0.01, it is as if you put the following into tree:

(seq1:0.01,seq2:0.01);

To tell seq-gen to evolve amino acid sequences we'll specify an amino acid substitution model with the model option -m. For now let's use the ``JTT'' model. Putting this all together, try now running seq-gen for the first time, using a branch-length of 0.002:

% ./seq-gen -mJTT -s 0.002 tree

You should see two very similar protein sequences pop out onto the screen. Above the output there is a bunch of extra information about how seq-gen did it's thing. How long are the sequences? The output says that each sequence is 1000 amino acids long. We'll stick with this sequence length for the rest of the exercise.
Save your work to file so you can use it later. In UNIX, any command that outputs to the screen can also very easily be made to output to a file with a name that you give it. Usually only the useful part is put in the file, and little messages like the extra output aren't saved into the file, by the design of the programmer. This is useful for us because we'll analyze our evolved sequences with some other programs. Try saving your work into a file called aa0002 with the following:

% ./seq-gen -mJTT -s 0.002 tree > aa0002

Now take a look at aa0002 to confirm it worked.

% cat aa0002
Repeat this for other evolutionary distances. Try repeating what you did above for a longer evolutionary distance of 0.005. It's very easy to repeat commands in UNIX with slight modifications, because the shell saves a history of your commands. You access this list with <UP> the ``up-arrow'' key. Try it! Press it twice and your last seq-gen command should appear. You can also go back down through the history with the ``down-arrow'' key. Use the left and right arrow-keys to bring the cursor just to the right of the last "2'' and use the backspace key to erase it, then press ``5'' to change it to a ``5.'' Do the same with the other ``2''. Now it should look like:

% ./seq-gen -mJTT -s 0.005 tree > aa0005

Execute the command by pressing <RETURN>.

Repeat this for the following further branch lengths: 0.01,0.05,0.1,0.5,1, and 2. Name your output appropriately, for instance:

% ./seq-gen -mJTT -s 0.01 tree > aa001

I said before that the seq-gen makes up an ancestor sequence according to what you tell it to do, and to the assumptions. For protein sequences its default is to assume that the ancestor has an amino acid composition equal to the composition of the alignments that were used to make the JTT matrix. We will see later that you can get this information from the matrix itself.
Repeat this for nucleic acid distances. Use history editing to repeat what you did before but make nucleic acid sequences this time. You will use the Jukes-Cantor model we heard about in lecture. We can get this by specifying -mHKY. Seq-gen assumes by default an ancestral sequence where each base has a frequency of 25%.

% ./seq-gen -mHKY -s 0.002 tree > nuc0002

And check your work with:

% cat nuc0002

Finally, start writing down a little table. Start each row with the branch lengths you've used here. Also, use your answer to Question 1 to fill in a second column with the expected evolutionary distance you should have in each file between seq1 and seq2.

Congrats. You are done with this section!

Next: Calculating Evolutionary Distance, Part Up: Evolving Some Sequences on Previous: Installing Seq-Gen

David Ardell 2005-01-26