
Listfiles

It is also possible to use list files within EMBOSS. Instead of containing the sequences themselves, a list file contains "references" to sequences - so, for example, you might include database entries, the names of files containing sequences, or even the names of other list files. For example, here's a valid list file, called seq.list:

unix % more seq.list

opsd_abyko.fasta
tsw:opsd_xenla
tsw:opsd_c*
@another_list

This looks a bit odd, but it's really very straightforward; the file contains:

opsd_abyko.fasta - this is the name of a sequence file. The file is read in from the current directory.
tsw:opsd_xenla - this is a reference to a specific sequence in the SwissProt database.
tsw:opsd_c* - this represents all the sequences in SwissProt whose identifiers start with "opsd_c".
@another_list - this is the name of a second list file.

Notice the @ in front of the last entry. This is how you tell EMBOSS that the file is a list file, not a regular sequence file. Let's demonstrate this by using seq.list as the input to seqret and writing the sequences to a new file, perhaps for use in a multiple sequence alignment (see Section 5.3). You'll need to use a text editor such as pico to create the list files if you'd like to try this yourself.

First of all, we'll make the file opsd_abyko.fasta using seqret:

unix % seqret tsw:opsd_abyko -outseq opsd_abyko.fasta

Now let's look at another_list. Its structure is the same as that of seq.list, but this time it contains only database references:

unix % more another_list

tsw:opsd_anoca
tsw:opsd_apime
tsw:opsd_astfa
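
If you would rather not open a text editor, both list files shown so far can also be written from the shell. The following is a minimal sketch using here-documents; it assumes you are happy to create (or overwrite) seq.list and another_list in the current directory, and it only writes the text of the files, it does not check that the entries exist.

```shell
#!/bin/sh
# Write the two list files from this section into the current directory.
# The quoted 'EOF' delimiter stops the shell from expanding anything
# inside the here-document, so every entry is stored exactly as typed.

cat > seq.list <<'EOF'
opsd_abyko.fasta
tsw:opsd_xenla
tsw:opsd_c*
@another_list
EOF

cat > another_list <<'EOF'
tsw:opsd_anoca
tsw:opsd_apime
tsw:opsd_astfa
EOF
```

Each entry sits on its own line, just as in the listings above.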

Finally, let's run seqret with seq.list (not forgetting the @ sign) and look at the results:

unix % seqret @seq.list -outseq outfile

unix % more outfile

>OPSD_ABYKO O42294 RHODOPSIN (FRAGMENT).
YLVNPAAYAALGAYMFLLILIGFPINFLTLYVTLEHKKLRTPLNYILLNLAVANLFMVLG
GFTTTMYTSMHGYFVLGRLGCNLEAFFATLGGEIALWSLVVLAIERWIVVCKPISNFRFT
EDHAIMGLAFTWVMALACAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVIYMFI
VHFLIPLSVIFFCYGRLLCAVKEAPAAQQESETTQRAEKEVSRMVVIMVIGFLVCWLPYA
SVAWWIFCNQGSDFGPIFMTLPSFFAKSAAIYNPMIYICMNKQFRHCMI
>OPSD_XENLA P29403 RHODOPSIN.
MNGTEGPNFYVPMSNKTGVVRSPFDYPQYYLAEPWQYSALAAYMFLLILLGLPINFMTLF
VTIQHKKLRTPLNYILLNLVFANHFMVLCGFTVTMYTSMHGYFIFGPTGCYIEGFFATLG
GEVALWSLVVLAVERYIVVCKPMANFRFGENHAIMGVAFTWIMALSCAAPPLFGWSRYIP
EGMQCSCGVDYYTLKPEVNNESFVIYMFIVHFTIPLIVIFFCYGRLLCTVKEAAAQQQES
LTTQKAEKEVTRMVVIMVVFFLICWVPYAYVAFYIFTHQGSNFGPVFMTVPAFFAKSSAI
YNPVIYIVLNKQFRNCLITTLCCGKNPFGDEDGSSAATSKTEASSVSSSQVSPA
>OPSD_CAMAB Q17292 RHODOPSIN.
MMSIASGPSHAAYTWASQGGGFGNQTVVDKVPPEMLHMVDAHWYQFPPMNPLWHALLGFV
IGVLGVISVIGNGMVIYIFTTTKSLRTPSNLLVVNLAISDFLMMLCMSPAMVINCYYETW
VLGPLFCELYGLAGSLFGCASIWTMTMIAFDRYNVIVKGLSAKPMTINGALIRILTIWFF
TLAWTIAPMFGWNRYVPEGNMTACGTDYLTKDLFSRSYILIYSIFVYFTPLFLIIYSYFF
IIQAVAAHEKNMREQAKKMNVASLRSAENQSTSAECKLAKVALMTISLWFMAWTPYLVIN
YSGIFETTKISPLFTIWGSLFAKANAVYNPIVYGISHPKYRAALFQKFPSLACTTEPTGA
DTMSTTTTVTEGNEKPAA
>OPSD_CAMHU O18312 RHODOPSIN (FRAGMENT).
LHMIHLHWYQYPPMNPMMYPLLLIFMLFTGILCLAGNFVTIWVFMNTKSLRTPANLLVVN
LAMSDFLMMFTMFPPMMVTCYYHTWTLGPTFCQVYAFLGNLCGCASIWTMVFITFDRYNV
IVKGVAGEPLSTKKASLWILSVWVLSTAWCIAPFFGWNHYVPEGNLTGCGTDYLSEDILS
RSYLYIYSTWVYFLPLAITIYCYVFIIKAVAAHEKGMRDQAKKMGIKSLRNEEAQKTSAE
CRLAKNAMTTVALWFIAWTPCLLINWVGMFARSYLSPVYTIWGYVFAKANAVYNPIVYAI
S

...

Note that the output file contains all the sequences we specified in seq.list, including those matched by the wildcard entry and those referenced through another_list, as we had expected.
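
To see which references a list file resolves to before running seqret, the nesting rule for @ entries can be sketched as a small shell function. This is only an illustration: expand_list is a hypothetical helper, not part of EMBOSS and not how EMBOSS itself is implemented. It does not query any database, so wildcard entries such as tsw:opsd_c* are printed unexpanded.

```shell
#!/bin/sh
# expand_list FILE
# Hypothetical helper: print every reference in a list file, following
# nested "@other_list" entries recursively. Blank lines are skipped.
expand_list() {
    # "|| [ -n ... ]" also handles a last line with no trailing newline.
    while IFS= read -r entry || [ -n "$entry" ]; do
        case "$entry" in
            '') ;;                              # ignore blank lines
            @*) expand_list "${entry#@}" ;;     # nested list file: strip the @ and recurse
            *)  printf '%s\n' "$entry" ;;       # file name or database reference: print as-is
        esac
    done < "$1"
}
```

With the two list files from this section in the current directory, running expand_list seq.list would print the three plain entries of seq.list followed by the three entries of another_list.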


EMBnet
2005-01-22