
Exercise: emma

unix % emma
Multiple alignment program - interface to ClustalW program
Input sequence: OPS2_*.fasta
Output sequence [OPS2_*drome.aln]: OPS2_*.aln
Output file [OPS2_*drome.dnd]: OPS2_*.dnd
..clustalw -infile=21665A -outfile=21665B -align
-type=protein -output=gcg -pwmatrix=blosum -pwgapopen=10.000
-pwgapext=0.100 -newtree=21665C -matrix=blosum -gapopen=10.000
-gapext=5.000 -gapdist=8 -hgapresidues=GPSNDQEKR -maxdiv=30..

CLUSTAL W (1.74) Multiple Sequence Alignments

Sequence type explicitly set to Protein
Sequence format is Pearson
Sequence 1: OPS2_*DROME 381 aa
Sequence 2: OPS2_*DROPS 381 aa
Sequence 3: OPS2_*HEMSA 377 aa
Sequence 4: OPS2_*LIMPO 376 aa
Sequence 5: OPS2_*PATYE 399 aa
Sequence 6: OPS2_*SCHGR 380 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score: 91
Sequences (1:3) Aligned. Score: 37
Sequences (1:4) Aligned. Score: 48
Sequences (1:5) Aligned. Score: 20
Sequences (1:6) Aligned. Score: 32
Sequences (2:3) Aligned. Score: 37
Sequences (2:4) Aligned. Score: 48
Sequences (2:5) Aligned. Score: 22
Sequences (2:6) Aligned. Score: 31
Sequences (3:4) Aligned. Score: 40
Sequences (3:5) Aligned. Score: 23
Sequences (3:6) Aligned. Score: 32
Sequences (4:5) Aligned. Score: 20
Sequences (4:6) Aligned. Score: 34
Sequences (5:6) Aligned. Score: 18
Guide tree file created: [21665C]
Start of Multiple Alignment
There are 5 groups
Aligning...
Group 1: Sequences: 2 Score:6084
Group 2: Sequences: 3 Score:3046
Group 3: Sequences: 4 Score:2772
Group 4: Sequences: 5 Score:2489
Group 5: Delayed
Sequence:5 Score:2819
Alignment Score 11778
GCG-Alignment file created [21665B]
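
The pairwise scores reported in the log above can be collected programmatically, which makes it easier to spot the closest and most distant pairs. The following is only a sketch: the helper name parse_pairwise_scores and the LOG string are illustrative and not part of emma or ClustalW, and the score lines are copied from the run shown above. Sequence numbers follow the "Sequence N:" lines of that log (1 is DROME, 2 is DROPS, and so on).

```python
import re

# Pairwise score lines copied from the ClustalW log shown above.
LOG = """\
Sequences (1:2) Aligned. Score: 91
Sequences (1:3) Aligned. Score: 37
Sequences (1:4) Aligned. Score: 48
Sequences (1:5) Aligned. Score: 20
Sequences (1:6) Aligned. Score: 32
Sequences (2:3) Aligned. Score: 37
Sequences (2:4) Aligned. Score: 48
Sequences (2:5) Aligned. Score: 22
Sequences (2:6) Aligned. Score: 31
Sequences (3:4) Aligned. Score: 40
Sequences (3:5) Aligned. Score: 23
Sequences (3:6) Aligned. Score: 32
Sequences (4:5) Aligned. Score: 20
Sequences (4:6) Aligned. Score: 34
Sequences (5:6) Aligned. Score: 18
"""

# Matches lines such as "Sequences (1:2) Aligned. Score: 91".
PAIR_RE = re.compile(r"Sequences \((\d+):(\d+)\) Aligned\. Score:\s*(\d+)")


def parse_pairwise_scores(log_text):
    """Return {(i, j): score} for every pairwise score line in the log."""
    return {(int(i), int(j)): int(s) for i, j, s in PAIR_RE.findall(log_text)}


scores = parse_pairwise_scores(LOG)
best = max(scores, key=scores.get)    # highest-scoring pair
worst = min(scores, key=scores.get)   # lowest-scoring pair
print("closest pair:", best, "score", scores[best])
print("most distant pair:", worst, "score", scores[worst])
```

In this run the highest score belongs to pair (1:2), the two fruit fly sequences, and the lowest to pair (5:6).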

We have aligned OPS2_* sequences from two fruit fly species, two crab species, a locust and a scallop. Let's see what emma made of them:

unix % more OPS2_*.aln

>OPS2_*DROME
MERSHLPETPFDLAHSGPRFQAQSSGNGSVLD-NVLPDMAHLVNPYWSRFAPMDPMMSKI
LGLFTLAIMIISCCGNGVVVYIFGGTKSLRTPANLLVLNLAFSDFCMMASQSPVMIINFY
Y-ETWVLGPLWCDIYAGCGSLFGCVSIWSMCMIAFDRYNVIVKGINGTPMTIKTSIMKIL
FIWMMAVFWTVMPLIGWSAYVPEGNLTACSIDYMTRMWNPRSYLITYSLFVYYTPLFLIC
YSYWFIIAAVAAHEKAMREQAKKMNVKSLRSSEDCDK-SAEGKLAKVALTTISLWFMAWT
PYLVICYFGLFKIDG-LTPLTTIWGATFAKTSAVYNPIVYGISHPKYRIVLKEKCPMCVF
GNTDEPKPDAPASDTETTSEADSKA-----------------------------------
---------------------------
>OPS2_*DROPS
MERSLLPEPPLAMALLGPRFEAQTGGNRSVLD-NVLPDMAPLVNPHWSRFAPMDPTMSKI
LGLFTLVILIISCCGNGVVVYIFGGTKSLRTPANLLVLNLAFSDFCMMASQSPVMIINFY
Y-ETWVLGPLWCDIYAACGSLFGCVSIWSMCMIAFDRYNVIVKGINGTPMTIKTSIMKIA
FIWMMAVFWTIMPLIGWSSYVPEGNLTACSIDYMTRQWNPRSYLITYSLFVYYTPLFMIC
YSYWFIIATVAAHEKAMRDQAKKMNVKSLRSSEDCDK-SAENKLAKVALTTISLWFMAWT
PYLIICYFGLFKIDG-LTPLTTIWGATFAKTSAVYNPIVYGISHPNDRLVLKEKCPMCVC
GTTDEPKPDAPPSDTETTSEAESKD-----------------------------------
---------------------------
>OPS2_*LIMPO
----------MANQLSYSSLGWPYQPNASVVD-TMPKEMLYMIHEHWYAFPPMNPLWYSI
LGVAMIILGIICVLGNGMVIYLMMTTKSLRTPTNLLVVNLAFSDFCMMAFMMPTMASNCF
A-ETWILGPFMCEVYGMAGSLFGCASIWSMVMITLDRYNVIVRGMAAAPLTHKKATLLLL
FVWIWSGGWTILPFFGWSRYVPEGNLTSCTVDYLTKDWSSASYVIIYGLAVYFLPLITMI
YCYFFIVHAVAEHEKQLREQAKKMNVASLRANADQQKQSAECRLAKVAMMTVGLWFMAWT
PYLIIAWAGVFSSGTRLTPLATIWGSVFAKANSCYNPIVYGISHPRYKAALYQRFPSLAC
GSGESGSDVKSEASATMTMEEKPKSPEA--------------------------------
---------------------------
>OPS2_*HEMSA
---MTNATGPQMAYYGAASMDFGYPEGVSIVD-FVRPEIKPYVHQHWYNYPPVNPMWHYL
LGVIYLFLGTVSIFGNGLVIYLFNKSAALRTPANILVVNLALSDLIMLTTNVPFFTYNCF
SGGVWMFSPQYCEIYACLGAITGVCSIWLLCMISFDRYNIICNGFNGPKLTTGKAVVFAL
ISWVIAIGCALPPFFGWGNYILEGILDSCSYDYLTQDFNTFSYNIFIFVFDYFLPAAIIV
FSYVFIVKAIFAHEAAMRAQAKKMNVSTLRSNEADAQ-RAEIRIAKTALVNVSLWFICWT
PYALISLKGVMGDTSGITPLVSTLPALLAKSCSCYNPFVYAISHPKYRLAITQHLPWFCV
HETETKSNDDSQSNSTVAQDKA--------------------------------------
---------------------------
>OPS2_*SCHGR
------MVNTTDFYPVPAAMAYESSVGLPLLGWNVPTEHLDLVHPHWRSFQVPNKYWHFG
LAFVYFMLMCMSSLGNGIVLWIYATTKSIRTPSNMFIVNLALFDVLMLLEMPMLVVSSLF
Y-QRPVGWELGCDIYAALGSVAGIGSAINNAAIAFDRYRTISCPIDGRLTQGQVLALIAG
TWVWTLPFTLMPLLRIWSRFTAEGFLTTCSFDYLTDDEDTKVFVGCIFAWSYAFPLCLIC
CFYYRLIGAVREHEKMLRDQAKKMNVKSLQSNADTEAQSAEIRIAKVALTIFFLFLCSWT
PYAVVAMIGAFGNRAALTPLSTMIPAVTAKIVSCIDPWVYAINHPRFRAEVQKRMKWLHL
GEDARSSKSDTSSTATDRTVGNVSASA---------------------------------
---------------------------
>OPS2_*PATYE
---------------------------------------MPFPLNRTDTALVISPSEFRI
IGIFISICCIIGVLGNLLIIIVFAKRRSVRRPINFFVLNLAVSDLIVALLGYPMTAASAF
S-NRWIFDNIGCKIYAFLCFNSGVISIMTHAALSFCRYIIICQYGYRKKITQTTVLRTLF
SIWSFAMFWTLSPLFGWSSYVIEVVPVSCSVNWYGHGLGDVSYTISVIVAVYVFPLSIIV
FSYGMIL-----QEKVCKDSRKNGIRAQQRYTPRFIQ-DIEQRVTFISFLMMAAFMVAWT
PYAIMSALAIGSFNV--ENSFAALPTLFAKASCAYNPFIYAFTNANFRDTVVEIMAPWTT
RRVGVSTLPWPQVTYYPRRRTSAVNTTDIEFPDDNIFIVNSSVNGPTVKREKIVQRNPIN
VRLGIKIEPRDSRAATENTFTADFSVI

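The sequences are clearly related, but they are not equally similar: the two fruit fly sequences are nearly identical (pairwise score 91), while every other pair scores between 18 and 48. Note the gaps that have been inserted within the sequences. Also note that, because this is a global alignment, every sequence has been padded with gaps so that all of them are the same length.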
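
The equal-length property described above is easy to check for any aligned FASTA-style file. The sketch below uses a small made-up alignment (seqA and seqB are toy data, not taken from the emma output), and the function names read_fasta and alignment_summary are illustrative only.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate stray blank lines inside a record
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_summary(records):
    """Return (alignment length, {name: gap count}).

    Raises ValueError if the rows differ in length, which would mean
    the input is not a finished multiple alignment.
    """
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("rows have different lengths: %s" % sorted(lengths))
    gaps = {name: seq.count("-") for name, seq in records.items()}
    return lengths.pop(), gaps


# Toy alignment, wrapped over two lines per sequence (hypothetical data).
TOY = """\
>seqA
MERSH-LPET
PF--
>seqB
---MTNATGP
QMAY
"""

toy_records = read_fasta(TOY)
toy_length, toy_gaps = alignment_summary(toy_records)
print(toy_length, toy_gaps)
```

To run the same check on the real result, read the text of the .aln file written by emma and pass it to read_fasta in place of TOY.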

Differences in the alignment can be very difficult to see in this format, because each sequence is listed separately. The program prettyplot can make the results easier to read by displaying the aligned sequences stacked on top of one another.
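
To illustrate why stacking helps, here is a minimal plain-text sketch that prints aligned rows one above the other and marks the columns where every sequence carries the same residue. It is not how prettyplot works and does not reproduce its output; the function names conservation_line and stacked_view are made up for this example. The two rows used in the demonstration are the first 60 columns of OPS2_*DROME and OPS2_*DROPS from the alignment above, under the shortened labels DROME and DROPS.

```python
def conservation_line(seqs):
    """Return '*' for columns where all rows share one residue, else ' '.

    Columns that contain a gap are never marked.
    """
    return "".join(
        "*" if len(set(col)) == 1 and col[0] != "-" else " "
        for col in zip(*seqs)
    )


def stacked_view(records, width=60):
    """Format aligned sequences in stacked blocks of `width` columns."""
    names = list(records)
    seqs = [records[n] for n in names]
    marks = conservation_line(seqs)
    pad = max(len(n) for n in names) + 1  # label column width
    out = []
    for start in range(0, len(marks), width):
        for name, seq in zip(names, seqs):
            out.append(name.ljust(pad) + seq[start:start + width])
        out.append(" " * pad + marks[start:start + width])
        out.append("")  # blank line between blocks
    return "\n".join(out)


# First 60 alignment columns of the two fruit fly sequences shown above.
rows = {
    "DROME": "MERSHLPETPFDLAHSGPRFQAQSSGNGSVLD-NVLPDMAHLVNPYWSRFAPMDPMMSKI",
    "DROPS": "MERSLLPEPPLAMALLGPRFEAQTGGNRSVLD-NVLPDMAPLVNPHWSRFAPMDPTMSKI",
}
print(stacked_view(rows))
```

Unmarked columns show at a glance where the two sequences differ, which is much harder to see when each sequence is printed in its own block.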


EMBnet
2005-01-22