EMBOSS: cusp

cusp

Function

Create a codon usage table

Description

Reads one or more coding sequences (CDS sequence only) and calculates a codon frequency table.

The output file can be used as a codon usage table in other applications.

Usage

Here is a sample session with cusp.

This example uses only one input sequence; normally a set of coding sequences would be used as the input.

% cusp -sbeg 135 -send 1292
Create a codon usage table
Input sequence(s): tembl:paamir
Output file [paamir.cusp]:

Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Sequence database USA
  [-outfile]           outfile    Output file name

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -cfile              codon      Codon usage table name

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1             integer    Start of each sequence to be used
   -send1               integer    End of each sequence to be used
   -sreverse1           boolean    Reverse (if DNA)
   -sask1               boolean    Ask for begin/end/reverse
   -snucleotide1        boolean    Sequence is nucleotide
   -sprotein1           boolean    Sequence is protein
   -slower1             boolean    Make lower case
   -supper1             boolean    Make upper case
   -sformat1            string     Input sequence format
   -sdbname1            string     Database name
   -sid1                string     Entryname
   -ufo1                string     UFO features
   -fformat1            string     Features format
   -fopenfile1          string     Features file name

   "-outfile" associated qualifiers
   -odirectory2         string     Output directory

   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths

Standard (Mandatory) qualifiers	Allowed values	Default
[-sequence] (Parameter 1)	Sequence database USA	Readable sequence(s)	Required
[-outfile] (Parameter 2)	Output file name	Output file	<sequence>.cusp
Additional (Optional) qualifiers	Allowed values	Default
(none)
Advanced (Unprompted) qualifiers	Allowed values	Default
-cfile	Codon usage table name	Codon usage file in EMBOSS data path	Ehum.cut

Input file format

Input files for usage example

'tembl:paamir' is a sequence entry in the example nucleic acid database 'tembl'

Database entry: tembl:paamir

ID   PAAMIR     standard; DNA; PRO; 2167 BP.
XX
AC   X13776; M43175;
XX
SV   X13776.1
XX
DT   19-APR-1989 (Rel. 19, Created)
DT   17-FEB-1997 (Rel. 50, Last updated, Version 22)
XX
DE   Pseudomonas aeruginosa amiC and amiR gene for aliphatic amidase regulation
XX
KW   aliphatic amidase regulator; amiC gene; amiR gene.
XX
OS   Pseudomonas aeruginosa
OC   Bacteria; Proteobacteria; gamma subdivision; Pseudomonadaceae; Pseudomonas.
XX
RN   [1]
RP   1167-2167
RA   Rice P.M.;
RT   ;
RL   Submitted (16-DEC-1988) to the EMBL/GenBank/DDBJ databases.
RL   Rice P.M., EMBL, Postfach 10-2209, Meyerhofstrasse 1, 6900 Heidelberg, FRG.
XX
RN   [2]
RP   1167-2167
RX   MEDLINE; 89211409.
RA   Lowe N., Rice P.M., Drew R.E.;
RT   "Nucleotide sequence of the aliphatic amidase regulator gene of Pseudomonas
RT   aeruginosa";
RL   FEBS Lett. 246:39-43(1989).
XX
RN   [3]
RP   1-1292
RX   MEDLINE; 91317707.
RA   Wilson S., Drew R.;
RT   "Cloning and DNA seqence of amiC, a new gene regulating expression of the
RT   Pseudomonas aeruginosa aliphatic amidase, and purification of the amiC
RT   product.";
RL   J. Bacteriol. 173:4914-4921(1991).
XX
RN   [4]
RP   1-2167
RA   Rice P.M.;
RT   ;
RL   Submitted (04-SEP-1991) to the EMBL/GenBank/DDBJ databases.
RL   Rice P.M., EMBL, Postfach 10-2209, Meyerhofstrasse 1, 6900 Heidelberg, FRG.
XX
DR   SWISS-PROT; P10932; AMIR_PSEAE.
DR   SWISS-PROT; P27017; AMIC_PSEAE.
DR   SWISS-PROT; Q51417; AMIS_PSEAE.


  [Part of this file has been deleted for brevity]

FT                   phenotype"
FT                   /replace=""
FT                   /gene="amiC"
FT   misc_feature    1
FT                   /note="last base of an XhoI site"
FT   misc_feature    648..653
FT                   /note="end of 658bp XhoI fragment, deletion in  pSW3 causes
FT                   constitutive expression of amiE"
FT   conflict        1281
FT                   /replace="g"
FT                   /citation=[3]
XX
SQ   Sequence 2167 BP; 363 A; 712 C; 730 G; 362 T; 0 other;
     ggtaccgctg gccgagcatc tgctcgatca ccaccagccg ggcgacggga actgcacgat        60
     ctacctggcg agcctggagc acgagcgggt tcgcttcgta cggcgctgag cgacagtcac       120
     aggagaggaa acggatggga tcgcaccagg agcggccgct gatcggcctg ctgttctccg       180
     aaaccggcgt caccgccgat atcgagcgct cgcacgcgta tggcgcattg ctcgcggtcg       240
     agcaactgaa ccgcgagggc ggcgtcggcg gtcgcccgat cgaaacgctg tcccaggacc       300
     ccggcggcga cccggaccgc tatcggctgt gcgccgagga cttcattcgc aaccgggggg       360
     tacggttcct cgtgggctgc tacatgtcgc acacgcgcaa ggcggtgatg ccggtggtcg       420
     agcgcgccga cgcgctgctc tgctacccga ccccctacga gggcttcgag tattcgccga       480
     acatcgtcta cggcggtccg gcgccgaacc agaacagtgc gccgctggcg gcgtacctga       540
     ttcgccacta cggcgagcgg gtggtgttca tcggctcgga ctacatctat ccgcgggaaa       600
     gcaaccatgt gatgcgccac ctgtatcgcc agcacggcgg cacggtgctc gaggaaatct       660
     acattccgct gtatccctcc gacgacgact tgcagcgcgc cgtcgagcgc atctaccagg       720
     cgcgcgccga cgtggtcttc tccaccgtgg tgggcaccgg caccgccgag ctgtatcgcg       780
     ccatcgcccg tcgctacggc gacggcaggc ggccgccgat cgccagcctg accaccagcg       840
     aggcggaggt ggcgaagatg gagagtgacg tggcagaggg gcaggtggtg gtcgcgcctt       900
     acttctccag catcgatacg cccgccagcc gggccttcgt ccaggcctgc catggtttct       960
     tcccggagaa cgcgaccatc accgcctggg ccgaggcggc ctactggcag accttgttgc      1020
     tcggccgcgc cgcgcaggcc gcaggcaact ggcgggtgga agacgtgcag cggcacctgt      1080
     acgacatcga catcgacgcg ccacaggggc cggtccgggt ggagcgccag aacaaccaca      1140
     gccgcctgtc ttcgcgcatc gcggaaatcg atgcgcgcgg cgtgttccag gtccgctggc      1200
     agtcgcccga accgattcgc cccgaccctt atgtcgtcgt gcataacctc gacgactggt      1260
     ccgccagcat gggcggggga ccgctcccat gagcgccaac tcgctgctcg gcagcctgcg      1320
     cgagttgcag gtgctggtcc tcaacccgcc gggggaggtc agcgacgccc tggtcttgca      1380
     gctgatccgc atcggttgtt cggtgcgcca gtgctggccg ccgccggaag ccttcgacgt      1440
     gccggtggac gtggtcttca ccagcatttt ccagaatggc caccacgacg agatcgctgc      1500
     gctgctcgcc gccgggactc cgcgcactac cctggtggcg ctggtggagt acgaaagccc      1560
     cgcggtgctc tcgcagatca tcgagctgga gtgccacggc gtgatcaccc agccgctcga      1620
     tgcccaccgg gtgctgcctg tgctggtatc ggcgcggcgc atcagcgagg aaatggcgaa      1680
     gctgaagcag aagaccgagc agctccagga ccgcatcgcc ggccaggccc ggatcaacca      1740
     ggccaaggtg ttgctgatgc agcgccatgg ctgggacgag cgcgaggcgc accagcacct      1800
     gtcgcgggaa gcgatgaagc ggcgcgagcc gatcctgaag atcgctcagg agttgctggg      1860
     aaacgagccg tccgcctgag cgatccgggc cgaccagaac aataacaaga ggggtatcgt      1920
     catcatgctg ggactggttc tgctgtacgt tggcgcggtg ctgtttctca atgccgtctg      1980
     gttgctgggc aagatcagcg gtcgggaggt ggcggtgatc aacttcctgg tcggcgtgct      2040
     gagcgcctgc gtcgcgttct acctgatctt ttccgcagca gccgggcagg gctcgctgaa      2100
     ggccggagcg ctgaccctgc tattcgcttt tacctatctg tgggtggccg ccaaccagtt      2160
     cctcgag                                                                2167
//

Output file format

Output files for usage example

File: paamir.cusp

# CUSP codon usage file
# Codon	Amino acid	Fract   /1000	Number
GCA	A		0.077	7.772	3
GCC	A		0.462	46.632	18
GCG	A		0.462	46.632	18
GCT	A		0.000	0.000	0
TGC	C		1.000	10.363	4
TGT	C		0.000	0.000	0
GAC	D		0.864	49.223	19
GAT	D		0.136	7.772	3
GAA	E		0.269	18.135	7
GAG	E		0.731	49.223	19
TTC	F		1.000	28.497	11
TTT	F		0.000	0.000	0
GGA	G		0.062	5.181	2
GGC	G		0.719	59.585	23
GGG	G		0.125	10.363	4
GGT	G		0.094	7.772	3
CAC	H		0.727	20.725	8
CAT	H		0.273	7.772	3
ATA	I		0.000	0.000	0
ATC	I		0.800	41.451	16
ATT	I		0.200	10.363	4
AAA	K		0.000	0.000	0
AAG	K		1.000	5.181	2
CTA	L		0.000	0.000	0
CTC	L		0.269	18.135	7
CTG	L		0.577	38.860	15
CTT	L		0.000	0.000	0
TTA	L		0.000	0.000	0
TTG	L		0.154	10.363	4
ATG	M		1.000	15.544	6
AAC	N		1.000	28.497	11
AAT	N		0.000	0.000	0
CCA	P		0.074	5.181	2
CCC	P		0.222	15.544	6
CCG	P		0.630	44.041	17
CCT	P		0.074	5.181	2
CAA	Q		0.062	2.591	1
CAG	Q		0.938	38.860	15
AGA	R		0.000	0.000	0
AGG	R		0.029	2.591	1
CGA	R		0.000	0.000	0
CGC	R		0.629	56.995	22
CGG	R		0.314	28.497	11
CGT	R		0.029	2.591	1
AGC	S		0.304	18.135	7
AGT	S		0.087	5.181	2
TCA	S		0.000	0.000	0
TCC	S		0.261	15.544	6
TCG	S		0.304	18.135	7
TCT	S		0.043	2.591	1
ACA	T		0.000	0.000	0
ACC	T		0.733	28.497	11
ACG	T		0.267	10.363	4
ACT	T		0.000	0.000	0
GTA	V		0.030	2.591	1
GTC	V		0.394	33.679	13
GTG	V		0.576	49.223	19
GTT	V		0.000	0.000	0
TGG	W		1.000	12.953	5
TAC	Y		0.619	33.679	13
TAT	Y		0.381	20.725	8
TAA	*		0.000	0.000	0
TAG	*		0.000	0.000	0
TGA	*		1.000	2.591	1

The example usage read in a single CDS from Pseudomonas aeruginosa, which has a very high GC content and a strong codon bias, as shown by the codons for alanine, where those ending in G or C are used almost exclusively.

The 'Fract' column gives the proportion of usage of a given codon among its redundant set (i.e. the set of codons which code for this codon's amino acid). For example, the Fract values of the 6 codons representing serine add up to 1.00 (within rounding).
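As a quick check of the Fract definition, the serine rows of the example output can be parsed and the fractions recomputed from the Number column. This is a minimal Python sketch and not part of EMBOSS: the helper names `parse_cusp_rows` and `fractions_by_count` are hypothetical, and it assumes whitespace-separated columns with `#` comment lines, as in the listing above.

```python
from collections import defaultdict

# Serine rows copied from the example output (paamir.cusp).
SERINE_ROWS = """\
# Codon  Amino acid  Fract  /1000  Number
AGC  S  0.304  18.135  7
AGT  S  0.087  5.181   2
TCA  S  0.000  0.000   0
TCC  S  0.261  15.544  6
TCG  S  0.304  18.135  7
TCT  S  0.043  2.591   1
"""


def parse_cusp_rows(text):
    """Yield (codon, amino_acid, fract, per_thousand, number) tuples."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        codon, amino, fract, per_thousand, number = line.split()
        yield codon, amino, float(fract), float(per_thousand), int(number)


def fractions_by_count(rows):
    """Recompute Fract as count / total count for the same amino acid."""
    totals = defaultdict(int)
    for _codon, amino, _fract, _per_thousand, number in rows:
        totals[amino] += number
    return {
        codon: (number / totals[amino] if totals[amino] else 0.0)
        for codon, amino, _fract, _per_thousand, number in rows
    }


rows = list(parse_cusp_rows(SERINE_ROWS))
recomputed = fractions_by_count(rows)
for codon, _amino, fract, _per_thousand, _number in rows:
    # Printed value from the file next to the value recomputed from counts.
    print(codon, fract, round(recomputed[codon], 3))
print("sum of printed Fract values:", round(sum(r[2] for r in rows), 3))
```

The printed fractions sum to 0.999 rather than exactly 1.00 because each value in the file is rounded to three decimal places; the unrounded fractions sum to 1.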

The /1000 column gives the number of times the codon occurs per 1000 codons of the input sequence(s). For example, GCA occurs 3 times among the 386 codons of the example region, giving 7.772. This will be an extrapolation if the input contains fewer than 1000 codons.
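The arithmetic behind the /1000 column can be checked against the example output. The following Python sketch is illustrative only and not part of EMBOSS; the function name `per_thousand` is hypothetical. It assumes the region selected with -sbeg 135 -send 1292 is read as consecutive codons from its first base, and scales counts taken from the Number column by the resulting codon total.

```python
# Region selected in the usage example: -sbeg 135 -send 1292 (inclusive).
start, end = 135, 1292
bases = end - start + 1       # 1158 bases
total_codons = bases // 3     # 386 codons


def per_thousand(count, total):
    """Scale a raw codon count to occurrences per 1000 codons."""
    return 1000.0 * count / total


# Counts taken from the Number column of the example output.
examples = {"GCA": 3, "GCC": 18, "GGC": 23, "TGA": 1}
for codon, count in examples.items():
    print(codon, f"{per_thousand(count, total_codons):.3f}")
```

The printed values agree with the /1000 column of paamir.cusp (7.772, 46.632, 59.585 and 2.591).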

If multiple sequences are input, the statistics are given for all of the sequences pooled together, not for each sequence individually.
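Pooling over several input sequences can be sketched as follows. This is a minimal Python illustration of the idea, not the cusp implementation. It assumes the standard genetic code, reads every sequence in frame from its first base, ignores a trailing partial codon, and skips codons containing ambiguity codes; how cusp itself treats those cases is not stated here. The names `count_codons` and `usage_table` are hypothetical, and the two input sequences are made up.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption of
# this sketch; no EMBOSS data file is read).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codons(sequences):
    """Pool codon counts over all sequences, each read from its first base."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in GENETIC_CODE:  # skip codons with ambiguity codes
                counts[codon] += 1
    return counts


def usage_table(sequences):
    """Return rows (codon, amino_acid, fract, per_thousand, number)."""
    counts = count_codons(sequences)
    total = sum(counts.values())
    aa_totals = Counter()
    for codon, n in counts.items():
        aa_totals[GENETIC_CODE[codon]] += n
    rows = []
    # Sort by amino acid then codon, with stop codons ("*") last.
    ordered = sorted(
        GENETIC_CODE.items(), key=lambda kv: (kv[1] == "*", kv[1], kv[0])
    )
    for codon, aa in ordered:
        n = counts[codon]
        fract = n / aa_totals[aa] if aa_totals[aa] else 0.0
        per_thousand = 1000.0 * n / total if total else 0.0
        rows.append((codon, aa, fract, per_thousand, n))
    return rows


# Two tiny made-up coding sequences, pooled into a single table.
table = usage_table(["ATGGCCGCATGA", "ATGGCCAAGTAA"])
for codon, aa, fract, per_thousand, n in table:
    if n:  # print only codons that were observed
        print(f"{codon}\t{aa}\t{fract:.3f}\t{per_thousand:.3f}\t{n}")
```

Because the counts are summed before any ratio is taken, longer sequences contribute proportionally more to the pooled table than shorter ones.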

Data files

cusp reads a codon usage file, but only as a template; it does not use any of the data in the file, so any codon usage file will give the same results.

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

Always exits with status 0.

Known bugs

None.

See also

Program name	Description
cai	CAI codon adaptation index
chips	Codon usage statistics
codcmp	Codon usage table comparison
syco	Synonymous codon usage Gribskov statistic plot

Author(s)

Alan Bleasby (ableasby © rfcgr.mrc.ac.uk)
MRC Rosalind Franklin Centre for Genomics Research Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK

History

Spring 2000 - written

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.
