EMBOSS: cutgextract

cutgextract

Function

Extract data from CUTG

Description

Given the name of a directory containing the CUTG database (ftp://ftp.ebi.ac.uk/pub/databases/cutg), cutgextract calculates codon usage tables for individual species (e.g. EHomo_sapiens.cut) and places them in the CODONS subdirectory of the EMBOSS data directory. This is an all-or-nothing extraction: it creates many files and takes several minutes. Each usage table is the sum of the codon counts over all sequences for that organism.
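The "sum of codons over all sequences for each organism" can be illustrated with a minimal Python sketch. The function name and the in-memory record shape (organism name paired with a codon-to-count mapping) are assumptions made for illustration only; they are not the CUTG file format and not the code cutgextract itself uses.

```python
from collections import Counter, defaultdict


def sum_codon_usage(records):
    """Sum codon counts over all sequences, grouped by organism.

    `records` is an iterable of (organism, {codon: count}) pairs.
    This shape is hypothetical and is not the CUTG file format.
    """
    totals = defaultdict(Counter)
    for organism, counts in records:
        # Counter.update adds the new counts to the running totals.
        totals[organism].update(counts)
    return totals


# Three made-up sequences from two organisms.
records = [
    ("Homo sapiens", {"AUG": 1, "GCC": 4, "UAA": 1}),
    ("Homo sapiens", {"AUG": 1, "GCC": 2, "UGA": 1}),
    ("Mus musculus", {"AUG": 1, "GCU": 3, "UAG": 1}),
]
totals = sum_codon_usage(records)
print(totals["Homo sapiens"]["GCC"])  # 6
```

Because every sequence of an organism is added into a single table, the result carries no information about which genes contributed which counts.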

The EMBOSS distribution comes with a set of codon usage tables. These tables are calculated from the files in the codonusage directory at the EBI (ftp://ftp.ebi.ac.uk/pub/databases/codonusage/README), with a few additions whose exact derivation cannot easily be determined. Many people would prefer to create their own from the public CUTG data.

Run cutgextract on the CUTG database from ftp://ftp.ebi.ac.uk/pub/databases/cutg. Get all the required *.codon files from CUTG and, if they are compressed, uncompress them before running cutgextract on them.
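As a rough sketch of the uncompress step, the Python below decompresses gzip files in a directory. It assumes the downloaded files are gzip-compressed and named like "*.codon.gz"; this page does not say which compression CUTG uses, so treat both the format and the file names as assumptions. The function name is hypothetical.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_codon_files(directory):
    """Decompress every '*.codon.gz' in `directory` next to the original.

    Assumes gzip compression and a '.codon.gz' naming scheme.
    """
    written = []
    for gz_path in sorted(Path(directory).glob("*.codon.gz")):
        out_path = gz_path.with_suffix("")  # drop the trailing '.gz'
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written


# Demonstration on a throwaway directory with a placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    with gzip.open(Path(tmp, "example.codon.gz"), "wb") as fh:
        fh.write(b"placeholder\n")
    names = [p.name for p in gunzip_codon_files(tmp)]
    restored = Path(tmp, "example.codon").read_bytes()
print(names)  # ['example.codon']
```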

Downloading the CUTG database and running cutgextract to create the codon usage table files would normally be done only once, when the EMBOSS package is installed, and again only if a new version of the CUTG database is released.

Note that CUTG has a drawback: it provides a single table for each organism and makes no distinction between different gene populations.

Algorithm

cutgextract looks in the specified directory and opens all the files with the extension '.codon'. These are all expected to be CUTG data files.
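The file selection described above can be sketched in Python as follows. The function name is hypothetical, and the `pattern` parameter only loosely mirrors the role of the -wildspec qualifier, whose exact semantics are not described on this page.

```python
import tempfile
from pathlib import Path


def find_codon_files(directory, pattern="*.codon"):
    """Return the sorted regular files in `directory` matching `pattern`."""
    return sorted(p for p in Path(directory).glob(pattern) if p.is_file())


# Demonstration on a throwaway directory; the file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("example.codon", "notes.txt", "README"):
        Path(tmp, name).write_text("")
    found = [p.name for p in find_codon_files(tmp)]
print(found)  # ['example.codon']
```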

It then parses the codon usage data out of these *.codon files and writes one file per species into the EMBOSS data/CODONS directory. The names of the files are derived from the species names in the CUTG files, so the file names will be long (and therefore descriptive).
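Judging only from the example name EHomo_sapiens.cut, the mapping from a species name to a file name might look like the sketch below. This is an assumption based on that single example: how cutgextract treats punctuation, strain designations or other unusual characters in species names is not covered here.

```python
def species_to_filename(species):
    """Build a name like 'EHomo_sapiens.cut' from 'Homo sapiens'.

    Assumed rule: prefix 'E', whitespace replaced by '_', suffix '.cut'.
    """
    return "E" + "_".join(species.split()) + ".cut"


print(species_to_filename("Homo sapiens"))  # EHomo_sapiens.cut
```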

Usage

Here is a sample session with cutgextract

% cutgextract
Extract data from CUTG
CUTG directory [.]: ../../data

Command line arguments

   Standard (Mandatory) qualifiers:
  [-directory]         dirlist    CUTG directory

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -wildspec           string     Type of codon file

   Associated qualifiers: (none)
   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths
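For scripted installations it can be convenient to run cutgextract without prompts. The sketch below only builds the command line, using the -directory and -auto qualifiers listed above; the helper name is hypothetical, and actually running the command requires a working EMBOSS installation.

```python
import shlex


def build_command(cutg_dir):
    """Return an argument list for a non-interactive cutgextract run."""
    # -directory names the CUTG directory; -auto turns off prompts.
    return ["cutgextract", "-directory", str(cutg_dir), "-auto"]


cmd = build_command("../../data")
print(shlex.join(cmd))  # cutgextract -directory ../../data -auto

# On a machine with EMBOSS installed, the list could be passed to
# subprocess.run(cmd, check=True).
```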

Input file format

cutgextract reads the CUTG data files (*.codon) found in the specified directory.

Output file format

cutgextract writes a set of EMBOSS codon usage data files to the EMBOSS data/CODONS directory.

Output files for usage example
File: CODONS

Data files

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.

To see the available EMBOSS data files, run:

% embossdata -showall

To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:


% embossdata -fetch -file Exxx.dat

Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory or, again, in a subdirectory called ".embossdata".

The directories are searched in the following order:

. (your current directory)
.embossdata (under your current directory)
~/ (your home directory)
~/.embossdata
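The search order listed above can be sketched in Python as follows. The function names are hypothetical, and the sketch covers only the four user directories in the list; it does not model the standard EMBOSS data directory or how EMBOSS itself implements the lookup.

```python
from pathlib import Path


def candidate_paths(filename, cwd=None, home=None):
    """List where a user data file would be looked for, in search order."""
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    return [
        cwd / filename,                  # . (current directory)
        cwd / ".embossdata" / filename,  # .embossdata under the current directory
        home / filename,                 # ~/ (home directory)
        home / ".embossdata" / filename, # ~/.embossdata
    ]


def find_user_data_file(filename, cwd=None, home=None):
    """Return the first existing candidate, or None if there is none."""
    for path in candidate_paths(filename, cwd, home):
        if path.is_file():
            return path
    return None


# The directory names here are made up for illustration.
for path in candidate_paths("Exxx.dat", cwd="/work/project", home="/home/user"):
    print(path.as_posix())
```

The first match wins, so a file in the current directory takes precedence over a file of the same name in the home directory.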

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name	Description
aaindexextract	Extract data from AAINDEX
printsextract	Extract data from PRINTS
prosextract	Builds the PROSITE motif database for patmatmotifs to search
rebaseextract	Extract data from REBASE
tfextract	Extract data from TRANSFAC

Author(s)

Alan Bleasby (ableasby © rfcgr.mrc.ac.uk)
MRC Rosalind Franklin Centre for Genomics Research, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK

History

Written (June 2001) - Alan Bleasby.

Target users

This program is intended to be run by people maintaining the data associated with an installation of EMBOSS.
