cutgextract

 

Function

Extract data from CUTG

Description

Given the name of a directory containing the CUTG database (ftp://ftp.ebi.ac.uk/pub/databases/cutg), cutgextract calculates codon usage tables for individual species (e.g. EHomo_sapiens.cut) and places them in the CODONS subdirectory of the EMBOSS data directory. The extraction is all-or-nothing: it creates many files and takes several minutes. Each usage table is the sum of the codon counts over all sequences for that organism.

The EMBOSS distribution comes with a set of codon usage tables. These tables are calculated from the files in ftp://ftp.ebi.ac.uk/pub/databases/codonusage/README, with a few additions whose exact derivation cannot easily be determined. Many users may prefer to create their own tables from the public CUTG data.

Run cutgextract on a local copy of the CUTG database obtained from ftp://ftp.ebi.ac.uk/pub/databases/cutg. Fetch all the required *.codon files from CUTG and, if they are compressed, uncompress them before running cutgextract on them.
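The uncompression step could be scripted along the following lines. This is only a sketch: it assumes the downloaded files are gzip-compressed and named *.codon.gz, which the text above does not state, and the helper name uncompress_codon_files is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def uncompress_codon_files(cutg_dir):
    """Expand every *.codon.gz file in cutg_dir to a plain *.codon file.

    Assumption: the CUTG files were downloaded gzip-compressed with a
    ".codon.gz" suffix. Adjust the pattern if your copy differs.
    """
    expanded = []
    for gz_path in sorted(Path(cutg_dir).glob("*.codon.gz")):
        # Dropping the last suffix turns "name.codon.gz" into "name.codon".
        target = gz_path.with_suffix("")
        with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        expanded.append(target)
    return expanded
```

The compressed originals are left in place, so the step can be repeated safely before pointing cutgextract at the directory.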

Downloading the CUTG database and running cutgextract to create the codon usage table files from it would normally be done only once, when the EMBOSS package is being installed, or again if a new version of the CUTG database is released.

Note that CUTG has a drawback: it provides a single table for each organism and makes no distinction between different gene populations.

Algorithm

cutgextract looks in the specified directory and opens all the files with the extension '.codon'. These are all expected to be CUTG data files.

It then parses the codon usage data from these *.codon files and writes one file per species into the EMBOSS data/CODONS directory. The names of the files are derived from the species names in the CUTG files. These file names will be long (and therefore descriptive).
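The steps above can be illustrated with a rough sketch. It is not the cutgextract implementation. It assumes a simplified record layout (a header line starting with '>' whose backslash-separated fields include the species name, followed by one line of 64 integer codon counts) that should be checked against the real CUTG files. The function names and the species_field parameter are hypothetical. The file naming follows the EHomo_sapiens.cut example and only replaces whitespace.

```python
from collections import defaultdict
from pathlib import Path


def species_file_name(species):
    """Map a species name to a table name such as EHomo_sapiens.cut."""
    return "E" + "_".join(species.split()) + ".cut"


def sum_codon_counts(cutg_dir, species_field=5, pattern="*.codon"):
    """Sum the 64 codon counts over all records, grouped by species.

    Assumption: each record is a '>' header with backslash-separated
    fields (species name at index species_field) followed by a line of
    64 whitespace-separated integers. The real layout may differ.
    """
    totals = defaultdict(lambda: [0] * 64)
    for path in sorted(Path(cutg_dir).glob(pattern)):
        with open(path) as handle:
            species = None
            for line in handle:
                line = line.strip()
                if line.startswith(">"):
                    fields = line[1:].split("\\")
                    if len(fields) > species_field:
                        species = fields[species_field]
                    else:
                        species = None
                elif line and species is not None:
                    counts = [int(tok) for tok in line.split()]
                    if len(counts) == 64:
                        row = totals[species]
                        for i, n in enumerate(counts):
                            row[i] += n
                    species = None  # one count line per header
    return dict(totals)
```

Each key of the returned dictionary would correspond to one output file, named with species_file_name.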

Usage

Here is a sample session with cutgextract:


% cutgextract 
Extract data from CUTG
CUTG directory [.]: ../../data


Command line arguments

   Standard (Mandatory) qualifiers:
  [-directory]         dirlist    CUTG directory

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -wildspec           string     Type of codon file

   Associated qualifiers: (none)
   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths
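For an unattended run, the qualifiers listed above can be combined on one command line. The sketch below builds such a command from Python. It assumes cutgextract is installed and on the PATH, and the helper names are hypothetical. Only the -directory, -wildspec and -auto qualifiers from the listing are used.

```python
import subprocess


def build_command(cutg_dir, wildspec=None):
    """Return the argument list for a non-interactive cutgextract run."""
    cmd = ["cutgextract", "-directory", str(cutg_dir), "-auto"]
    if wildspec is not None:
        # -wildspec selects the type of codon file to read.
        cmd += ["-wildspec", wildspec]
    return cmd


def run_cutgextract(cutg_dir, wildspec=None):
    """Run cutgextract; requires an EMBOSS installation on the PATH."""
    return subprocess.run(build_command(cutg_dir, wildspec), check=True)
```

For example, build_command("../../data") mirrors the sample session without the prompt. Passing the arguments as a list avoids shell expansion of any wildcard given to -wildspec.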



Input file format

Output file format

cutgextract writes a set of EMBOSS codon usage data files to the EMBOSS data/CODONS directory.

Output files for usage example

File: CODONS


Data files

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.

To see the available EMBOSS data files, run:

% embossdata -showall

To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:


% embossdata -fetch -file Exxx.dat

Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".

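The directories are searched in the following order:

   . (your current directory)
   .embossdata (under your current directory)
   ~/ (your home directory)
   ~/.embossdata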

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name     Description
aaindexextract   Extract data from AAINDEX
printsextract    Extract data from PRINTS
prosextract      Builds the PROSITE motif database for patmatmotifs to search
rebaseextract    Extract data from REBASE
tfextract        Extract data from TRANSFAC

Author(s)

Alan Bleasby (ableasby © rfcgr.mrc.ac.uk)
MRC Rosalind Franklin Centre for Genomics Research Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK

History

Written (June 2001) - Alan Bleasby.

Target users

This program is intended to be run by people maintaining the data associated with an installation of EMBOSS.