rebaseextract

 

Function

Extract data from REBASE

Description

The Restriction Enzyme database (REBASE) is a collection of information about restriction enzymes and related proteins. It contains published and unpublished references, recognition and cleavage sites, isoschizomers, commercial availability, methylation sensitivity, crystal and sequence data. DNA methyltransferases, homing endonucleases, nicking enzymes, specificity subunits and control proteins are also included. Most recently, putative DNA methyltransferases and restriction enzymes, as predicted from analysis of genomic sequences, are also listed.

The home page of REBASE is: http://rebase.neb.com/

This program derives recognition site and cleavage information from the "withrefm" file of a REBASE distribution. It creates three files in the EMBOSS data subdirectory REBASE: a pattern file, a reference file and a supplier file.

By default it also produces an 'embossre.equ' file, a file of preferred isoschizomers calculated from the restriction enzyme prototypes in the "withrefm" file. Specify -noequivalences to turn this off. You may edit the 'embossre.equ' file so that it lists the restriction enzymes available to you.

The EMBOSS programs that find restriction cutting sites use the data files produced by this program and will not work without them.

Running this program may be the job of your system manager.

Usage

Here is a sample session with rebaseextract:


% rebaseextract 
Extract data from REBASE
Full pathname of WITHREFM file: ../../data/withrefm
Full pathname of PROTO file: ../../data/proto


Command line arguments

   Standard (Mandatory) qualifiers:
  [-infile]            infile     Full pathname of WITHREFM file
  [-protofile]         infile     Full pathname of PROTO file

   Additional (Optional) qualifiers:
   -[no]equivalences   boolean    This option calculates an embossre.equ file
                                  using restriction enzyme prototypes in the
                                  withrefm file.

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers: (none)
   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths


Qualifier                   Description                        Allowed values         Default

Standard (Mandatory) qualifiers
[-infile] (Parameter 1)     Full pathname of WITHREFM file     Input file             Required
[-protofile] (Parameter 2)  Full pathname of PROTO file        Input file             Required

Additional (Optional) qualifiers
-[no]equivalences           This option calculates an          Boolean value Yes/No   Yes
                            embossre.equ file using
                            restriction enzyme prototypes
                            in the withrefm file.

Advanced (Unprompted) qualifiers
(none)
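
The qualifiers listed above can also be supplied on the command line so that the program runs without prompting, which is convenient when the extraction is scripted as part of an installation. The following is a minimal sketch, assuming rebaseextract is on the PATH; the helper names build_command and run_rebaseextract are hypothetical and not part of EMBOSS.

```python
import shutil
import subprocess


def build_command(withrefm, proto, equivalences=True):
    """Build an argument list for a non-interactive rebaseextract run.

    Only qualifiers listed in this document are used:
    -infile, -protofile, -[no]equivalences and -auto.
    """
    command = [
        "rebaseextract",
        "-infile", withrefm,
        "-protofile", proto,
        "-auto",  # turn off prompts
    ]
    if not equivalences:
        # Suppress creation of the embossre.equ file.
        command.append("-noequivalences")
    return command


def run_rebaseextract(withrefm, proto, equivalences=True):
    """Run rebaseextract; raises if it is missing or exits with non-zero status."""
    if shutil.which("rebaseextract") is None:
        raise FileNotFoundError("rebaseextract was not found on PATH")
    return subprocess.run(build_command(withrefm, proto, equivalences), check=True)


# Show the command that would be run for the sample session, without running it.
print(" ".join(build_command("../../data/withrefm", "../../data/proto")))
```

Because the program exits with status 0 unless an error is reported, check=True is enough to make a failed run raise an exception in the calling script.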

Input file format

The input files must be the "withrefm" file and the "proto" file of a REBASE distribution.

For example, the withrefm file for REBASE version 005 is at: ftp://ftp.neb.com/pub/rebase/withrefm.005

Input files for usage example

File: ../../data/withrefm

 
REBASE version 106                                              withrefm.106
 
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    REBASE, The Restriction Enzyme Database   http://rebase.neb.com
    Copyright (c)  Dr. Richard J. Roberts, 2001.   All rights reserved.
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 
Rich Roberts                                                    May 31 2001
 

<ENZYME NAME>   Restriction enzyme name.
<ISOSCHIZOMERS> Other enzymes with this specificity.
<RECOGNITION SEQUENCE> 
                These are written from 5' to 3', only one strand being given.
                If the point of cleavage has been determined, the precise site
                is marked with ^.  For enzymes such as HgaI, MboII etc., which
                cleave away from their recognition sequence the cleavage sites
                are indicated in parentheses.  

                For example HgaI GACGC (5/10) indicates cleavage as follows:
                                5' GACGCNNNNN^      3'
                                3' CTGCGNNNNNNNNNN^ 5'

                In all cases the recognition sequences are oriented so that
                the cleavage sites lie on their 3' side.

                REBASE Recognition sequences representations use the standard 
                abbreviations (Eur. J. Biochem. 150: 1-5, 1985) to represent 
                ambiguity.
                                R = G or A
                                Y = C or T
                                M = A or C
                                K = G or T
                                S = G or C
                                W = A or T
                                B = not A (C or G or T)
                                D = not C (A or G or T)
                                H = not G (A or C or T)
                                V = not T (A or C or G)
                                N = A or C or G or T



                ENZYMES WITH UNUSUAL CLEAVAGE PROPERTIES:  

                Enzymes that cut on both sides of their recognition sequences,
                such as BcgI, Bsp24I, CjeI and CjePI, have 4 cleavage sites
                each instead of 2.



  [Part of this file has been deleted for brevity]

<6>S.A. Thompson
<7>N
<8>Morgan, R.D., Unpublished observations.
Morgan, R.D., Xu, Q., US Patent Office, 2001.
Xu, Q., Morgan, R., Blaser, M., Unpublished observations.

<1>HspAI
<2>HhaI,AspLEI,BcaI,BspLAI,BstHHI,CcoP95I,CfoI,Csp1470I,FnuDIII,Hin6I,Hin7I,HinGUI,HinP1I,HinS1I,HinS2I,Hpy99III,HpyF10I,HsoI,MnnIV,NgoEII,SciNI
<3>G^CGC
<4>
<5>Haemophilus species A
<6>S.K. Degtyarev
<7>I
<8>Rechkunova, N.I., Prikhod'ko, E.A., Shevchenko, A.V., Degtyarev, S.K., Unpublished observations.

<1>KpnI
<2>Acc65I,AhaB8I,Asp718I,BspJ106I,Eco149I,Esp19I,KpnK14I,MvsI,MvsAI,MvsBI,MvsCI,MvsDI,MvsEI,NmiI,Sau10I,SthI,SthAI,SthBI,SthCI,SthDI,SthEI,SthFI,SthGI,SthHI,SthJI,SthKI,SthLI,SthMI,SthNI,Uba76I,Uba85I,Uba86I,Uba87I,Uba1201I
<3>GGTAC^C
<4>4(6)
<5>Klebsiella pneumoniae OK8
<6>ATCC 49790
<7>ABCDEFGHIJKLMNOQRSTU
<8>Kiss, A., Finta, C., Venetianer, P., (1991) Nucleic Acids Res., vol. 19, pp. 3460.
Smith, D.I., Blattner, F.R., Davies, J., (1976) Nucleic Acids Res., vol. 3, pp. 343-353.
Tomassini, J., Roychoudhury, R., Wu, R., Roberts, R.J., (1978) Nucleic Acids Res., vol. 5, pp. 4055-4064.

<1>NotI
<2>CciNI,CspBI,MchAI
<3>GC^GGCCGC
<4>?(4)
<5>Nocardia otitidis-caviarum
<6>ATCC 14630
<7>ABCDEFGHJKLMNOQRSTU
<8>Borsetti, R., Wise, D., Qiang, B.-Q., Schildkraut, I., Unpublished observations.
Morgan, R.D., Unpublished observations.
Morgan, R.D., Benner, J.S., Claus, T.E., US Patent Office, 1994.
Qiang, B.-Q., Schildkraut, I., (1987) Methods Enzymol., vol. 155, pp. 15-21.

<1>TaqI
<2>CviSIII,EsaBC3I,HpyV,Hpy26II,HpyF14III,HpyF16I,HpyF23I,HpyF24I,HpyF26III,HpyF30I,HpyF35I,HpyF40II,HpyF42IV,HpyF45I,HpyF49I,HpyF52I,HpyF59III,HpyF62II,HpyF64I,HpyF65II,HpyF66IV,HpyF71I,HpyF73II,HpyJP26II,PpaAII,Taq20I,Tbr51I,TfiA3I,TfiTok4A2I,TfiTok6A1I,TflI,Tsc4aI,Tsp32I,Tsp32II,Tsp358I,Tsp505I,Tsp510I,TspAK13D21I,TspAK16D24I,TspNI,TspVi4AI,TspVil3I,Tth24I,TthHB8I,TthRQI
<3>T^CGA
<4>4(6)
<5>Thermus aquaticus YTI
<6>J.I. Harris
<7>ABCDEFGIJLMNOQRSTU
<8>Anton, B.P., Brooks, J.E., Unpublished observations.
Fomenkov, A., Xiao, J.-P., Dila, D., Raleigh, E., Xu, S.-Y., (1994) Nucleic Acids Res., vol. 22, pp. 2399-2403.
McClelland, M., (1981) Nucleic Acids Res., vol. 9, pp. 6795-6804.
Sato, S., Hutchison, C.A. III, Harris, J.I., (1977) Proc. Natl. Acad. Sci. U. S. A., vol. 74, pp. 542-546.
Zebala, J.A., (1993) Diss. Abstr., vol. 54, pp. 1394-1398.
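
The record layout shown above (numbered fields <1> to <8>, with extra reference lines following <8>) can be read with a few lines of code. The sketch below is illustrative only and is not how rebaseextract itself is implemented. It assumes that records are separated by blank lines and always begin with <1>; the function name parse_withrefm_records is hypothetical. Fields are kept under their numbers because only <1>, <2> and <3> are named in the part of the file header shown here.

```python
import re

# A field line looks like "<3>G^CGC"; header lines such as
# "<ENZYME NAME>" do not match because they contain no digit.
FIELD = re.compile(r"^<(\d)>(.*)$")

SAMPLE = """<6>S.A. Thompson
<7>N
<8>Morgan, R.D., Unpublished observations.

<1>HspAI
<2>HhaI,AspLEI,BcaI,BspLAI,BstHHI,CcoP95I,CfoI,Csp1470I,FnuDIII,Hin6I,Hin7I,HinGUI,HinP1I,HinS1I,HinS2I,Hpy99III,HpyF10I,HsoI,MnnIV,NgoEII,SciNI
<3>G^CGC
<4>
<5>Haemophilus species A
<6>S.K. Degtyarev
<7>I
<8>Rechkunova, N.I., Prikhod'ko, E.A., Shevchenko, A.V., Degtyarev, S.K., Unpublished observations.

<1>NotI
<2>CciNI,CspBI,MchAI
<3>GC^GGCCGC
<4>?(4)
<5>Nocardia otitidis-caviarum
<6>ATCC 14630
<7>ABCDEFGHJKLMNOQRSTU
<8>Borsetti, R., Wise, D., Qiang, B.-Q., Schildkraut, I., Unpublished observations.
Morgan, R.D., Unpublished observations.
Morgan, R.D., Benner, J.S., Claus, T.E., US Patent Office, 1994.
Qiang, B.-Q., Schildkraut, I., (1987) Methods Enzymol., vol. 155, pp. 15-21.
"""


def parse_withrefm_records(text):
    """Return a list of records, each mapping field number -> list of lines."""
    records = []
    current = None      # record being filled, or None between records
    last_field = None   # field that continuation lines belong to
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            number = int(match.group(1))
            value = match.group(2).strip()
            if number == 1:
                current = {}
                records.append(current)
            if current is not None:
                current[number] = [value] if value else []
                last_field = number
        elif not line.strip():
            # Blank line: the record (or truncated fragment) has ended.
            current = None
            last_field = None
        elif current is not None and last_field is not None:
            # Continuation line, e.g. a further reference under <8>.
            current[last_field].append(line.strip())
    return records


for record in parse_withrefm_records(SAMPLE):
    name = record[1][0]
    site = record[3][0]
    isoschizomers = record[2][0].split(",") if record[2] else []
    print(name, site, len(isoschizomers), "isoschizomers")
```

The truncated fragment at the top of the sample (fields <6> to <8> with no preceding <1>) is skipped rather than attached to a record.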

File: ../../data/proto

 
REBASE version 305                                              proto.305
 
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    REBASE, The Restriction Enzyme Database   http://rebase.neb.com
    Copyright (c)  Dr. Richard J. Roberts, 2003.   All rights reserved.
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 
Rich Roberts                                                    Apr 30 2003
 



	    TYPE II ENZYMES
	    ---------------

BsiYI                          CCNNNNN^NNGG
BsrI                           ACTGG (1/-1)
HaeIII                         GG^CC
HpaII                          C^CGG
Ksp632I                        CTCTTC (1/4)
MaeII                          A^CGT



	    TYPE I ENZYMES
	    ---------------

EcoAI                          GAGNNNNNNNGTCA
EcoBI                          TGANNNNNNNNTGCT
EcoDI                          TTANNNNNNNGTCY
EcoDR2                         TCANNNNNNGTCG
EcoDR3                         TCANNNNNNNATCG
EcoDXXI                        TCANNNNNNNRTTC
EcoEI                          GAGNNNNNNNATGC
EcoKI                          AACNNNNNNGTGC



	    TYPE III ENZYMES
	    ---------------

EcoPI                          AGACC
EcoP15I                        CAGCAG (25/27)
HinfIII                        CGAAT
StyLTI                         CAGAG
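
The recognition sequence notation used in both input files (a ^ marking the cut, or a pair of offsets in parentheses for enzymes that cut away from the site, plus the ambiguity codes listed in the withrefm header) can be decoded as sketched below. This is an illustration under stated assumptions, not the parser used by rebaseextract: the names parse_site, site_to_regex and parse_proto_line are hypothetical, and sites for enzymes that cut on both sides of their recognition sequence are not handled.

```python
import re

# Ambiguity codes as listed in the withrefm header.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[GA]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[GC]", "W": "[AT]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

# Bases with an optional ^, optionally followed by "(top/bottom)" offsets.
SITE = re.compile(r"^([A-Z^]+)\s*(?:\((-?\d+)/(-?\d+)\))?$")


def parse_site(text):
    """Split a site such as 'GG^CC' or 'ACTGG (1/-1)' into its parts."""
    match = SITE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised site notation: {text!r}")
    raw, top, bottom = match.groups()
    caret = raw.find("^")
    return {
        "bases": raw.replace("^", ""),
        # Number of bases 5' of the ^, or None if no ^ is given.
        "caret": caret if caret >= 0 else None,
        # Offsets exactly as written in the file, or None.
        "offsets": (int(top), int(bottom)) if top is not None else None,
    }


def site_to_regex(bases):
    """Compile a recognition sequence with ambiguity codes into a regex."""
    return re.compile("".join(IUPAC[base] for base in bases))


def parse_proto_line(line):
    """Split a proto line such as 'HaeIII   GG^CC' into (name, site)."""
    name, site = line.split(None, 1)
    return name, parse_site(site)


print(parse_proto_line("HaeIII                         GG^CC"))
print(parse_proto_line("Ksp632I                        CTCTTC (1/4)"))
print(bool(site_to_regex("TTANNNNNNNGTCY").fullmatch("TTAACGTACGGTCC")))
```

The offsets are returned as they appear in the file; converting them to absolute cut positions on each strand is left out because it depends on the convention the downstream program expects.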

Output file format

Output files for usage example

File: REBASE


File: embossre.equ

Bsc4I BsiYI
Bse1I BsrI
BshI HaeIII
BsiSI HpaII
Bsu6I Ksp632I
HpyCH4IV MaeII

The output files are held in the REBASE subdirectory of the EMBOSS data directory. There are three: a pattern file, a reference file and a supplier file.

By default, rebaseextract also produces an 'embossre.equ' file in the EMBOSS data directory: a file of preferred isoschizomers, calculated from the restriction enzyme prototypes in the "withrefm" file. Specify -noequivalences to turn this off. You may edit the 'embossre.equ' file so that it lists the restriction enzymes available to you.
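
If you edit 'embossre.equ' by hand, it can help to check the result before other programs read it. The sketch below loads the file into a dictionary, assuming the format seen in the sample output: two whitespace-separated enzyme names per line. In that sample the second name on each line matches an entry in the proto file. The function name read_equivalences is hypothetical, and skipping lines that start with '#' is an assumption, not a documented feature of the format.

```python
EQU_SAMPLE = """Bsc4I BsiYI
Bse1I BsrI
BshI HaeIII
BsiSI HpaII
Bsu6I Ksp632I
HpyCH4IV MaeII
"""


def read_equivalences(text):
    """Map the first name on each line to the second name."""
    mapping = {}
    for line_number, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        # Assumption: blank lines and '#' comment lines may be ignored.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(
                f"line {line_number}: expected two names, got {line!r}"
            )
        mapping[parts[0]] = parts[1]
    return mapping


equivalences = read_equivalences(EQU_SAMPLE)
print(equivalences["BshI"])
```

A malformed line raises ValueError with its line number, so an editing mistake is reported at the point where it was made.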

Data files

The "withrefm" file of a REBASE distribution is the input file for this program.

Notes

The home page of REBASE is: http://rebase.neb.com/

Running this program may be the job of your system manager.

The ready-made files produced by this program may already be available at the REBASE web site: http://rebase.neb.com/rebase/rebase.files.html or http://rebase.neb.com/rebase/rebase.f37.html

References

  1. Nucleic Acids Research 27: 312-313 (1999).

Warnings

The program will warn you if the input file is incorrectly formatted.

Diagnostic Error Messages

Exit status

It exits with status 0 unless an error is reported.

Known bugs

See also

Program name     Description
aaindexextract   Extract data from AAINDEX
cutgextract      Extract data from CUTG
printsextract    Extract data from PRINTS
prosextract      Builds the PROSITE motif database for patmatmotifs to search
tfextract        Extract data from TRANSFAC

Author(s)

Alan Bleasby (ableasby © rfcgr.mrc.ac.uk)
MRC Rosalind Franklin Centre for Genomics Research, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK

History

Completed 12th April 1999

Target users

This program is intended to be used by administrators responsible for software and database installation and maintenance.

Comments