trimseq
Specifically, it first trims any gap characters from the ends of the sequence.
It then optionally trims off poor quality regions from the ends, using a threshold percentage of unwanted characters in a window which is moved along the sequence from each end. The unwanted characters are X's and N's (in nucleic sequences), optionally *'s, and optionally IUPAC ambiguity codes.
The program stops trimming the ends when the percentage of unwanted characters in the moving window drops below the threshold percentage.
Thus if the window size is set to 1 and the percentage threshold is 100, no further poor quality regions will be removed. If the window size is set to 5 and the percentage threshold is 40 then the sequence AAGCTNNNNATT will be trimmed to AAGCT, while AAGCTNATT or AAGCTNNNNATTT will not be trimmed as less than 40% of the last 5 characters are N's.
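The window rule can be made concrete with a short Python sketch. This is not trimseq's source code: the names `trim_right` and `trim_left` are hypothetical, and the decision to cut at the innermost unwanted character of any window that met the threshold is an assumption, chosen so that the results agree with the examples above.

```python
def trim_right(seq, window=1, percent=100.0, unwanted="XN"):
    """Trim poor-quality characters from the right end (illustrative sketch only)."""
    cut = len(seq)   # everything from index `cut` onwards will be discarded
    end = len(seq)   # exclusive right edge of the current window
    # A sequence shorter than the window is left untouched in this sketch.
    while end - window >= 0:
        start = end - window
        bad = [i for i in range(start, end) if seq[i].upper() in unwanted]
        if not bad or 100.0 * len(bad) / window < percent:
            break                # percentage dropped below the threshold: stop
        cut = min(cut, bad[0])   # assumption: cut at the innermost unwanted character
        end -= 1                 # slide the window one position inwards
    return seq[:cut]


def trim_left(seq, window=1, percent=100.0, unwanted="XN"):
    """Mirror image of trim_right, applied to the start of the sequence."""
    return trim_right(seq[::-1], window, percent, unwanted)[::-1]


for s in ("AAGCTNNNNATT", "AAGCTNATT", "AAGCTNNNNATTT"):
    print(s, "->", trim_right(s, window=5, percent=40))
```

Under these assumptions the first sequence is cut back to AAGCT, while the other two are returned unchanged because only 20% of their last 5 characters are N's.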
After trimming these poor quality regions, it again trims off any dangling gap characters from the ends.
Tidy up the sequence ends, removing only gap characters and terminal X's and N's
% trimseq xyz.seq xyz_clean.seq -window 1 -percent 100
Tidy up the sequence ends, removing poor bits at the ends
% trimseq xyz.seq xyz_clean.seq -window 5 -percent 40
Tidy up the sequence ends, removing very poor bits at the ends
% trimseq xyz.seq xyz_clean.seq -window 20 -percent 80
Tidy up the sequence ends, removing even marginally poor bits at the ends
% trimseq xyz.seq xyz_clean.seq -window 20 -percent 10
Tidy up the sequence ends, removing poor bits including ambiguity codes
% trimseq xyz.seq xyz_clean.seq -window 20 -percent 50 -strict
Tidy up the sequence ends, removing asterisks from a protein end
% trimseq xyz.seq xyz_clean.seq -window 1 -percent 100 -star
Tidy up the sequence ends, removing poor bits at only the left end
% trimseq xyz.seq xyz_clean.seq -window 20 -percent 50 -noright
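
When trimseq is driven from a script, the command lines above can be assembled programmatically. The following is a minimal sketch, assuming the `trimseq` executable is on the PATH; `build_trimseq_command` and `run_trimseq` are hypothetical helper names, and only qualifiers that appear in this document are used.

```python
import subprocess


def build_trimseq_command(infile, outfile, window=1, percent=100.0,
                          strict=False, star=False, left=True, right=True):
    """Return the argument list for a trimseq call (hypothetical helper)."""
    cmd = ["trimseq", infile, outfile,
           "-window", str(window), "-percent", str(percent)]
    if strict:
        cmd.append("-strict")    # also trim IUPAC ambiguity codes
    if star:
        cmd.append("-star")      # also trim *'s in protein sequences
    if not left:
        cmd.append("-noleft")    # do not trim at the start
    if not right:
        cmd.append("-noright")   # do not trim at the end
    return cmd


def run_trimseq(*args, **kwargs):
    """Run trimseq and raise if it exits with an error (requires EMBOSS)."""
    subprocess.run(build_trimseq_command(*args, **kwargs), check=True)


# Reproduces the last example above without executing anything.
print(" ".join(build_trimseq_command("xyz.seq", "xyz_clean.seq",
                                     window=20, percent=50, right=False)))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.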
    Standard (Mandatory) qualifiers:
      [-sequence]          seqall     Sequence database USA
      [-outseq]            seqoutall  Output sequence(s) USA

    Additional (Optional) qualifiers:
       -window             integer    This determines the size of the region
                                      that is considered when deciding whether
                                      the percentage of ambiguity is greater
                                      than the threshold. A value of 5 means
                                      that a region of 5 letters in the
                                      sequence is shifted along the sequence
                                      from the ends and trimming is done only
                                      if there is a greater or equal
                                      percentage of ambiguity than the
                                      threshold percentage.
       -percent            float      This is the threshold of the percentage
                                      ambiguity in the window required in
                                      order to trim a sequence.
       -strict             boolean    In nucleic sequences, trim off not only
                                      N's and X's, but also the nucleotide
                                      IUPAC ambiguity codes M, R, W, S, Y, K,
                                      V, H, D and B. In protein sequences,
                                      trim off not only X's but also B and Z.
       -star               boolean    In protein sequences, trim off not only
                                      X's, but also the *'s

    Advanced (Unprompted) qualifiers:
       -[no]left           boolean    Trim at the start
       -[no]right          boolean    Trim at the end

    Associated qualifiers:

       "-sequence" associated qualifiers
       -sbegin1            integer    Start of each sequence to be used
       -send1              integer    End of each sequence to be used
       -sreverse1          boolean    Reverse (if DNA)
       -sask1              boolean    Ask for begin/end/reverse
       -snucleotide1       boolean    Sequence is nucleotide
       -sprotein1          boolean    Sequence is protein
       -slower1            boolean    Make lower case
       -supper1            boolean    Make upper case
       -sformat1           string     Input sequence format
       -sdbname1           string     Database name
       -sid1               string     Entryname
       -ufo1               string     UFO features
       -fformat1           string     Features format
       -fopenfile1         string     Features file name

       "-outseq" associated qualifiers
       -osformat2          string     Output seq format
       -osextension2       string     File name extension
       -osname2            string     Base file name
       -osdirectory2       string     Output directory
       -osdbname2          string     Database name to add
       -ossingle2          boolean    Separate file for each entry
       -oufo2              string     UFO features
       -offormat2          string     Features format
       -ofname2            string     Features file name
       -ofdirectory2       string     Output directory

    General qualifiers:
       -auto               boolean    Turn off prompts
       -stdout             boolean    Write standard output
       -filter             boolean    Read standard input, write standard output
       -options            boolean    Prompt for standard and additional values
       -debug              boolean    Write debug output to program.dbg
       -verbose            boolean    Report some/full command line options
       -help               boolean    Report command line options. More
                                      information on associated and general
                                      qualifiers can be found with -help -verbose
       -warning            boolean    Report warnings
       -error              boolean    Report errors
       -fatal              boolean    Report fatal errors
       -die                boolean    Report deaths
Standard (Mandatory) qualifiers | Description | Allowed values | Default |
---|---|---|---|
[-sequence] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required |
[-outseq] (Parameter 2) | Output sequence(s) USA | Writeable sequence(s) | <sequence>.format |

Additional (Optional) qualifiers | Description | Allowed values | Default |
---|---|---|---|
-window | This determines the size of the region that is considered when deciding whether the percentage of ambiguity is greater than the threshold. A value of 5 means that a region of 5 letters in the sequence is shifted along the sequence from the ends and trimming is done only if there is a greater or equal percentage of ambiguity than the threshold percentage. | Any integer value | 1 |
-percent | This is the threshold of the percentage ambiguity in the window required in order to trim a sequence. | Any numeric value | 100.0 |
-strict | In nucleic sequences, trim off not only N's and X's, but also the nucleotide IUPAC ambiguity codes M, R, W, S, Y, K, V, H, D and B. In protein sequences, trim off not only X's but also B and Z. | Boolean value Yes/No | No |
-star | In protein sequences, trim off not only X's, but also the *'s | Boolean value Yes/No | No |

Advanced (Unprompted) qualifiers | Description | Allowed values | Default |
---|---|---|---|
-[no]left | Trim at the start | Boolean value Yes/No | Yes |
-[no]right | Trim at the end | Boolean value Yes/No | Yes |
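
How -strict and -star widen the set of unwanted characters can be summarised in a few lines of Python. This is an illustrative sketch based only on the qualifier descriptions above; the function name `unwanted_characters` is hypothetical, and treating -star as having no effect on nucleic sequences is an assumption.

```python
def unwanted_characters(is_nucleotide, strict=False, star=False):
    """Return the set of characters counted as unwanted (illustrative sketch)."""
    if is_nucleotide:
        chars = "NX"                   # always unwanted in nucleic sequences
        if strict:
            chars += "MRWSYKVHDB"      # nucleotide IUPAC ambiguity codes
    else:
        chars = "X"                    # always unwanted in protein sequences
        if strict:
            chars += "BZ"              # protein ambiguity codes
        if star:
            chars += "*"               # assumption: -star applies to proteins only
    return set(chars)


print(sorted(unwanted_characters(is_nucleotide=True, strict=True)))
print(sorted(unwanted_characters(is_nucleotide=False, strict=True, star=True)))
```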
Program name | Description |
---|---|
biosed | Replace or delete sequence sections |
cutseq | Removes a specified section from a sequence |
degapseq | Removes gap characters from sequences |
descseq | Alter the name or description of a sequence |
entret | Reads and writes (returns) flatfile entries |
extractfeat | Extract features from a sequence |
extractseq | Extract regions from a sequence |
listor | Writes a list file of the logical OR of two sets of sequences |
maskfeat | Mask off features of a sequence |
maskseq | Mask off regions of a sequence |
newseq | Type in a short new sequence |
noreturn | Removes carriage return from ASCII files |
notseq | Excludes a set of sequences and writes out the remaining ones |
nthseq | Writes one sequence from a multiple set of sequences |
pasteseq | Insert one sequence into another |
revseq | Reverse and complement a sequence |
seqret | Reads and writes (returns) sequences |
seqretsplit | Reads and writes (returns) sequences in individual files |
skipseq | Reads and writes (returns) sequences, skipping the first few |
splitter | Split a sequence into (overlapping) smaller sequences |
trimest | Trim poly-A tails off EST sequences |
union | Reads sequence fragments and builds one sequence |
vectorstrip | Strips out DNA between a pair of vector sequences |
yank | Reads a sequence range, appends the full USA to a list file |