Question: Removing Reads Which Have Adapter Sequences
6.7 years ago by
Varun Gupta1.1k
United States
Varun Gupta1.1k wrote:

Hi

I would like to remove from my FASTQ file all reads that contain an adapter sequence. Which tool, software, or Unix command-line option would be good for removing these reads?

PS: I don't want to trim the adapters; I want to remove the reads that contain the adapter sequence from the FASTQ file.

Seq of adapter: ACTAGTGTAGTCGTACTGATCT

Hope to hear from you soon

Regards

VARUN

adaptor • 3.8k views
modified 6.7 years ago by bioinfo • written 6.7 years ago by Varun Gupta
6.7 years ago by
Devon Ryan93k
Freiburg, Germany
Devon Ryan93k wrote:

Assuming all of your reads are of the same length, you can use any of the existing read trimmers that allow a minimum read length option (e.g. trim_galore with the --length option). Then, just have the program reject any trimmed reads, since they'll be shorter than whatever the initial read length was. For example, if you have 100bp reads then running

trim_galore -a adapter --length 100 file.fastq

or something like that should do what you want. This has the benefit of being able to handle paired-end reads (presuming you want to filter out both mates of a pair when either one contains the adapter).
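To make the logic of this answer concrete, here is a minimal Python sketch of the "trim, then reject anything shorter than the original read length" idea. It is not how trim_galore works internally: it uses exact matching only, and the function names, the `min_overlap` parameter, and the toy reads are assumptions made for illustration. Real trimmers also tolerate mismatches and may trim very short partial matches, so they can discard more reads than this sketch does.

```python
def trim_3prime_adapter(seq, adapter, min_overlap=3):
    """Return seq with the adapter (and everything after it) removed.

    Exact matching only: first look for the full adapter anywhere in the
    read, then for a partial adapter (a prefix of it) at the 3' end.
    min_overlap is a hypothetical parameter: the shortest partial match
    that counts as adapter.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Partial adapter at the 3' end: longest adapter prefix that ends the read.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:len(seq) - k]
    return seq


def keep_untrimmed(seqs, adapter, read_length, min_overlap=3):
    """Keep only reads that are still read_length long after trimming."""
    return [s for s in seqs
            if len(trim_3prime_adapter(s, adapter, min_overlap)) >= read_length]


adapter = "ACTAGTGTAGTCGTACTGATCT"  # adapter sequence from the question
reads = [
    "GATTACAGATTACAGATTACAGATTACAGG",            # no adapter: kept
    "GATTACA" + adapter + "G",                   # full adapter inside: dropped
    "GATTACAGATTACAGATTACAGATT" + adapter[:5],   # adapter start at 3' end: dropped
]
for read in keep_untrimmed(reads, adapter, read_length=30):
    print(read)
```

All three toy reads are 30 bases long, so setting `read_length=30` plays the role of `--length` in the command above: any read that lost bases to adapter trimming falls below the threshold and is rejected.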

written 6.7 years ago by Devon Ryan

Hi, this would also trim reads that don't contain the adapter sequence but have poor quality at the ends. I don't want to trim those reads. How should I go about it?

Varun

written 6.7 years ago by Varun Gupta

Only if you want it to. You can set whatever quality-trimming threshold you want. Try it with -q 0, which should leave low-quality ends untrimmed.

modified 6.7 years ago • written 6.7 years ago by Devon Ryan
6.7 years ago by
bioinfo740
bioinfo740 wrote:

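Have you tried the FASTX-Toolkit? It includes a tool, fastx_clipper, which can be used to remove adapter sequences. Its -C option discards clipped sequences, i.e. it keeps only the reads that did not contain the adapter, which is what you are asking for. Here is its help output: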

$ fastx_clipper -h
usage: fastx_clipper [-h] [-a ADAPTER] [-D] [-l N] [-n] [-d N] [-c] [-C] [-o] [-v] [-z] [-i INFILE] [-o OUTFILE]

version 0.0.6
   [-h]         = This helpful help screen.
   [-a ADAPTER] = ADAPTER string. default is CCTTAAGG (dummy adapter).
   [-l N]       = discard sequences shorter than N nucleotides. default is 5.
   [-d N]       = Keep the adapter and N bases after it.
          (using '-d 0' is the same as not using '-d' at all. which is the default).
   [-c]         = Discard non-clipped sequences (i.e. - keep only sequences which contained the adapter).
   [-C]         = Discard clipped sequences (i.e. - keep only sequences which did not contained the adapter).
   [-k]         = Report Adapter-Only sequences.
   [-n]         = keep sequences with unknown (N) nucleotides. default is to discard such sequences.
   [-v]         = Verbose - report number of sequences.
          If [-o] is specified,  report will be printed to STDOUT.
          If [-o] is not specified (and output goes to STDOUT),
          report will be printed to STDERR.
   [-z]         = Compress output with GZIP.
   [-D]        = DEBUG output.
   [-i INFILE]  = FASTA/Q input file. default is STDIN.
   [-o OUTFILE] = FASTA/Q output file. default is STDOUT.
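
For readers who want to see what "keep only sequences which did not contain the adapter" amounts to, below is a minimal Python sketch of a FASTQ filter in the spirit of the -C option. It is not fastx_clipper's implementation: it assumes four-line FASTQ records, matches the full adapter exactly (no partial matches at the read end, no mismatches, no reverse complement), and the function names and example reads are made up for illustration.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples.

    Assumes strict 4-line records; stops at end of file or at an empty
    header line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def drop_adapter_reads(records, adapter):
    """Yield only records whose sequence lacks the adapter (exact match)."""
    adapter = adapter.upper()
    for rec in records:
        if adapter not in rec[1].upper():
            yield rec


ADAPTER = "ACTAGTGTAGTCGTACTGATCT"  # adapter sequence from the question

# Two toy records; the second one contains the adapter and is removed whole.
example = io.StringIO(
    "@read1\nGATTACAGATTACA\n+\nIIIIIIIIIIIIII\n"
    "@read2\nGATT" + ADAPTER + "\n+\n" + "I" * 26 + "\n"
)
kept = list(drop_adapter_reads(read_fastq(example), ADAPTER))
for rec in kept:
    print("\n".join(rec))
```

To run it on a real file, replace the `io.StringIO` object with an open file handle, e.g. `open("reads.fastq")` (a hypothetical filename). For paired-end data the two files would have to be filtered together so that mates stay in sync, which this sketch does not attempt.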
written 6.7 years ago by bioinfo