Question: Best trimming tool
int11ap1 wrote, 3.1 years ago:

What tool do you use for removing adapters and low-quality portions of reads and why?

Ian (University of Manchester, UK) wrote, 3.1 years ago:

Our facility uses Trimmomatic, as it performs adapter trimming and various other read filtering/trimming functions. It is pair-aware, keeping filtered read pairs together whilst removing singletons. It is simple to use and reasonably fast.
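To illustrate what "pair-aware" means here, this is a minimal Python sketch, not Trimmomatic's actual code. The `trim_tail` helper, the `min_len` threshold and the tuple layout of a read are all hypothetical and only serve the example. The idea is that each mate is trimmed on its own, and a pair is kept as a pair only if both mates survive; a lone survivor is set aside as a singleton.

```python
from typing import Iterable, List, Tuple

# A read is modelled as (name, sequence, quality string); this layout is an assumption.
Read = Tuple[str, str, str]


def trim_tail(read: Read, min_q: int = 20, offset: int = 33) -> Read:
    """Hypothetical trimmer: strip 3' bases whose Phred score is below min_q."""
    name, seq, qual = read
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return name, seq[:end], qual[:end]


def pair_aware_filter(
    pairs: Iterable[Tuple[Read, Read]], min_len: int = 5
) -> Tuple[List[Tuple[Read, Read]], List[Read]]:
    """Trim both mates, keep intact pairs together, collect singletons separately."""
    paired: List[Tuple[Read, Read]] = []
    singles: List[Read] = []
    for r1, r2 in pairs:
        t1, t2 = trim_tail(r1), trim_tail(r2)
        ok1 = len(t1[1]) >= min_len
        ok2 = len(t2[1]) >= min_len
        if ok1 and ok2:
            paired.append((t1, t2))   # both mates survived: still a pair
        elif ok1:
            singles.append(t1)        # only mate 1 survived
        elif ok2:
            singles.append(t2)        # only mate 2 survived
    return paired, singles
```

Keeping the two outputs separate is what lets a downstream paired-end aligner trust that record N in one file still matches record N in the other.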

genomax (United States) wrote, 3.1 years ago:

This thread won't be complete without bbduk and seal (which are part of the BBMap suite). They are written in pure Java, so they will work on PC/Mac/*nix.

Juke-34 wrote, 3.1 years ago:

Best? I don't know, but there are a lot of tools that are fairly comprehensive and efficient.

See this answer

It would be interesting to have a comparison of the different tools in order to assess their main differences.

Table 5.1 here could be a good start.

dariober (WCIP | Glasgow | UK) wrote, 3.1 years ago:

Personally I'm happy with cutadapt, although I haven't explored other options apart from trim_galore, which is a wrapper around cutadapt. Things I like about cutadapt:

  • Fast enough, easy to use, flexible in how/what you want to trim and what to get back
  • Great documentation, well maintained.
  • Writes to stdout, so you can stream into bwa (or another aligner) without writing massive files to disk
  • Recent releases read and write interleaved paired-end reads, which can also be streamed to bwa
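To make the interleaved format from the last bullet concrete, here is a small Python sketch (not cutadapt's code) that merges two FASTQ streams so that each R1 record is immediately followed by its R2 mate. The function names are made up for the example, and it assumes plain four-line FASTQ records.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines (assumes 4-line records)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield [line.rstrip("\n") for line in record]


def interleave(r1_lines: Iterable[str], r2_lines: Iterable[str]) -> Iterator[str]:
    """Yield lines of an interleaved FASTQ: R1 record, then its R2 mate, and so on.

    Note: zip() stops at the shorter input, so unequal inputs are silently cut.
    """
    for rec1, rec2 in zip(fastq_records(r1_lines), fastq_records(r2_lines)):
        yield from rec1
        yield from rec2
```

Because it works on iterators, the output can be printed line by line to stdout and piped onward without ever holding the whole file in memory, which is the point of streaming.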

About quality trimming, these days quality is very high up to 150+ bases so I usually skip it altogether.

I superficially compared cutadapt with the trimmer that comes with the pipeline in Illumina/Basespace and the results were very similar. I think Basespace's trimmer was a little more aggressive, but the results were essentially the same.

Gabriel R. (Center for Geogenetik, Københavns Universitet) wrote, 3.1 years ago:

Our group at the Max Planck wrote a paper comparing the accuracy of various trimming algorithms. Our Bayesian trimmer leeHom outperformed other algorithms in terms of accuracy and compared very favorably in terms of speed.

It merges overlapping portions of reads and detects chimeric reads. You do not need cutoffs for % of matches etc., and it accepts both fastq and BAM.

For low quality, I would not recommend cutting reads, as they will be harder to map, but you can remove low-quality reads instead; I coded something for that a while back.

It uses BAM files and can filter on the expected number of mismatches and on sequence entropy.
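For readers wondering what those two filters could look like, here is a rough Python sketch. It is not the code of the tool mentioned above: the function names and thresholds are invented for illustration, entropy is assumed to be the Shannon entropy of the base composition, and it works on plain sequence and quality strings rather than BAM records. The expected number of mismatches is taken as the sum of the per-base error probabilities implied by the Phred scores, p = 10^(-Q/10).

```python
import math
from collections import Counter


def expected_mismatches(qual: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities, p = 10^(-Q/10), for a Phred+33 string."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def sequence_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the base composition; 0 for a homopolymer."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq.upper())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def passes(seq: str, qual: str,
           max_exp_mismatches: float = 1.0,   # hypothetical threshold
           min_entropy: float = 1.0) -> bool:  # hypothetical threshold
    """Keep a read only if it is both reliable enough and complex enough."""
    return (expected_mismatches(qual) <= max_exp_mismatches
            and sequence_entropy(seq) >= min_entropy)
```

The read is kept or dropped whole, so the surviving reads keep their full length for mapping.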

Hope this helps!

Antonio R. Franco (Spain. Universidad de Córdoba) wrote, 3.1 years ago:

You also need to take into account whether the trimmer can run in parallel (using several cores) and how fast it actually works.

For example, on a 20 Gb genomic fastq file (only one of the paired ends), a run with fastx-toolkit can take almost a day or even more, whereas seqtk requires a dozen minutes or so.


It's also important to note what these tools can do, as per the original question. BBDuk does adapter-removal and quality-trimming in a single pass, faster than anything else; seqtk cannot perform adapter-removal.

Accuracy is also worth mentioning... since, in my opinion, it is generally more important than speed. BBDuk and seqtk have equal accuracy for quality-trimming, as they use the same algorithm. Anyway, that's a solved problem - the algorithm is optimal and cannot be improved. Everything else I've tested (which is everything commonly-used - trimmomatic, fastx, etc) is dramatically inferior, since it uses a non-optimal algorithm. Of course, quality-trimming is a dark art and often it's not even a good idea, but if you DO do it, you should do it correctly. Adapter-trimming, on the other hand, is ALWAYS a good idea (if done correctly).
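As an illustration of the kind of algorithm this refers to, here is a minimal Python sketch of maximum-scoring-segment quality trimming, often called the Phred or modified Mott algorithm. That this is the algorithm meant above is my assumption, and the code is not taken from BBDuk or seqtk; `phred_trim` and its `trimq` parameter are names chosen for the example. Each base scores (cutoff error probability minus its own error probability), and the read is trimmed to the contiguous stretch with the highest total score.

```python
from typing import Tuple


def phred_trim(qual: str, trimq: int = 10, offset: int = 33) -> Tuple[int, int]:
    """Return (start, end) of the best-scoring segment, end exclusive.

    A base with quality Q has error probability 10^(-Q/10). Its score is
    cutoff - p_err, so bases better than trimq score positive and worse
    bases score negative. (0, 0) means nothing is worth keeping.
    """
    cutoff = 10 ** (-trimq / 10)
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, c in enumerate(qual):
        p_err = 10 ** (-(ord(c) - offset) / 10)
        run_sum += cutoff - p_err
        if run_sum <= 0:
            # The current run cannot help any later segment: restart after it.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


# Usage: slice both the sequence and the quality string with the result.
# start, end = phred_trim(qual); seq, qual = seq[start:end], qual[start:end]
```

Unlike a simple "cut at the first base below Q" rule, a single poor base inside an otherwise good region does not necessarily end the kept segment, because the segment is chosen by its total score.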

BBDuk is the best adapter-trimming software available, by a huge margin. Let's put speed aside - it's the fastest, but that is never a good enough reason to use software unless more-accurate software is fundamentally too slow to use.

I have verified through extensive synthetic testing that BBDuk has a greater true-positive and lower false-positive rate of adapter-removal than anything else. In fact, those results were presented at AGBT a couple weeks ago. Maybe someone here attended?

In summary - I do not know of any useful read-trimming or read-filtering operation in which BBDuk is not the best tool in both accuracy and speed, aside from seqtk, which can exceed BBDuk's speed on single-ended reads.

Reply written 3.1 years ago by Brian Bushnell