Our facility uses Trimmomatic, as it performs adapter trimming and various other read filtering/trimming functions. It is pair-aware, maintaining paired filtered reads while removing singletons. It is simple to use and reasonably fast.
The best? I don't know, but there are a lot of fairly comprehensive and efficient tools.
See this answer
It would be interesting to have a comparison of the different tools in order to assess their main differences.
Table 5.1 here could be a good start.
Personally I'm happy with cutadapt, although I haven't explored other options apart from trim_galore, which is a wrapper around cutadapt. Things I like about cutadapt:
- Fast enough, easy to use, and flexible in how/what you want to trim and what you get back
- Great documentation, well maintained.
- Writes to stdout, so you can stream through bwa (or another aligner) without writing massive files to disk
- Recent releases read and write interleaved paired-end reads, which can also be streamed to bwa
About quality trimming: these days quality is very high even out to 150+ bases, so I usually skip it altogether.
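For readers who do want quality trimming, here is a minimal sketch of the BWA-style 3' quality trim that, as far as I know, cutadapt's `-q` option also implements. This assumes Phred+33 quality encoding; the function names are mine, not from any of the tools above.

```python
def quality_trim_index(qualities, threshold):
    """Return the index at which to cut the 3' end (BWA-style).

    Walk in from the 3' end accumulating (threshold - q); the cut
    point is where that running sum is maximal, i.e. where the
    low-quality tail begins. Stop once the sum goes negative,
    since we have reached solidly high-quality sequence.
    """
    running = 0
    best_sum = 0
    cut = len(qualities)
    for i in reversed(range(len(qualities))):
        running += threshold - qualities[i]
        if running < 0:
            break
        if running > best_sum:
            best_sum = running
            cut = i
    return cut


# Decode a Phred+33 quality string ('I' = Q40, '#' = Q2) and trim.
read = "ACGTACG"
quals = [ord(c) - 33 for c in "IIII###"]
cut = quality_trim_index(quals, threshold=20)
trimmed = read[:cut]  # drops the three Q2 bases at the 3' end
```

With a threshold of 20, the three Q2 bases at the end are removed while the Q40 prefix is kept.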
I superficially compared cutadapt with the trimmer that comes with the pipeline in Illumina/Basespace, and the results were very similar. I think Basespace's trimmer was a little more aggressive, but the results were essentially the same.
Our group at the Max Planck wrote a paper comparing the accuracy of various trimming algorithms. Our Bayesian trimmer, leeHom (http://grenaud.github.io/leeHom/), outperformed other algorithms in terms of accuracy and compared very favorably in terms of speed.
It merges overlapping portions of read pairs and detects chimeric reads. You do not need cutoffs for the percentage of matches, and it accepts both FASTQ and BAM input.
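To illustrate what merging overlapping read pairs involves, here is a naive greedy sketch: reverse-complement the second read, try overlap lengths from longest to shortest, and accept the first one within a mismatch tolerance. This is only a toy illustration; leeHom's actual Bayesian model is quite different, and `merge_pair` and its thresholds are my own invention.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))


def merge_pair(r1, r2, min_overlap=10, max_mismatch_frac=0.1):
    """Try to merge a read pair whose insert is shorter than two read lengths.

    r2 is reverse-complemented, then candidate overlaps are scored from
    longest to shortest; the first overlap within the mismatch tolerance
    wins. Returns the merged sequence, or None if nothing acceptable.
    """
    r2rc = revcomp(r2)
    max_ov = min(len(r1), len(r2rc))
    for ov in range(max_ov, min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-ov:], r2rc[:ov]))
        if mismatches <= max_mismatch_frac * ov:
            return r1 + r2rc[ov:]
    return None
```

A fixed mismatch fraction like this is exactly the kind of ad hoc cutoff that leeHom's probabilistic approach avoids.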
As for low quality, I would not recommend cutting reads, as shorter reads will be harder to map; instead, remove low-quality reads entirely. I coded something for this a while back.
It uses BAM files and can filter on the expected number of mismatches and on sequence entropy.
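Both filters are easy to sketch. Assuming Phred+33 qualities, the expected number of mismatches is just the sum of per-base error probabilities (Phred Q maps to error probability 10^(-Q/10)), and complexity can be scored as the Shannon entropy of the base composition. This is an illustrative sketch, not the code of the tool mentioned above; the function names are mine.

```python
import math


def expected_mismatches(qual_string, offset=33):
    """Expected number of sequencing errors in a read:
    the sum of per-base error probabilities 10^(-Q/10)
    decoded from a Phred+33 quality string."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual_string)


def sequence_entropy(seq):
    """Shannon entropy (in bits) of the base composition.
    Low values flag low-complexity reads such as poly-A runs."""
    counts = {}
    for base in seq:
        counts[base] = counts.get(base, 0) + 1
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# A read of four Q40 bases has only 4 * 1e-4 expected errors;
# a homopolymer has entropy 0, a balanced ACGT read has 2 bits.
```

Reads would then be discarded when `expected_mismatches` exceeds a chosen cutoff or `sequence_entropy` falls below one.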
Hope this helps!
You also need to take into account whether the trimmer can run in parallel (using several cores), and how fast it actually works.
For example, on a 20 GB genomic FASTQ file (only one of the paired ends), a run with fastx-toolkit can take almost a day or even more to complete, whereas seqtk requires a dozen minutes or so.