I am looking for programs that allow one to pre-process and filter large fastq files for various quality measures.
I know of the fastx toolkit, but it seems a little long in the tooth (released in 2009), and the documentation of what it actually does is lacking. Plus, only one or two of its tools would be useful to me; the rest seem to be plotting helpers.
There are publications out there, such as this very recent one, NGS QC Toolkit: A Toolkit for Quality Control of Next Generation Sequencing Data in PLoS One 2012, but after reading it I am left scratching my head. It is a pure Perl QC tool developed to run on Windows, which means it has no internal core written in C that could make it fast. Makes me wonder how this even got accepted.
I need recommendations for tools that have been tried in practice and proven fast and reliable. Ideally I would like to hear about the tool you use. Besides filtering by average quality and clipping/trimming back reads, I would like to be able to detect various artifacts the data might have, for example duplication, preferential enrichment of subsequences, polyadenylation, etc.
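For context, by "filtering by average quality" I mean something like the following minimal pure-Python sketch (illustrative only, assuming Phred+33 encoding; the function names are my own) — this is exactly the kind of baseline I'd expect a serious tool to implement in a fast compiled core:

```python
def mean_quality(qual):
    """Mean Phred score of one quality string (assumes Phred+33 encoding)."""
    return sum(ord(c) - 33 for c in qual) / len(qual)

def filter_fastq(lines, min_mean_q=20):
    """Yield 4-line FASTQ records whose mean base quality is >= min_mean_q.

    `lines` is an iterable of the raw FASTQ lines (header, sequence,
    plus line, quality), already stripped of trailing newlines.
    """
    it = iter(lines)
    for header in it:
        seq, plus, qual = next(it), next(it), next(it)
        if mean_quality(qual) >= min_mean_q:
            yield header, seq, plus, qual
```

Doing this line-by-line in an interpreted language over hundreds of millions of reads is precisely where a C core would pay off.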
Thanks for any input!