Question: Pick highest quality fastq reads
Asked 3.9 years ago by Biomonika (Noolean), State College, PA, USA:

I am familiar with many trimmers and quality-control tools, but I have not come across software that would let me pick only the n million highest-quality reads. I am not sure how to define "high quality" here; probably mainly as having the highest Phred scores along the whole length of the read. One solution would be to discard all the low-quality reads and then randomly pick as many reads as I need from the rest of the data.

However, it would be more convenient to have a tool that sorts reads by quality and lets me pick the top n. I would use it for a genomic DNA dataset where I currently have much higher coverage than I actually need and where many reads are very poor according to the FastQC report (failing per base sequence quality, per tile sequence quality and k-mer content).
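To make the "sort by quality and pick top n" idea concrete, here is a minimal Python sketch. It is only one possible definition of "highest quality": it ranks reads by mean Phred score and assumes Phred+33 encoding and plain four-line FASTQ records. The function names (`read_fastq`, `mean_phred`, `top_n_reads`) are made up for illustration and do not come from any existing tool. It also treats reads individually, so paired-end files would need extra handling to keep mates together.

```python
import heapq
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header, seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (assumes Phred+33 by default)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def top_n_reads(handle, n):
    """Return the n reads with the highest mean Phred score, best first.

    heapq.nlargest keeps only n records in memory at a time, so the whole
    file never has to be sorted or loaded at once.
    """
    return heapq.nlargest(n, read_fastq(handle), key=lambda rec: mean_phred(rec[2]))
```

Usage would look like `with open("reads.fastq") as fh: best = top_n_reads(fh, 5_000_000)`, where the file name is a placeholder. Mean Phred is a crude score; a minimum-quality or expected-error criterion could be swapped into the `key` function instead.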

quality fastq filter • 1.2k views

If there are really poor-quality reads in the set, then it may be best to use the solution you noted above (filter, then randomly sample). reformat.sh from BBMap can sample reads easily.
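For illustration, the filter-then-sample approach can be sketched in a few lines of Python. This is not how reformat.sh works internally; it is a generic reservoir-sampling sketch under stated assumptions: records arrive as `(header, sequence, quality)` tuples from whatever parser you use, quality strings are Phred+33, and the filter is a mean-quality cutoff. The names `mean_phred` and `filter_then_sample` and the default cutoff of 20 are hypothetical.

```python
import random


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (assumes Phred+33 by default)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_then_sample(records, n, min_mean_q=20, seed=None):
    """Drop reads below min_mean_q, then keep a uniform random sample of n.

    Uses reservoir sampling (Algorithm R), so the input is streamed once and
    only n records are held in memory. If fewer than n reads pass the
    filter, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir = []
    kept = 0  # number of reads that passed the filter so far
    for rec in records:
        if mean_phred(rec[2]) < min_mean_q:
            continue
        if kept < n:
            reservoir.append(rec)
        else:
            # Replace a random slot with probability n / (kept + 1).
            j = rng.randint(0, kept)
            if j < n:
                reservoir[j] = rec
        kept += 1
    return reservoir
```

Passing a fixed `seed` makes the sample reproducible, which helps if the same subset has to be regenerated later.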

Reply written 3.9 years ago by genomax

So far I have used fastq_quality_filter from fastx_toolkit with the parameters -q 20 -p 100 (keep only reads in which 100% of bases have a quality score of at least 20), in case anyone is interested.
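For anyone who wants to see what that criterion amounts to, here is a rough Python equivalent of the per-read test. It is a sketch of the rule as described by the -q and -p options, not the tool's actual code: it assumes Phred+33 quality strings and inclusive thresholds, and the function name `passes_quality_filter` is hypothetical. Rejecting empty reads is also my own choice.

```python
def passes_quality_filter(qual, min_quality=20, min_percent=100, offset=33):
    """Return True if at least min_percent of bases have quality >= min_quality.

    With min_quality=20 and min_percent=100, a single base below Q20 is
    enough to reject the read.
    """
    if not qual:
        return False  # assumption: empty reads are rejected
    good = sum(1 for c in qual if ord(c) - offset >= min_quality)
    # Integer comparison avoids floating-point rounding at the threshold.
    return good * 100 >= min_percent * len(qual)
```

Note that this is a strict filter: lowering `min_percent` (for example to 90) tolerates a few weak bases per read, which may retain noticeably more data.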

Reply written 3.8 years ago by Biomonika (Noolean)