I am analyzing miRNA-Seq data for differential expression analysis of miRNAs. As the first step, I am filtering the raw reads by quality with the FASTX-Toolkit (version 0.0.14, http://hannonlab.cshl.edu/fastx_toolkit/index.html), discarding poor-quality reads with the following settings:
- the minimum quality score for each base = 20;
- the percent of bases that must have the minimum quality score ≥ 95%.
I used the following command to perform the quality filtering:
fastq_quality_filter -i input -Q 33 -o output.fastq -v -q 20 -p 95
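To make the two settings concrete, here is a minimal Python sketch of the criterion as I understand it from the tool's help text: a read is kept when at least `-p` percent of its bases have a Phred score of at least `-q`. The function name `passes_quality_filter` and the example quality string are made up for illustration, and this is not the FASTX-Toolkit source code, so its exact rounding may differ.

```python
def passes_quality_filter(quality_string, min_quality=20, min_percent=95, phred_offset=33):
    """Return True if at least min_percent of the bases score >= min_quality.

    Hypothetical helper mirroring the meaning of -q, -p and -Q; not the tool's code.
    """
    if not quality_string:
        return False
    # Decode ASCII quality characters into Phred scores (offset 33 corresponds to -Q 33).
    scores = [ord(ch) - phred_offset for ch in quality_string]
    good = sum(1 for s in scores if s >= min_quality)
    # Integer comparison avoids floating-point issues near the threshold.
    return good * 100 >= min_percent * len(scores)


# Made-up 22-nt read: 20 bases at Q40 ('I') and 2 bases at Q2 ('#').
read_qual = "I" * 20 + "#" * 2
print(passes_quality_filter(read_qual))  # 20/22 = 90.9 % of bases pass, so this prints False
```

Under this reading, a 22-nt read is dropped as soon as two of its bases fall below Q20, while a single such base (21/22 = 95.5 %) is still tolerated.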
The raw human miRNA sequencing data were downloaded from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60292, and the reads were already free of adapter sequences.
SampleId    TotalReads  TrimmedReads  %OfGoodQualityReadsWithinTotalReads
SRR1542714  1866654     962422        51.56 %
SRR1542715  1842228     955859        51.89 %
SRR1542716  2777542     1976509       71.16 %
SRR1542717  1324705     318259        24.02 %
SRR1542718  3085962     1830745       59.32 %
SRR1542719  1937831     619794        31.98 %
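For reference, the last column is simply the number of reads kept divided by the total number of reads. A short Python check of that arithmetic, using only the counts listed above (the helper name `percent_kept` is made up for illustration):

```python
def percent_kept(total_reads, kept_reads):
    """Percentage of reads that survived quality filtering."""
    return 100.0 * kept_reads / total_reads


# (TotalReads, TrimmedReads) per sample, copied from the results above.
counts = {
    "SRR1542714": (1866654, 962422),
    "SRR1542715": (1842228, 955859),
    "SRR1542716": (2777542, 1976509),
    "SRR1542717": (1324705, 318259),
    "SRR1542718": (3085962, 1830745),
    "SRR1542719": (1937831, 619794),
}

for sample, (total, kept) in counts.items():
    print(f"{sample}\t{percent_kept(total, kept):.2f} %")
```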
Usually, all of these samples should yield >95% good-quality reads after quality filtering. The variation between samples is huge, so it seems that I am doing something wrong.
So my question is: is there any problem with running fastq_quality_filter with these parameter settings? If not, what could be the reason that I am not able to reproduce the result?
I would really appreciate it if somebody could guide me.