We have sequenced the transcriptome of one species on an Illumina GA II; the reads are 50 bp on average. The reads are trimmed by base quality with a sliding-window method: for example, with a 4-base window, if the average quality value of the window is lower than 20, that stretch of sequence is trimmed out of the original read.
This method generates several short trimmed reads, ranging from 1 bp to 50 bp.
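To make the trimming step concrete, here is a minimal Python sketch of one common variant of sliding-window trimming. It assumes the read is cut at the start of the first window whose mean quality drops below the threshold, and that qualities are already decoded to integer Phred scores. The function name `sliding_window_trim` and its parameters are illustrative only, not the actual tool used here.

```python
def sliding_window_trim(seq, quals, window=4, min_mean_q=20):
    """Cut a read at the first window whose mean quality is below min_mean_q.

    seq   -- read sequence as a string
    quals -- list of integer Phred scores, one per base (assumed pre-decoded)
    Reads shorter than the window are returned unchanged.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")

    # Slide the window one base at a time from the 5' end.
    for start in range(len(seq) - window + 1):
        mean_q = sum(quals[start:start + window]) / window
        if mean_q < min_mean_q:
            # Keep only the bases before the failing window.
            return seq[:start], quals[:start]

    # No window failed: the read is kept whole.
    return seq, quals


if __name__ == "__main__":
    read = "ACGTACGTAC"
    phred = [30, 30, 30, 30, 30, 30, 10, 10, 10, 10]
    trimmed_seq, trimmed_quals = sliding_window_trim(read, phred)
    print(trimmed_seq, len(trimmed_seq))
```

Because the cut position depends only on where quality first drops, a read with an early low-quality window can end up just a few bases long, which is how 1 bp fragments arise.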
My question is: should I filter out reads that are too short, for example by excluding all reads shorter than 20 bp from further analysis (mapping to the reference genome)?
If so, which length cutoff should be used?
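As a rough illustration of the filtering being asked about, the sketch below tabulates trimmed read lengths and shows how many reads would survive a given minimum length. It assumes reads are held in memory as `(name, sequence, quality)` tuples; the helper names are hypothetical, and the cutoff of 20 bp is only the example value from the question, not a recommendation.

```python
from collections import Counter


def length_histogram(reads):
    """Count how many reads have each length."""
    return Counter(len(seq) for _name, seq, _qual in reads)


def filter_by_length(reads, min_len=20):
    """Keep only reads whose sequence is at least min_len bases long."""
    return [read for read in reads if len(read[1]) >= min_len]


def fraction_kept(reads, min_len):
    """Fraction of reads that would survive a given length cutoff."""
    if not reads:
        return 0.0
    return len(filter_by_length(reads, min_len)) / len(reads)


if __name__ == "__main__":
    # Toy data standing in for trimmed reads.
    reads = [
        ("r1", "ACGT", "IIII"),
        ("r2", "A" * 25, "I" * 25),
        ("r3", "C" * 50, "I" * 50),
        ("r4", "G" * 12, "I" * 12),
    ]
    print(sorted(length_histogram(reads).items()))
    for cutoff in (10, 20, 30):
        print(cutoff, fraction_kept(reads, cutoff))
```

Comparing the fraction kept at several candidate cutoffs against the length distribution is one way to see how much data each choice would discard before mapping.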
Yes, I used FastQC to view the data, but after trimming all reads seem to be good (low-quality bases have been trimmed out, leaving only high-quality bases). The problem is that we should filter out some reads that are too short, but I have no idea how to set this length cutoff.