We have sequenced the transcriptome of one species on an Illumina GA II; reads are 50 bp on average. The reads are trimmed by base quality using a sliding-window method: e.g., with a 4-base window, if the average quality value in the window is lower than 20, that stretch of sequence is trimmed out of the original read.
This method generates trimmed reads of widely varying lengths, from 1 bp to 50 bp.
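
To make the trimming step concrete, here is a minimal Python sketch of one common sliding-window variant. It is an assumption about how the trimming works, not the exact tool we used: it scans from the 5' end and cuts the read at the first window whose mean quality falls below the threshold. The function name `sliding_window_trim` and its parameters are hypothetical, and qualities are assumed to be already decoded to Phred integers.

```python
def sliding_window_trim(seq, quals, window=4, min_mean_q=20):
    """Cut a read at the first window whose mean quality is below min_mean_q.

    seq   -- read sequence as a string
    quals -- list of Phred quality scores (ints), one per base
    Assumption: everything from the start of the failing window onward is dropped.
    Reads shorter than the window are returned unchanged.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    for start in range(len(seq) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_mean_q:
            # Keep only the bases before the first low-quality window.
            return seq[:start], quals[:start]
    return seq, quals


# Example: the last four bases have low quality.
trimmed_seq, trimmed_quals = sliding_window_trim(
    "ACGTACGTAC", [30, 30, 30, 30, 30, 30, 10, 10, 10, 10]
)
print(trimmed_seq, len(trimmed_seq))
```

With this variant, a read whose quality drops early is cut down to only a few bases, which is how very short trimmed reads arise.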
My question is: should I filter out reads that are too short, for example by excluding all reads shorter than 20 bp from further analysis (mapping to the reference genome)?
If so, which length cutoff should be used?
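
For illustration, the kind of length filter I have in mind could be sketched in Python as below. The helper names `filter_by_length` and `fraction_kept` are hypothetical, and the 20 bp cutoff is only the example value from my question, not a recommendation. The second helper reports what fraction of reads would survive each candidate cutoff, which may help when comparing cutoffs.

```python
def filter_by_length(reads, min_len=20):
    """Return only the reads that are at least min_len bases long."""
    return [r for r in reads if len(r) >= min_len]


def fraction_kept(reads, cutoffs):
    """Map each candidate cutoff to the fraction of reads it would keep."""
    total = len(reads)
    if total == 0:
        return {c: 0.0 for c in cutoffs}
    return {c: sum(len(r) >= c for r in reads) / total for c in cutoffs}


# Toy reads of length 1, 15, 20, 36 and 50 bp (made-up data for the example).
reads = ["A" * n for n in (1, 15, 20, 36, 50)]
kept = filter_by_length(reads, min_len=20)
print([len(r) for r in kept])
print(fraction_kept(reads, cutoffs=(15, 20, 25)))
```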