I usually filter my FASTQ reads first, discarding those with an average Phred score below, say, 20. But I recently realized that long reads (100 bp) can suffer from decreasing quality towards the end of the read, even when the average passes the threshold. For example:
@HWI-ST150_0130:3:64:18989:54871#0/2
AGACTCCCGGGTAGCAAGTACCTGGGACCACAGGTTTGTGCGACCATGCCTGACTAATTTTTGTATTTTTAGTAGTGATGGGGTTTCACTAGGTTGGCGAG
+HWI-ST150_0130:3:64:18989:54871#0/2
ed\`dffffffbff^eeeabffffedafffcddbbe`\cea``cYddadbcdcYbRR][Yccc[_dddddP[ZMaBBBBBBBBBBBBBBBBBBBBBBBBBB
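To make the problem concrete, here is a minimal sketch of the kind of average-quality filter described above. It assumes Phred+64 encoding (Illumina 1.5+); the function names and the default threshold of 20 are only illustrative and not taken from any particular package.

```python
PHRED_OFFSET = 64  # assumption: Illumina 1.5+ stores quality as chr(Q + 64)


def mean_phred(quality, offset=PHRED_OFFSET):
    """Return the mean Phred score of a FASTQ quality string."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_mean_filter(quality, threshold=20.0, offset=PHRED_OFFSET):
    """True if the read's mean Phred score is at least `threshold`."""
    return mean_phred(quality, offset) >= threshold
```

A read whose first half is high quality and whose second half is all "B" can still pass: `passes_mean_filter("hhBB")` is true, because the good bases pull the mean above the threshold.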
(The format here is Illumina 1.5+.) Is there a package that trims off these low-quality ends? Also, after trimming (say, removing all bases encoded with "B", which represents the lowest quality in Illumina 1.5+), the reads will have variable lengths. Is this OK? How should I determine the edit distance parameter, which depends on the read length? (Should I calculate the average read length?)
Also, does BWA have an option for this?
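For reference, the trimming I have in mind could be sketched as follows. This is only an illustration of the idea, assuming that a trailing run of "B" marks the low-quality tail in Illumina 1.5+ data; `trim_trailing_b` is a hypothetical helper, not a function from an existing tool.

```python
def trim_trailing_b(seq, qual, marker="B"):
    """Remove the trailing run of `marker` quality characters from a read.

    Only the 3' tail is trimmed; a `marker` in the middle of the read is kept.
    Returns the trimmed (sequence, quality) pair, which may be empty.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    keep = len(qual.rstrip(marker))  # number of bases before the B tail
    return seq[:keep], qual[:keep]


# Example: the last three bases carry the "B" marker and are removed.
trimmed_seq, trimmed_qual = trim_trailing_b("ACGTACGT", "hhhhhBBB")
```

After this step the reads have different lengths, and a read that is entirely "B" comes back empty, so some minimum-length cutoff would probably be needed as well.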