I have MiSeq PE250 reads for a viral vector sample. I trimmed the raw reads with Trim Galore (length >= 200 and Q >= 30) and aligned them with both BWA MEM and Bowtie2 --local. Both alignment files contain an AAA -> TTT variant, but at very different frequencies: 71% with Bowtie2 --local and much lower with BWA MEM. I then examined the reads carrying the variant and found that the TTT bases in those reads had base call quality scores below 30.
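A toy sketch of the effect described above, using made-up read counts. If most TTT observations sit on low-quality bases, counting every read gives a much higher apparent frequency than counting only reads whose codon bases are all >= Q30. The `(allele, min_quality)` representation and the `allele_frequency` helper are hypothetical; this is not how FreeBayes or either aligner computes frequencies.

```python
from typing import Iterable, List, Tuple

# One observation per read covering the codon: (allele shown, lowest Phred
# quality among the three codon bases). Hypothetical representation.
Observation = Tuple[str, int]


def allele_frequency(obs: Iterable[Observation], alt: str, min_bq: int = 0) -> float:
    """Fraction of reads showing `alt`, counting only reads whose codon bases
    all have quality >= min_bq. Returns 0.0 when no read passes."""
    kept: List[str] = [allele for allele, q in obs if q >= min_bq]
    if not kept:
        return 0.0
    return sum(1 for allele in kept if allele == alt) / len(kept)


# Made-up pileup: 7 TTT reads with low-quality codon bases,
# 9 AAA reads and 1 TTT read with high-quality codon bases.
reads: List[Observation] = [("TTT", 12)] * 7 + [("AAA", 36)] * 9 + [("TTT", 35)]

print(round(allele_frequency(reads, "TTT"), 3))             # → 0.471
print(round(allele_frequency(reads, "TTT", min_bq=30), 3))  # → 0.1
```

With all 17 reads counted, 8 show TTT; with the Q30 cut-off only 10 reads remain and 1 shows TTT.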
To check how trimming affects the variant call and coverage, I filtered the Trim Galore output further, removing every read that had even a single base call below Q30. I re-paired the surviving reads with BBMap repair.sh and aligned them again with Bowtie2. In this alignment the variant was found at <10% frequency, consistent with the frequency reported with BWA MEM. Coverage suffered, though: almost 6.7% of bases had coverage below 10X with the super-trimmed reads, versus 0.1% with the reads trimmed only by Trim Galore.
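A minimal sketch of the extra filtering step, assuming uncompressed four-line FASTQ, Phred+33 qualities, and R1/R2 files whose records are in the same order. Filtering both mates together keeps the two files in sync, so it is one way to avoid a separate re-pairing step; note that it discards the whole pair whenever either mate fails. The function names are hypothetical.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO

PHRED_OFFSET = 33  # Phred+33 (Illumina 1.8+), as MiSeq writes it
MIN_Q = 30         # every base in both mates must reach this score


def fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield four-line FASTQ records as lists of lines without newlines."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def min_quality(qual: str) -> int:
    """Lowest Phred score in a quality string (0 for an empty string)."""
    return min((ord(c) for c in qual), default=PHRED_OFFSET) - PHRED_OFFSET


def filter_pairs(r1_in: TextIO, r2_in: TextIO, r1_out: TextIO, r2_out: TextIO) -> int:
    """Write only pairs in which every base of both mates is >= MIN_Q.

    Assumes R1 and R2 records are in the same order (zip stops at the
    shorter file). Returns the number of pairs kept."""
    kept = 0
    for rec1, rec2 in zip(fastq_records(r1_in), fastq_records(r2_in)):
        if min_quality(rec1[3]) >= MIN_Q and min_quality(rec2[3]) >= MIN_Q:
            r1_out.write("\n".join(rec1) + "\n")
            r2_out.write("\n".join(rec2) + "\n")
            kept += 1
    return kept


# Tiny in-memory demo: pair p2 is dropped because its R1 mate has a '#' (Q2) base.
r1 = io.StringIO("@p1/1\nACGT\n+\nIIII\n@p2/1\nACGT\n+\nII#I\n")
r2 = io.StringIO("@p1/2\nTTGA\n+\nIIII\n@p2/2\nTTGA\n+\nIIII\n")
out1, out2 = io.StringIO(), io.StringIO()
n_kept = filter_pairs(r1, r2, out1, out2)
print(n_kept)  # → 1
```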
I called variants with FreeBayes using the same parameters (Min Coverage 10, Min Alternate Fraction 0.01, Min Alternate Count 4) for both the BWA MEM and the Bowtie2 alignments.
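The three settings correspond to FreeBayes's `--min-coverage`, `--min-alternate-count` (`-C`) and `--min-alternate-fraction` (`-F`) options. The sketch below only mimics those site-level thresholds on simple read counts (the numbers are hypothetical); FreeBayes's actual model does much more. Under these assumptions, an alternate allele at roughly 8% or at 71% of 100 reads clears all three thresholds, so these settings alone would not be expected to produce the gap between the two alignments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """The three settings from the question (FreeBayes option names in comments)."""
    min_coverage: int = 10                 # --min-coverage
    min_alternate_count: int = 4           # -C / --min-alternate-count
    min_alternate_fraction: float = 0.01   # -F / --min-alternate-fraction


def passes(ref_count: int, alt_count: int, t: Thresholds = Thresholds()) -> bool:
    """Simplified site check: enough depth, enough alt reads, high enough alt fraction."""
    depth = ref_count + alt_count
    if depth == 0 or depth < t.min_coverage:
        return False
    return (alt_count >= t.min_alternate_count
            and alt_count / depth >= t.min_alternate_fraction)


# Hypothetical counts at 100x depth: ~8% and 71% alternate reads both pass.
print(passes(ref_count=92, alt_count=8))   # → True
print(passes(ref_count=29, alt_count=71))  # → True
print(passes(ref_count=6, alt_count=3))    # → False (depth 9 < 10)
```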
- Why is the variant frequency so different between Bowtie2 and BWA MEM?
- The coverage difference between the Trim Galore-trimmed reads and the super-trimmed reads is vast. Should I still use the super-trimmed reads (Trim Galore plus further filtering), given that they give the correct variant frequency?
The FastQC results for the Trim Galore-trimmed reads and the super-trimmed reads were as follows:
[FastQC report: Super Trimmed Reads]
For the super-trimmed reads, the FastQC per-base quality check fails, apparently spuriously, and the report indicates the wrong Illumina Phred score encoding version. Why did this happen?
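One possible mechanism, offered as an assumption rather than a diagnosis: FastQC is understood to infer the quality encoding from the lowest quality character in the file. The cut-offs in this sketch are a simplified assumption, not FastQC's source. If the filtered file happens to contain no base at exactly Q30, the lowest Phred+33 character can be '@' (Q31) or higher; a lowest-character rule would then guess Phred+64, and every score would be read 31 points too low, which would make the per-base quality plot fail.

```python
from typing import Iterable, List, Tuple


def guess_offset(quality_strings: Iterable[str]) -> Tuple[str, int]:
    """Guess the Phred offset from the lowest quality character seen.

    Simplified, assumed cut-offs: '!'..'?' (33-63) only occur in Phred+33,
    so anything below '@' (64) means Phred+33; otherwise guess Phred+64.
    """
    lowest = min((min(q) for q in quality_strings if q), default=None)
    if lowest is None:
        raise ValueError("no quality values")
    code = ord(lowest)
    if code < 33:
        raise ValueError(f"no known encoding uses characters below 33 (got {code})")
    if code < 64:
        return ("Sanger / Illumina 1.9 (Phred+33)", 33)
    return ("Illumina 1.3/1.5 (Phred+64)", 64)


def decode(qual: str, offset: int) -> List[int]:
    """Convert a quality string to Phred scores with the given offset."""
    return [ord(c) - offset for c in qual]


# Hypothetical Phred+33 quality strings (F = Q37, G = Q38, ? = Q30, A = Q32).
with_q30 = ["FFGG?F", "GGGGFF"]   # one base at exactly Q30
above_q30 = ["FFGGAF", "GGGGFF"]  # lowest base is Q32

for reads in (with_q30, above_q30):
    name, offset = guess_offset(reads)
    print(name, decode(reads[0], offset))
# with_q30  -> guessed Phred+33, scores [37, 37, 38, 38, 30, 37]
# above_q30 -> guessed Phred+64, scores [6, 6, 7, 7, 1, 6]
```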