Remove low quality base calls from reads
1
0
Entering edit mode
3.9 years ago
kspata ▴ 80

Hi All,

I was performing alignment of viral vector sample sequenced on MiSeq PE 250. I trimmed the raw reads using trim galore command as follows:

trim_galore -q 30 --length 50 -a AGATCGGAAGAGC --no_report_file --paired read1 read2 2> trim_galore.err


This should trim all bases with Phred Quality less than 30 and read length should be above 50 after trimming.

I aligned the files using Bowtie2 and performed variant analysis using Freebayes.

With freebayes I observed some spurious and some real variants (observed in previous project of the same kind).

516 CCCGGG>GGCGAC   1344X   27%


When I observe the alignment in IGV for spurious variants it shows some bases with quality less than 30. This is the alignment description from SAM file in which the false variant is observed.

M01610:114:000000000-C6GY3:1:1111:18615:7999    99  F8CDNAB-AAV 488 44  2S247M  =   594 358 CTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCCCGGGCTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCCTGCGGCCTAAGCTTGGAACCATTGCCACCTTCAGGGGGAGGCTGCTGGTGAATATTAACCAAGATCACCCCAGTTACCGGAGGAGCAAACAGGGACTAAGTTCACACGCGTGGTACCG   CD#&CFCCCCCCGGGGGGGGGGHHHG8(&!!!!!!!&8==!!!!!!!!!!!!!!!!!!$==8&!!!!!!$&(8GGHHHHHGHHGGGGGGGGGGGGGGAGGGGGGGAFGGGGGGGGFFFFFFFFFFFF<@FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFE:BFFFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFFF.EFFFFFEFFFFFFFFFFFFFFFCADCFEFFFD   AS:i:375    XS:i:194    XN:i:0  XM:i:17 XO:i:0  XG:i:0  NM:i:17 MD:Z:28C0C2G0G5C0C2G0G2T2G0G2A0C5G0T2C0C180 YS:i:500    YT:Z:CP ZQ:Z:@@b]@@@@@@@@@@@@@@@@@@@@@@O^afffffffaOKKgeggffffffffffffffbJKPbgggfffca_O@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


As you can see some of the bases at the beginning of the read have a quality score of ! which is interpreted as Phred quality 0

I can filter these variants using freebayes -q 30 but they will still appear in IGV as these bases are present in BAM files.

Why does trim galore not remove these low quality bases when I have specified a min phred quality 30 in my trim_galore command. How can I make these trimming more stringent? How can obtain reads with all bases having quality greater than 30?

I wish to avoid downstream processing of BAM files and VCF files to remove these low quality calls and spurious variants.

Thanks!!

Trimming Trim_Galore Alignment Variant Calls • 1.9k views
1
Entering edit mode
3.9 years ago
CD#&CFCCCCCCGGGGGGGGGGHHHG8(&!!!!!!!&8==!!!!!!!!!!!!!!!!!!
$==8&!!!!!!$&(8GGHHHHHGHHGGGGGGGGGGGGGGAGGGGGGGAFGGGGGGGGF
FFFFFFFFFFF<@FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBF
FFFFE:BFFFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFFF.EFFFFFEFFFFFFFF


Trim galore will trim for quality from the 3' end, and then do adapter trimming. Your ! are inside the read, not at the 3' end, so I'm assuming it does not remove them (see Adaptive quality and adapter trimming with Trim Galore!) And it makes sense too, what would happen if it would start removing bases within your read? How would you map that? That's why you want to keep the read in the BAM file and ideally have FreeBayes (or any variant caller) decide if the base quality where the identified variants are is high enough.

IGV just lets you visualize a BAM file, so I guess a solution might be to remove reads that have no high-quality basecalls, but I suspect that it would look weird in terms of coverage. My process is usually to run the variant calling pipeline (e.g. GATK, which has various base quality calibration steps and variant quality recalibration) and then only visualize interesting variants in IGV; the rest is noise. Sometimes even high quality GATK calls will look suspicious in IGV, for what it's worth.