I've got c. 140 VCF files generated by the Ion Torrent Variant Caller pipeline (sequenced with the Ion AmpliSeq Comprehensive Cancer Panel) that I want to process further. However, I'm not sure what filters to apply to the VCFs before merging them into one VCF per group.
As far as I can tell, there are established filtering thresholds for GATK and for samtools mpileup-called variants, but the INFO and FORMAT fields in their VCFs don't match up exactly with those in the VCFs produced by the Ion Torrent caller.
Is it OK to try to translate the GATK thresholds into their Ion Torrent equivalents (as I said, some of the GATK metrics have no exact counterpart in the Ion Torrent output), or is there a known set of hard thresholds somewhere that I could use as a starting point for my filtering?
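One way to "translate" a GATK threshold is to recompute the GATK metric from tags both callers share. GATK's QD is, roughly, QUAL normalised by read depth, and GATK's generic hard-filter recommendation for SNPs flags QD < 2.0. The sketch below assumes you use the Ion Torrent DP as the depth; GATK computes QD from the depth of samples carrying the variant, and Ion Torrent QUAL is not on the same scale as GATK QUAL, so treat 2.0 only as a starting point to re-tune on your own data, not as a validated Ion Torrent cut-off. The function names are hypothetical.

```python
# GATK's generic hard-filter recommendation for SNPs flags QD < 2.0.
GATK_SNP_MIN_QD = 2.0


def qd_like(qual: float, depth: int) -> float:
    """QUAL divided by depth: a rough, assumed analogue of GATK's QD.

    Only approximately comparable to GATK: the depth used here is the
    site DP, and Ion Torrent QUAL is scaled differently from GATK QUAL.
    """
    if depth <= 0:
        return 0.0
    return qual / depth


def fails_qd(qual: float, depth: int, min_qd: float = GATK_SNP_MIN_QD) -> bool:
    """True if the QD-like value falls below the (re-tunable) threshold."""
    return qd_like(qual, depth) < min_qd
```

For example, `qd_like(300.0, 100)` gives 3.0, so that call would pass, whereas QUAL 150 at the same depth (1.5) would be flagged.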
Hey Graeme, which tags are present in the FORMAT and INFO fields? Usually one would filter on the read depth at each position and on the QUAL score. If the relevant metrics are present, one can also filter on strand bias, read-position bias, allelic fraction, etc.
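A minimal sketch of the basic depth/QUAL/allelic-fraction hard filter, reading DP and AF from the INFO column of a VCF data line. The threshold values are hypothetical placeholders, not recommendations for Ion Torrent data, and strand or read-position bias filters would need to be added using whatever tag names your VCFs actually carry. At multi-allelic sites AF has one value per ALT allele; this sketch keeps the largest, which is one design choice among several.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical starting values, not Ion Torrent recommendations.
    min_depth: int = 20
    min_qual: float = 30.0
    min_af: float = 0.05


def parse_info(info: str) -> dict:
    """Split an INFO string such as 'DP=120;AF=0.25;SOMATIC' into a dict."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item and item != ".":
            fields[item] = True  # flag-type tag with no value
    return fields


def passes(vcf_line: str, t: Thresholds = Thresholds()) -> bool:
    """Return True if a VCF data line clears the depth, QUAL and AF cut-offs."""
    cols = vcf_line.rstrip("\n").split("\t")
    qual = float(cols[5]) if cols[5] != "." else 0.0
    info = parse_info(cols[7])
    dp = info.get("DP", "0")
    depth = int(dp) if dp != "." else 0
    # One AF value per ALT allele at multi-allelic sites; keep the largest.
    afs = [float(x) for x in str(info.get("AF", "0")).split(",") if x not in (".", "")]
    af = max(afs, default=0.0)
    return depth >= t.min_depth and qual >= t.min_qual and af >= t.min_af
```

Header lines (those starting with `#`) would be skipped before calling `passes` on each remaining line of a file.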
I've got QUAL and AF in both the FORMAT and INFO fields. I suspect that variants arising from damaged DNA have a low allelic fraction (lots of reference reads and few variant reads), so filtering on allelic fraction and quality should remove most of the damage-induced calls. The other filters I applied were based on the GATK ones, translated into the corresponding Ion Torrent INFO/FORMAT tags.
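One way to test the suspicion that damage-induced calls sit at low allelic fraction is to compare C>T/G>A changes (the signature of cytosine deamination, common in damaged or FFPE-derived DNA) with the opposite T>C/A>G transitions. A minimal sketch, assuming you have already extracted (REF, ALT, AF) tuples for single-base SNVs; the function name is hypothetical, and passing QUAL instead of AF compares quality the same way.

```python
from statistics import median

DAMAGE_LIKE = {("C", "T"), ("G", "A")}  # deamination-type changes
COMPLEMENT = {("T", "C"), ("A", "G")}   # the opposite transitions


def af_by_class(records):
    """Median value per substitution class.

    records: iterable of (ref, alt, value) tuples for single-base SNVs,
    where value is AF (or QUAL). Returns None for an empty class.
    """
    groups = {"C>T/G>A": [], "T>C/A>G": []}
    for ref, alt, value in records:
        key = (ref.upper(), alt.upper())
        if key in DAMAGE_LIKE:
            groups["C>T/G>A"].append(value)
        elif key in COMPLEMENT:
            groups["T>C/A>G"].append(value)
    return {name: (median(vals) if vals else None) for name, vals in groups.items()}
```

If damage dominates the C>T/G>A calls, their median AF should sit well below that of T>C/A>G before filtering and move closer to it afterwards.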
Before filtering, the raw data had a Ti/Tv ratio of >16; after filtering as described, it is closer to 2.8, which is much more plausible given that the data are likely exon-heavy (this is not a WGS or WXS run; specific genes were amplified with targeted primers). Also, after filtering, the distributions of allelic fraction and quality for C>T/G>A transitions are much closer to those for the opposite transitions (T>C/A>G), which suggests the filters were stringent enough.
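For reference, the Ti/Tv ratio is the number of transitions (A<->G, C<->T) divided by the number of transversions among SNVs; it is commonly quoted as roughly 2.0 to 2.1 genome-wide and closer to 3 in coding regions, which is why a value near 2.8 is reasonable for exon-heavy data. A minimal sketch, assuming (REF, ALT) pairs have already been pulled out of the VCF; indels and multi-base alleles are skipped.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = set("ACGT")


def ti_tv(snvs):
    """Transition/transversion ratio over (ref, alt) pairs.

    Non-SNV entries (indels, MNPs, N bases) are ignored. Returns inf
    when there are no transversions.
    """
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")
```

Running this on the call set before and after filtering gives the kind of comparison described above (>16 raw versus about 2.8 filtered).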