Question: Filtering IonTorrent variant caller VCFs
Asked 4 months ago by graeme.thorn (London, United Kingdom):

I've got about 140 VCF files generated by the IonTorrent variant caller pipeline (the samples were sequenced with the AmpliSeq Comprehensive Cancer Panel) that I want to process further. However, I'm not sure which filters to apply to the individual VCFs before combining them into one merged VCF per sample group.

As far as I can tell, there are established filtering thresholds for GATK and for variants called with samtools mpileup, but the INFO and FORMAT fields in those VCFs do not match up exactly with the fields the IonTorrent caller provides.

Is it OK to translate the GATK thresholds into the fields used by IonTorrent (as I say, some thresholds have no exact equivalent among the values IonTorrent provides), or is there a known set of hard thresholds somewhere that I could use as a starting point for my filtering?

Tags: variants, iontorrent, vcf

Hey Graeme, which tags are present in the FORMAT and INFO fields? Usually one would filter by read depth at the position and by QUAL score. If the relevant metrics are present, one can also filter by strand bias, read position bias, allelic fraction, etc.

(comment written 3 months ago by Kevin Blighe)
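To answer the "which tags are present" question for a given file, the header can be scanned for its `##INFO` and `##FORMAT` declarations. The following is a minimal sketch in plain Python; the function name `declared_tags` and the example header lines are made up for illustration and are not taken from an actual IonTorrent VCF.

```python
import re

# Matches header declarations such as ##INFO=<ID=AF,Number=A,...>
HEADER_TAG = re.compile(r"^##(INFO|FORMAT)=<ID=([^,>]+)")


def declared_tags(vcf_lines):
    """Return the INFO and FORMAT tag IDs declared in the VCF header."""
    tags = {"INFO": [], "FORMAT": []}
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        match = HEADER_TAG.match(line)
        if match:
            tags[match.group(1)].append(match.group(2))
    return tags


# Illustrative header only (hypothetical content, not real caller output)
example_header = [
    "##fileformat=VCFv4.1",
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
]

print(declared_tags(example_header))
```

For a real file, the same function can be given an open file handle instead of a list, since it only iterates over lines.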

I've got QUAL and AF in both FORMAT and INFO fields. I suspect that variants arising from damaged DNA have low allelic fraction (there'll be lots of reference reads and not many variant reads), so a filter on allelic fraction and on quality should remove most of the damage-derived calls from each sample. The other filters I applied were based on those for GATK, translated to the corresponding IonTorrent fields.

The raw data has a Ti/Tv ratio above 16, but after filtering as I did the value is closer to 2.8. Given that the data is likely exon-heavy (it's not a WGS or WXS run, but was specifically amplified using primers for certain genes), that is much more plausible. Also, once filtered, the distributions of allelic fraction and quality for C>T/G>A transitions are much closer to those for the opposite transitions (T>C/A>G), suggesting the filters I ran were stringent enough.

(reply written 3 months ago by graeme.thorn)
Answer, written 3 months ago by graeme.thorn (London, United Kingdom):

For future reference, what I did was to filter the data using the GATK guidelines on strand bias and read depth, plus filters on allelic fraction and quality. Damage introduced during the fixation process is likely randomly distributed across the DNA, so the artefactual variants it causes will tend to have low allelic fraction and low quality as estimated by the variant caller.
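A hard filter of this kind can be sketched in a few lines of plain Python. Everything specific here is an assumption for illustration: the thresholds (`min_qual`, `min_af`, `min_dp`) are placeholders rather than the values actually used, and the sketch assumes `AF` and `DP` are available as INFO keys. A strand-bias check would be added in the same way, using whichever field the caller reports.

```python
def parse_info(info_field):
    """Turn 'AF=0.5;DP=100;FLAG' into a dict; bare flags map to True."""
    info = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def passes(line, min_qual=30.0, min_af=0.05, min_dp=100):
    """Apply hard filters to one VCF data line.

    The default thresholds are hypothetical placeholders, not recommendations.
    """
    cols = line.rstrip("\n").split("\t")
    qual, info = cols[5], parse_info(cols[7])
    if qual == "." or float(qual) < min_qual:
        return False
    if "DP" not in info or int(info["DP"]) < min_dp:
        return False
    if "AF" not in info:
        return False
    # AF can hold one value per ALT allele; keep the record if any allele passes
    return any(float(v) >= min_af for v in info["AF"].split(",") if v != ".")


# Made-up records: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
records = [
    "chr1\t100\t.\tC\tT\t500.0\tPASS\tAF=0.45;DP=800",
    "chr1\t200\t.\tC\tT\t12.0\tPASS\tAF=0.02;DP=900",   # low QUAL and low AF
    "chr1\t300\t.\tA\tG\t80.0\tPASS\tAF=0.01;DP=50",    # low depth and low AF
]

kept = [r for r in records if passes(r)]
print(len(kept), "of", len(records), "records kept")
```

Keeping the thresholds as function arguments makes it easy to rerun the filter with different values and watch how the transition counts respond.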

Filtering on quality and allelic fraction (along with everything else) proved stringent enough: the C>T/G>A transitions are now about as frequent as the T>C/A>G transitions in the filtered dataset, and the overall transition/transversion ratio has decreased from above 16 (in the unfiltered dataset) to about 2.8.
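The two checks described here, the overall Ti/Tv ratio and the balance between C>T/G>A and T>C/A>G calls, can be computed directly from the REF and ALT columns. This is a minimal sketch in plain Python with hypothetical function names and made-up example records; it counts single-nucleotide substitutions only and skips indels.

```python
from collections import Counter

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def substitution_counts(vcf_lines):
    """Count (REF, ALT) pairs for single-nucleotide variants only."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        cols = line.split("\t")
        ref = cols[3].upper()
        for alt in cols[4].upper().split(","):
            if len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT":
                counts[(ref, alt)] += 1
    return counts


def titv(counts):
    """Transition/transversion ratio; infinite if there are no transversions."""
    ti = sum(n for pair, n in counts.items() if pair in TRANSITIONS)
    tv = sum(n for pair, n in counts.items() if pair not in TRANSITIONS)
    return ti / tv if tv else float("inf")


def damage_vs_opposite(counts):
    """Return (C>T + G>A, T>C + A>G) counts for a quick balance check."""
    damage = counts[("C", "T")] + counts[("G", "A")]
    opposite = counts[("T", "C")] + counts[("A", "G")]
    return damage, opposite


# Made-up records: four transitions, two transversions, one indel (ignored)
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10\t.\tC\tT\t99\tPASS\t.",
    "chr1\t20\t.\tG\tA\t99\tPASS\t.",
    "chr1\t30\t.\tT\tC\t99\tPASS\t.",
    "chr1\t40\t.\tA\tG\t99\tPASS\t.",
    "chr1\t50\t.\tA\tC\t99\tPASS\t.",
    "chr1\t60\t.\tG\tT\t99\tPASS\t.",
    "chr1\t70\t.\tAT\tA\t99\tPASS\t.",
]

counts = substitution_counts(example)
print("Ti/Tv:", titv(counts))
print("C>T/G>A vs T>C/A>G:", damage_vs_opposite(counts))
```

Running this on the unfiltered and filtered VCFs side by side gives the before and after comparison described above.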
