GATK, Variant Quality Score Recalibration, How To Work With Custom Truth And Training Datasets?
10.9 years ago
William ★ 5.3k

I have 10 strains of the same non-human species and I want to use VQSR on the raw single-sample SNP calls. A subset of the 10 strains has been genotyped using a SNP array. How can I use that SNP array as a truth dataset? The SNP array data is in some kind of CSV format. Can I just convert that format to BED, extract those positions from the single-sample VCF into a truth VCF, and supply that as a truth dataset to VQSR?

Can I use the same dataset as a training dataset, or does this need to be a different (bigger and/or non-overlapping?) dataset than the truth dataset? Or can I, for example, just take all the high-quality calls (quality above 100) from the single-sample raw SNP calls and supply these as a training set?

I also have reference calls in my raw single-sample VCFs. Do I need to manually exclude them completely from the VQSR process?
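To make the filtering idea above concrete, here is a minimal sketch in Python of selecting candidate training records from a raw VCF. It is only an illustration, not a GATK feature: the function name `select_training_records` is hypothetical, the threshold of 100 is the one mentioned above, and it assumes that reference calls are written with `.` in the ALT column (other callers or modes may mark them differently). Whether such a self-derived set is a good training set for VQSR is a separate question that this sketch does not answer.

```python
def select_training_records(vcf_lines, min_qual=100.0):
    """Keep header lines plus variant records with QUAL above min_qual.

    Assumption: a reference call has "." in the ALT column.
    Records with a missing QUAL (".") are skipped as well.
    """
    kept = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Header lines are passed through unchanged.
            kept.append(line)
            continue
        fields = line.split("\t")
        alt, qual = fields[4], fields[5]
        if alt == ".":
            # Reference call: no alternate allele, so not a variant site.
            continue
        if qual == "." or float(qual) <= min_qual:
            continue
        kept.append(line)
    return kept
```

Usage would be along the lines of reading the raw VCF with `open(...)`, passing the file object to the function, and writing the returned lines to a new file.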

I also posted the question on http://gatkforums.broadinstitute.org/discussion/39/variant-quality-score-recalibration-vqsr

More background info:
VariantRecalibrator: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantrecalibration_VariantRecalibrator.html
What VQSR training sets / arguments should I use for my specific project? http://www.broadinstitute.org/gatk/guide/article?id=1259

gatk • 4.2k views

Nobody? I can't be the first person trying to use GATK VQSR without existing training data available?


It would make sense to use your SNP array data as truth. As for using it with VQSR, I assume it would be relatively straightforward to convert your SNP data into a VCF-compatible format.
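As a rough illustration of that conversion, here is a minimal Python sketch that turns array data into a sites-only VCF. The column names `chrom`, `pos`, `ref` and `alt` are assumptions, since the actual layout of the array CSV is not known; a real export would likely need its columns mapped first, and the alleles checked against the reference strand. The function name `array_csv_to_vcf` is hypothetical.

```python
import csv
import io


def array_csv_to_vcf(csv_text):
    """Convert SNP array rows to a minimal sites-only VCF string.

    Assumed (hypothetical) CSV columns: chrom, pos, ref, alt.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    sites = [
        (row["chrom"], int(row["pos"]), row["ref"].upper(), row["alt"].upper())
        for row in reader
    ]
    # Sort by contig name, then position. Contigs are ordered as plain
    # strings here; the order may need to match the reference instead.
    sites.sort(key=lambda site: (site[0], site[1]))

    lines = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for chrom, pos, ref, alt in sites:
        # ID, QUAL, FILTER and INFO are left as missing values (".").
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"
```

The output has no sample columns, which keeps the sketch short; contig header lines and sorting by the reference's contig order are left out and may be needed before GATK accepts the file.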
