GATK4 Variant calling with non-human model and no known SNP database
4.7 years ago
Lidia • 0

Hi everyone. I recently started working with DNA whole genome sequencing for variant calling with GATK 4.0.

I am working on a fish for which I don't have a database of known SNPs or indels. I have a total of 394 individuals, so I have 394 WGS samples, and I would like to use the GVCF workflow.

According to what I have read, I need to create such lists (known SNPs and indels) from my own data. However, I have a couple of questions about the pipeline to achieve this.

1) To generate the list of SNPs and indels that will be provided as input for Base Quality Score Recalibration, should I run HaplotypeCaller in normal mode (which produces a .vcf file)? Or should I use GVCF mode in this first round of HaplotypeCaller (which produces a .g.vcf file)?

2) Since this first HaplotypeCaller round will be done per sample, I will end up with 394 output files. Should I combine them all and keep only the high-quality variants, so that I have a single file of SNPs and a single file of indels to use for all 394 samples? Or should each sample be recalibrated with its own set of SNPs and indels?

Many thanks to all of you for your help and support.

Lidia

genome SNP • 1.4k views
4.7 years ago
Ace ▴ 90

GVCF mode in GATK is designed for calling variants across a group of samples. In theory you should get the same result from calling all samples directly with HaplotypeCaller as from GVCF mode; the latter just saves you time if you add samples or want to genotype a different combination of samples later.
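
To make the difference between the two modes concrete, here is a minimal Python sketch that only builds the HaplotypeCaller command line for one sample. The file names (`ref.fasta`, `fish_001.bam`, and so on) and the helper name `haplotypecaller_cmd` are hypothetical; the only thing that changes between normal mode and GVCF mode is the `-ERC GVCF` argument.

```python
from typing import List


def haplotypecaller_cmd(reference: str, bam: str, output: str,
                        gvcf: bool = True) -> List[str]:
    """Build (but do not run) a GATK4 HaplotypeCaller command for one sample.

    All paths are placeholders supplied by the caller.
    """
    cmd = ["gatk", "HaplotypeCaller",
           "-R", reference,   # reference genome FASTA
           "-I", bam,         # one sample's aligned reads
           "-O", output]      # .vcf.gz in normal mode, .g.vcf.gz in GVCF mode
    if gvcf:
        # Emit reference confidence so the sample can be jointly genotyped later.
        cmd += ["-ERC", "GVCF"]
    return cmd


if __name__ == "__main__":
    # Hypothetical sample name; print the command instead of executing it.
    print(" ".join(haplotypecaller_cmd("ref.fasta", "fish_001.bam",
                                       "fish_001.g.vcf.gz")))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.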

You want to make a g.vcf for each sample, then combine them, then genotype them together. You can then use your top variants as a resource in VQSR if you so desire. If you have subsets of samples that you think may behave differently, it may be worth repeating the combine > genotype > select pipeline separately on some of those subsets and supplying the results as different resources in VQSR, so that the algorithm can be trained to recognize the relevant patterns. Otherwise, I think you're fine using just the group calls.
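
A rough sketch of the combine > genotype > select steps, again as Python that only assembles the GATK4 command lines. All file names and helper names here are hypothetical, and no quality thresholds are shown because what counts as a "top" variant has to be decided from your own data.

```python
from typing import List


def combine_gvcfs_cmd(reference: str, gvcfs: List[str], output: str) -> List[str]:
    """CombineGVCFs: merge the per-sample g.vcf files into one cohort g.vcf."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]  # one -V per sample g.vcf
    return cmd + ["-O", output]


def genotype_gvcfs_cmd(reference: str, combined: str, output: str) -> List[str]:
    """GenotypeGVCFs: joint genotyping of the combined g.vcf into a normal VCF."""
    return ["gatk", "GenotypeGVCFs", "-R", reference, "-V", combined, "-O", output]


def select_variants_cmd(reference: str, vcf: str, variant_type: str,
                        output: str) -> List[str]:
    """SelectVariants: split the joint calls into SNPs or indels."""
    if variant_type not in ("SNP", "INDEL"):
        raise ValueError("variant_type must be 'SNP' or 'INDEL'")
    return ["gatk", "SelectVariants", "-R", reference, "-V", vcf,
            "--select-type-to-include", variant_type, "-O", output]


if __name__ == "__main__":
    # Hypothetical sample names standing in for the real per-sample files.
    samples = ["fish_001.g.vcf.gz", "fish_002.g.vcf.gz"]
    steps = [
        combine_gvcfs_cmd("ref.fasta", samples, "cohort.g.vcf.gz"),
        genotype_gvcfs_cmd("ref.fasta", "cohort.g.vcf.gz", "cohort.vcf.gz"),
        select_variants_cmd("ref.fasta", "cohort.vcf.gz", "SNP", "cohort.snps.vcf.gz"),
        select_variants_cmd("ref.fasta", "cohort.vcf.gz", "INDEL", "cohort.indels.vcf.gz"),
    ]
    for step in steps:
        print(" ".join(step))
```

Running the same three steps on a subset of samples only means passing a shorter list to `combine_gvcfs_cmd`.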

4.7 years ago
Lidia • 0

Thank you very much for your answer. I will continue then with HaplotypeCaller in GVCF mode. Thanks!
