I am trying to build a set of variants for a non-model organism that can serve as the known set (the well-curated training/truth resource) in VQSR.
I have whole-genome sequencing BAM files from 96 animals. I am thinking of applying hard filters as advised in this article on the Broad Institute website: https://software.broadinstitute.org/gatk/documentation/article.php?id=3225. I plan to use the bootstrapping method, and I have already generated a GVCF file for each of the 96 BAM files.
Now I am not sure which of two approaches to take. The first is to generate a separate VCF file for each of the 96 BAM files, do the bootstrapping on each one separately, and then combine the 96 final trained VCF files into a single file to serve as the known truth set of variants. The second is to run CombineGVCFs on the 96 GVCF files to produce a single GVCF file, generate a single VCF file from it, and then use that single VCF file to bootstrap the 96 BAM files separately.
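To make the second option concrete, here is a rough sketch of the two commands I have in mind, written as a small Python helper that only builds the command lines and does not run anything. It assumes GATK4-style syntax (`gatk CombineGVCFs` followed by `gatk GenotypeGVCFs`); the file names, the reference name, and the helper function itself are hypothetical placeholders, not my actual setup.

```python
def joint_calling_commands(gvcfs, reference="reference.fasta", prefix="cohort"):
    """Build (but do not run) the command lines for the combined-GVCF route.

    Assumes GATK4-style arguments; all file names are hypothetical.
    """
    combined_gvcf = f"{prefix}.g.vcf.gz"
    raw_vcf = f"{prefix}.raw.vcf.gz"

    # Step 1: merge the per-sample GVCFs into one multi-sample GVCF.
    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        combine += ["--variant", str(gvcf)]
    combine += ["-O", combined_gvcf]

    # Step 2: joint-genotype the combined GVCF into a single raw VCF,
    # which would then be hard-filtered to seed the bootstrapping.
    genotype = ["gatk", "GenotypeGVCFs", "-R", reference,
                "-V", combined_gvcf, "-O", raw_vcf]

    return [combine, genotype]


if __name__ == "__main__":
    # Hypothetical names for the 96 per-animal GVCF files.
    gvcfs = [f"animal_{i:02d}.g.vcf.gz" for i in range(1, 97)]
    for cmd in joint_calling_commands(gvcfs):
        print(" ".join(cmd))
```

The hard-filtering step from the linked article would then be applied to the single raw VCF rather than to 96 separate ones.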
Any help is much appreciated.