With GATK you used to have the "N + 1" variant calling problem (say, 1000 + 1 genomes): if you wanted to add a single genome, you had to redo the whole expensive multi-sample variant calling (GATK HaplotypeCaller).
Now GATK HaplotypeCaller can create an intermediate gVCF file for each sample, and merging the gVCF files into a multi-sample VCF is much less expensive.
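As a rough illustration of the gVCF workflow, here is a minimal Python sketch that only builds the command lines and does not run anything. It assumes GATK4-style syntax (`HaplotypeCaller -ERC GVCF`, `CombineGVCFs`, `GenotypeGVCFs`); the function name `gvcf_commands`, the output directory and the file names are hypothetical, and older GATK versions use a different invocation.

```python
from pathlib import Path


def gvcf_commands(reference, bams, out_dir="gvcf"):
    """Build (per_sample_cmds, combine_cmd, genotype_cmd) as argument lists.

    Assumes GATK4-style command lines; nothing is executed here.
    """
    out = Path(out_dir)
    per_sample = []
    gvcfs = []
    for bam in bams:
        # One intermediate gVCF per sample: the expensive step, done once per genome.
        gvcf = out / (Path(bam).stem + ".g.vcf.gz")
        gvcfs.append(gvcf)
        per_sample.append([
            "gatk", "HaplotypeCaller",
            "-R", reference, "-I", str(bam),
            "-O", str(gvcf), "-ERC", "GVCF",
        ])

    # Cheap merge of all per-sample gVCFs into one cohort gVCF.
    combined = out / "cohort.g.vcf.gz"
    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        combine += ["-V", str(gvcf)]
    combine += ["-O", str(combined)]

    # Joint genotyping on the merged gVCF gives the multi-sample VCF.
    genotype = [
        "gatk", "GenotypeGVCFs",
        "-R", reference, "-V", str(combined),
        "-O", str(out / "cohort.vcf.gz"),
    ]
    return per_sample, combine, genotype
```

Under these assumptions, adding one genome means one extra HaplotypeCaller run plus a repeat of the two cheaper joint steps; each list could be passed to `subprocess.run` once the paths are real.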
My understanding is that FreeBayes, Platypus and samtools do not yet support gVCF files. However, you can work around this by calling variants from each individual's file, merging the variant calls, and then re-calling the individual BAM files at all sites called variant in at least one individual. See this excellent blog post
That is better than complete multi-sample variant calling, but I guess it is still very expensive because of the centralized I/O needed to re-call all variant sites in all the BAM files at the same time.
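The merging step of the workaround can be sketched as follows. This is a simplified illustration, not the method from the blog post: it assumes each per-sample call set is available as plain VCF text lines, and the helper names `read_sites` and `union_sites` are hypothetical. The resulting union of sites is what each caller would then be asked to genotype in every BAM file; how that list is passed in is caller-specific and not shown.

```python
def read_sites(vcf_lines):
    """Collect (chrom, pos, ref, alt) for every record in one sample's VCF lines."""
    sites = set()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        sites.add((chrom, int(pos), ref, alt))
    return sites


def union_sites(per_sample_vcfs):
    """Union of sites called variant in at least one individual.

    Sorted by chromosome name (lexically) and then by position.
    """
    merged = set()
    for vcf_lines in per_sample_vcfs:
        merged |= read_sites(vcf_lines)
    return sorted(merged)


# Two toy single-sample call sets sharing one site.
sample_a = ["##fileformat=VCFv4.2", "chr1\t100\t.\tA\tG\t50\tPASS\t."]
sample_b = ["chr1\t100\t.\tA\tG\t40\tPASS\t.", "chr1\t250\t.\tC\tT\t60\tPASS\t."]
target_sites = union_sites([sample_a, sample_b])
```

The site list itself is small and cheap to build; the cost noted above comes from the later step of revisiting every BAM file at those sites.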
written 11.1 years ago by William • updated 3.8 years ago by Ram