Hello everybody.
I have two main sets of files to analyze; the end goal is SNP discovery and haplotypes for subsequent imputation.
One dataset is haploid, the other is diploid. I will investigate each separately.
For each dataset I have shallow-sequenced DNA from 26 individual trees.
My question is: is there any practical implication of doing the variant calling for each individual and then merging the results into one big VCF, rather than merging all alignments into one big .bam (with an RG for each individual) and then doing the variant calling?
As far as I can see, the differences between the individual calls will be reconciled by the mergevcf tool, no?
Thanks in advance.
In my experience, I've always kept a single BAM file per sample and performed joint genotyping with GATK for this task! I think it's safer for later purposes.
So, that means obtaining a VCF for each sample and then performing the joint genotyping with GATK. The thing is, I'm not using GATK. I'm analysing the variants with DeepVariant, and I'm not sure whether it has the same option. However, I will keep your solution in mind.
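
To make the practical difference between the two routes a bit more concrete, here is a minimal toy sketch in Python. It is only an illustration under stated assumptions: the sample names, positions, genotypes and the `covered` sets are all invented, and `naive_merge` / `joint_table` are hypothetical helper names, not functions from GATK, DeepVariant or any merge tool. It shows one thing only: when variant-only per-sample calls are simply merged, a sample without a record at a site ends up as missing (`./.`), so "homozygous reference" and "no data" cannot be told apart, which matters a lot with shallow sequencing.

```python
# Toy illustration only: all positions, genotypes and coverage sets are invented.

def naive_merge(per_sample_calls):
    """Merge variant-only call sets the way a plain VCF merge would:
    a sample with no record at a site gets a missing genotype './.'."""
    samples = sorted(per_sample_calls)
    sites = sorted({pos for calls in per_sample_calls.values() for pos in calls})
    return {
        pos: {s: per_sample_calls[s].get(pos, "./.") for s in samples}
        for pos in sites
    }


def joint_table(per_sample_calls, covered):
    """Same merge, but using per-sample coverage information (the kind of
    information a gVCF or the BAM itself carries): a covered site with no
    variant call is reported as homozygous reference '0/0'."""
    merged = naive_merge(per_sample_calls)
    for pos, genotypes in merged.items():
        for sample, gt in genotypes.items():
            if gt == "./." and pos in covered[sample]:
                genotypes[sample] = "0/0"
    return merged


# Hypothetical per-sample variant calls: {sample: {position: genotype}}
calls = {
    "tree_A": {100: "0/1", 250: "1/1"},
    "tree_B": {100: "0/1"},
    "tree_C": {},
}
# Hypothetical sets of positions where each sample actually has reads
covered = {
    "tree_A": {100, 250},
    "tree_B": {100, 250},
    "tree_C": {100},
}

naive = naive_merge(calls)
joint = joint_table(calls, covered)

for pos in sorted(naive):
    print(pos, "naive:", naive[pos], "with coverage info:", joint[pos])
```

Real joint calling does more than this (it also pools the evidence across samples when deciding whether a site is variant at all), so please read the sketch as a picture of the missing-versus-reference problem and nothing more. As far as I know, DeepVariant can write per-sample gVCFs, and those are usually combined with a dedicated joint-genotyping tool such as GLnexus rather than with a plain VCF merge.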