I have 16 whole-genome-sequenced samples from two populations (8 per population). My goal is to detect signatures of selection and introgression. I performed read cleaning, mapping to the reference, and duplicate marking. SNP calling was performed with HaplotypeCaller in GATK for each sample separately. My question: for downstream analysis (PCA, ADMIXTURE analysis, and detecting signatures of selection), do I need to run GATK's GenotypeGVCFs for joint genotyping? Or can I create one VCF file per sample (without GenotypeGVCFs) and merge them for downstream analysis after variant filtering?
Thanks in advance
Use the joint-genotyped VCF file produced by GenotypeGVCFs for the downstream analysis.
Thanks tothepoint. Now, if I produce a separate VCF file for each sample, how can I merge them? Does the merging have to be done per population?
Merge or combine the VCF files? GATK CombineGVCFs will do the job for you.
You should produce a gVCF for each sample (using HaplotypeCaller in GVCF mode), then (EDITED) combine them in order to run GenotypeGVCFs on all of them together.
(Edited to correct a mistake)
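For anyone following along, the workflow described above (per-sample gVCFs, then combining, then joint genotyping) could be sketched roughly as below. This is only a sketch that assumes GATK4 command-line syntax; the thread does not state the GATK version. The file names (`ref.fasta`, `sampleNN.dedup.bam`, `cohort.g.vcf.gz`) and the helper function names are hypothetical. The script only builds and prints the command lines; it does not run GATK.

```python
import shlex

REF = "ref.fasta"  # hypothetical reference FASTA path


def haplotypecaller_gvcf(bam, gvcf, ref=REF):
    """Per-sample calling in GVCF mode (-ERC GVCF)."""
    return ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam,
            "-O", gvcf, "-ERC", "GVCF"]


def combine_gvcfs(gvcfs, out, ref=REF):
    """Combine all per-sample gVCFs into one multi-sample gVCF."""
    cmd = ["gatk", "CombineGVCFs", "-R", ref]
    for g in gvcfs:
        cmd += ["-V", g]
    return cmd + ["-O", out]


def genotype_gvcfs(variant, out, ref=REF):
    """Joint genotyping across all samples at once."""
    return ["gatk", "GenotypeGVCFs", "-R", ref, "-V", variant, "-O", out]


# 16 samples, as in the question; sample names are placeholders.
samples = [f"sample{i:02d}" for i in range(1, 17)]
gvcfs = [f"{s}.g.vcf.gz" for s in samples]

commands = [haplotypecaller_gvcf(f"{s}.dedup.bam", g)
            for s, g in zip(samples, gvcfs)]
commands.append(combine_gvcfs(gvcfs, "cohort.g.vcf.gz"))
commands.append(genotype_gvcfs("cohort.g.vcf.gz", "cohort.vcf.gz"))

for cmd in commands:
    print(shlex.join(cmd))
```

Note that all 16 gVCFs go into a single combine step here, so both populations are genotyped together rather than merged per population.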
vdauwera, please correct me if I am wrong: we can run GenotypeGVCFs after CombineGVCFs. I performed some analysis following the explanation on the GATK page describing:
Oh I misread that as CombineVCFs (without the G), sorry. Yes you’re correct. I would recommend using the GenomicsDB method (that’s what I had in mind, realizing now I didn’t write it out — need coffee) rather than the basic combiner tool, but both are valid.
I edited my previous post to minimize confusion if someone else sees this thread.
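The GenomicsDB route mentioned above might look something like the following. Again, this is a sketch under assumptions: GATK4 syntax, a hypothetical workspace name (`cohort_db`), hypothetical interval names (`chr1`, `chr2`), and placeholder file names. As far as I know, GenomicsDBImport needs at least one interval (`-L`) and expects the workspace directory not to exist yet, so check the documentation for your GATK version. The script only prints the commands.

```python
import shlex


def genomicsdb_import(gvcfs, workspace, intervals):
    """Import per-sample gVCFs into a GenomicsDB workspace."""
    cmd = ["gatk", "GenomicsDBImport"]
    for g in gvcfs:
        cmd += ["-V", g]
    cmd += ["--genomicsdb-workspace-path", workspace]
    for interval in intervals:
        cmd += ["-L", interval]
    return cmd


def genotype_from_genomicsdb(ref, workspace, out_vcf):
    """Joint genotyping directly from the workspace via a gendb:// URI."""
    return ["gatk", "GenotypeGVCFs", "-R", ref,
            "-V", f"gendb://{workspace}", "-O", out_vcf]


# Placeholder inputs: 16 per-sample gVCFs and two example intervals.
gvcf_files = [f"sample{i:02d}.g.vcf.gz" for i in range(1, 17)]
import_cmd = genomicsdb_import(gvcf_files, "cohort_db", ["chr1", "chr2"])
genotype_cmd = genotype_from_genomicsdb("ref.fasta", "cohort_db",
                                        "cohort.vcf.gz")

print(shlex.join(import_cmd))
print(shlex.join(genotype_cmd))
```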