Dear friends,
I am working with several pig breeds, each represented by multiple samples. I have finished the standard pipeline from bwa to GATK and ran HaplotypeCaller, which produced one VCF file per sample. Now I am confused about the merge step. Should I merge all the VCF files together at once, or first merge at the breed level and then merge the breed VCFs before running downstream analysis?
I am mainly looking for variation in the indigenous variety. Any help or suggestions would be greatly appreciated.
You should call HaplotypeCaller on all your BAMs at the same time to get a multisample VCF. If there are too many BAMs, you can switch to GVCF mode.
In the GVCF workflow used for scalable variant calling in DNA sequence data, HaplotypeCaller runs per-sample to generate an intermediate GVCF (not to be used in final analysis), which can then be used in GenotypeGVCFs for joint genotyping of multiple samples in a very efficient way. The GVCF workflow enables rapid incremental processing of samples as they roll off the sequencer, as well as scaling to very large cohort sizes (e.g. the 92K exomes of ExAC).
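As a rough illustration of the GVCF workflow described above, here is a minimal Python sketch that only builds the command lines (it does not run them). It assumes GATK4-style syntax and a `gatk` launcher on the PATH; the file names (`ref.fa`, `sampleA.bam`, the `gvcf` output directory) and the helper function names are hypothetical placeholders. All samples, whatever their breed, go into the same joint-genotyping step.

```python
import shlex
from pathlib import Path


def haplotypecaller_gvcf_cmd(reference, bam, out_dir):
    """Per-sample HaplotypeCaller command in GVCF mode (-ERC GVCF)."""
    sample = Path(bam).stem  # hypothetical naming: sampleA.bam -> sampleA
    gvcf = f"{out_dir}/{sample}.g.vcf.gz"
    cmd = ["gatk", "HaplotypeCaller",
           "-R", reference, "-I", bam, "-O", gvcf, "-ERC", "GVCF"]
    return cmd, gvcf


def combine_gvcfs_cmd(reference, gvcfs, combined):
    """Combine every per-sample GVCF into one cohort GVCF."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    return cmd + ["-O", combined]


def genotype_gvcfs_cmd(reference, combined, out_vcf):
    """Joint genotyping of the combined GVCF into a multisample VCF."""
    return ["gatk", "GenotypeGVCFs",
            "-R", reference, "-V", combined, "-O", out_vcf]


def build_pipeline(reference, bams, out_dir="gvcf"):
    """Return the full list of commands: N per-sample calls, 1 combine, 1 genotype."""
    commands, gvcfs = [], []
    for bam in bams:
        cmd, gvcf = haplotypecaller_gvcf_cmd(reference, bam, out_dir)
        commands.append(cmd)
        gvcfs.append(gvcf)
    combined = f"{out_dir}/cohort.g.vcf.gz"
    commands.append(combine_gvcfs_cmd(reference, gvcfs, combined))
    commands.append(genotype_gvcfs_cmd(reference, combined, f"{out_dir}/cohort.vcf.gz"))
    return commands


# Print the commands for two placeholder BAM files.
for command in build_pipeline("ref.fa", ["sampleA.bam", "sampleB.bam"]):
    print(shlex.join(command))
```

The printed commands can be pasted into a shell script or handed to a job scheduler. The per-sample `.g.vcf.gz` files are intermediates; only the final multisample VCF is meant for downstream analysis.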
Thank you, Pierre, for the suggestion. I only have 70-80 samples, and I am running HaplotypeCaller in GVCF mode on each BAM file separately. Since the samples come from different breeds, I am unsure whether, after this step, we need to merge all the per-sample VCFs together or merge them breed by breed.