5.6 years ago
tothepoint
I am trying to combine the VCF files generated by HaplotypeCaller for different breed varieties. The total number of samples is ~100. I ran the command
gatk CombineGVCFs -R genome.fna --variant 1.vcf.gz --variant 2.vcf.gz.....variant97.vcf.gz -O combine.g.vcf.gz
but when I check the resulting file, it contains only 41 samples instead of 97. If anyone knows what I am doing wrong or has experience with this, please share how to fix it.
I would check which samples are missing. Does this give you a clue? Perhaps the index is missing for the VCF files that were left out. Check which samples have been included:
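The command for this check did not survive in the thread. As a rough sketch of one way to do it, the sample names can be read from the `#CHROM` header line of the combined file and compared against a list of expected names. The function names and the `expected.txt` file (one sample name per line) are hypothetical; `bcftools query -l combine.g.vcf.gz` would print the same sample list if bcftools is installed.

```shell
# List the sample names stored in the #CHROM header line of a gzipped/bgzipped VCF.
# Sample columns start at field 10 (after FORMAT).
list_vcf_samples() {
    gzip -dc "$1" | awk -F'\t' '
        /^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }
    '
}

# Print the expected sample names (file $2, one per line) that are absent from VCF $1.
missing_samples() {
    present=$(mktemp)
    list_vcf_samples "$1" > "$present"
    # -v invert match, -x whole line, -F fixed strings, -f read patterns from file
    grep -vxF -f "$present" "$2"
    rm -f "$present"
}

# Hypothetical usage:
#   list_vcf_samples combine.g.vcf.gz | wc -l
#   missing_samples combine.g.vcf.gz expected.txt
```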
Which version of GATK are you using?
I am using gatk4.1.4.0
Have you checked the log file to make sure the command completed? Also, I'd recommend GenomicsDBImport instead of CombineGVCFs.
I checked the log file and there was no such issue. I have already given CombineGVCFs one more shot, but I am curious to try GenomicsDBImport as well. Thanks
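For anyone wanting to try the GenomicsDBImport route with ~100 inputs, a sample map file (tab-separated sample name and path) avoids typing one `-V` per file. The following is only a sketch: `make_sample_map`, `cohort.sample_map`, `my_database` and the `chr1` interval are placeholder names, and it assumes each gVCF holds a single sample.

```shell
# Write one "sample<TAB>path" line per single-sample gVCF given as an argument.
# The sample name is taken from column 10 of the #CHROM header line.
make_sample_map() {
    for gvcf in "$@"; do
        sample=$(gzip -dc "$gvcf" | awk -F'\t' '/^#CHROM/ { print $10; exit }')
        printf '%s\t%s\n' "$sample" "$gvcf"
    done
}

# Hypothetical usage (all file names, the workspace path and the interval are placeholders):
#   make_sample_map *.g.vcf.gz > cohort.sample_map
#   gatk GenomicsDBImport \
#       --genomicsdb-workspace-path my_database \
#       --sample-name-map cohort.sample_map \
#       -L chr1
```

GenomicsDBImport needs at least one interval (`-L`), so the import is typically run per chromosome or per interval list.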
Were you able to figure out what went wrong? I have a similar issue with CombineGVCFs: all my samples are literally merged into ONE. I have 243 individuals, but the final VCF shows all the variants as if they belonged to one individual. I have no idea what is going on. I checked the log file and it looks fine; all the individual samples are read in with no errors. Here is the command I used:
Any tips on how to fix this? I can't use the final VCF when there is no information on individuals. Thanks in advance
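One possible cause, which is an assumption and not something confirmed in this thread, is that the input gVCFs all carry the same sample name in their headers (for example because the same SM read-group tag was used for every individual), so the samples cannot be told apart after merging. A quick sketch of a check for that, with a hypothetical function name:

```shell
# Print every sample name that occurs more than once across the given gVCFs.
# Empty output means all sample names are unique.
duplicate_sample_names() {
    for gvcf in "$@"; do
        gzip -dc "$gvcf" | awk -F'\t' '
            /^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }
        '
    done | sort | uniq -d
}

# Hypothetical usage:
#   duplicate_sample_names *.g.vcf.gz
```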
Only add an answer if you're answering the top level question. If you have a follow up or "I have this problem too" statement, use Comments instead. I've moved your post to a comment this time.
There was an indexing issue with the files that were missing from the combined output. I cross-checked and indexed those files with
and re-ran the command; all files were then combined successfully.
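The exact indexing command is not shown above. As a minimal sketch of how such a check could look, the function below (a hypothetical name) lists inputs that have no `.tbi` index next to them; the commented loop then indexes them with `tabix`, assuming the files are bgzip-compressed and tabix is installed. This is not necessarily the command that was used here.

```shell
# Print each given gVCF that has no tabix (.tbi) index file next to it.
find_unindexed() {
    for gvcf in "$@"; do
        [ -e "$gvcf.tbi" ] || printf '%s\n' "$gvcf"
    done
}

# Hypothetical usage: index whatever is missing, then re-run CombineGVCFs.
#   find_unindexed *.vcf.gz | while read -r f; do
#       tabix -p vcf "$f"
#   done
```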