I have several VCF files that I would like to merge into one large file. Each VCF file contains all variants from a different chromosome (in other words, there is one VCF file per chromosome). Each of those VCF files contains exactly the same individuals in the same order.
I now want to merge these chromosome-wise VCF files using GATK's "GatherVcfs". The instructions say "Gathers multiple VCF files from a scatter operation into a single VCF file. Input files must be supplied in genomic order and must not have events at overlapping positions."
What do they mean by 'genomic order'? Since each of my VCF files covers exactly one chromosome, I'm not sure how this applies in my case. Does it? In a test run where I merged five of my chromosome-wise VCF files, it seemed to work (i.e., I didn't get an error), but I'm not sure whether the program would throw an error if I somehow violated the 'genomic order' requirement. Hence, I want to better understand what is actually meant by this. Also, how can I check the 'genomic order' of my VCF files to make sure it is consistent?
Thanks!
(I have also tried bcftools to merge my VCF files, but it did not work: I got an error about a duplicated sample name, although there clearly isn't one. It's possibly an erroneous recognition by bcftools because of my relatively long sample names, which are often similar in the first ~30 characters.)
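One way to test the duplicated-name claim directly is to read the sample columns from the #CHROM header line and count them. The following is only a minimal sketch in plain Python; the helper names (open_vcf, sample_names, duplicated) and the file name in the usage comment are made up for illustration and are not part of bcftools or GATK.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def sample_names(lines):
    """Return the sample columns (column 10 onwards) of the #CHROM header line."""
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached the records without seeing a #CHROM line
    raise ValueError("no #CHROM header line found")


def duplicated(names):
    """Return every name that occurs more than once, compared in full."""
    return [name for name, count in Counter(names).items() if count > 1]


# Hypothetical usage (the file name is an assumption):
# with open_vcf("chr1.vcf.gz") as fh:
#     names = sample_names(fh)
# print(len(names), duplicated(names))
```

If this reports no duplicates within any single file, the error may come from the choice of subcommand rather than from the name length: `bcftools merge` combines files that hold different samples and complains when the same sample name appears in more than one input, whereas `bcftools concat` is the subcommand intended for files that share the same samples, such as per-chromosome files.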
Did you read this? Rename BAM file
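On the 'genomic order' question: as far as I understand it, this refers to the contig order of the sequence dictionary (the ##contig lines in the VCF header), and then ascending position within each contig. With one file per chromosome, that would mean listing the input files in the same order as the contigs appear in the header. Below is a minimal sketch of such a check in plain Python. It assumes the header of the first file lists all contigs; the function names and the file names in the usage comment are hypothetical, and this is not how GatherVcfs itself performs its validation.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def contig_order(lines):
    """Return contig IDs in the order of the ##contig header lines."""
    order = []
    for line in lines:
        if line.startswith("##contig=<"):
            # Strip the leading '##contig=<' and the trailing '>'.
            for field in line.strip()[len("##contig=<"):-1].split(","):
                if field.startswith("ID="):
                    order.append(field[3:])
        elif not line.startswith("#"):
            break  # header is over
    return order


def record_keys(lines, order):
    """Yield (contig rank, position) for every record line."""
    rank = {name: i for i, name in enumerate(order)}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        yield rank[chrom], int(pos)  # KeyError: contig missing from the header


def gather_order_problem(paths):
    """Return (previous key, current key, path) at the first backwards step, else None.

    Assumption: the header of the first file lists all contigs.
    """
    with open_vcf(paths[0]) as fh:
        order = contig_order(fh)
    prev = None
    for path in paths:
        with open_vcf(path) as fh:
            for key in record_keys(fh, order):
                if prev is not None and key < prev:
                    return prev, key, path
                prev = key
    return None


# Hypothetical usage, files listed in the order they would be given to GatherVcfs:
# print(gather_order_problem(["chr1.vcf.gz", "chr2.vcf.gz", "chr10.vcf.gz"]))
```

A return value of None means that neither the records within each file nor the sequence of files ever steps backwards relative to the header's contig order. Note that alphabetical sorting of file names (chr1, chr10, chr2, ...) usually does not match that order, which is one common way to end up with a violation.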