What is meant by 'genomic order' in gatk's GatherVcfs, and how do I check for it?
4 weeks ago
8armed • 0

I have several vcf files that I would like to merge into one large one. Each vcf file contains all variants from a different chromosome (in other words, there is one vcf file per chromosome). Within each of those vcf files, I have the exact same individuals in the same order.

I now want to merge these chromosome-wise vcf files using gatk's "GatherVcfs". The instructions say "Gathers multiple VCF files from a scatter operation into a single VCF file. Input files must be supplied in genomic order and must not have events at overlapping positions."

What do they mean by 'genomic order'? Since each of my vcf files covers a single, unique chromosome, I'm not sure how this applies in my case. Does it? In a test run where I merged five of my chromosome-wise vcf files, it seems to have worked (i.e., I didn't get an error), but I'm not sure whether the program would throw an error if I somehow violated the 'genomic order' requirement. Hence, I want to better understand what is actually meant by this. Also, how can I check the 'genomic order' in each of my vcf files to make sure it is consistent?

Thanks!

(I have also tried bcftools to merge my vcf files, but it did not work: I got an error about a duplicated sample name, although there clearly isn't one. Possibly bcftools misreads my relatively long sample names, which are often identical in the first ~30 characters.)

gatk vcf merge • 386 views

did you read this ? Rename BAM file

4 weeks ago

What do they mean by 'genomic order'?

Genomic order is defined by the order of the chromosomes in the /path/to/reference.dict file (the sequence dictionary of your reference).

In the VCF header, there should be a set of lines starting with ##contig=, and the chromosomes they list should be in the same order as in the reference.dict file.
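One way to check this is to compare the contig names in the two files directly. The following is a minimal sketch, not part of GATK: the function names are made up for illustration, it works on the text of the files (for a bgzipped VCF you would read it with `gzip.open` first), and it assumes the VCF header lists every contig of the dictionary.

```python
import re

def contigs_from_dict(dict_text):
    """Contig names in the order of the @SQ lines of a sequence dictionary."""
    names = []
    for line in dict_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names

def contigs_from_vcf_header(vcf_text):
    """Contig names in the order of the ##contig= lines of a VCF header."""
    names = []
    for line in vcf_text.splitlines():
        if line.startswith("##contig="):
            match = re.search(r"[<,]ID=([^,>]+)", line)
            if match:
                names.append(match.group(1))
        elif line.startswith("#CHROM"):
            break  # end of header, no need to scan the records
    return names

def header_matches_dict(vcf_text, dict_text):
    """True if the header lists the same contigs in the same order as the dict."""
    return contigs_from_vcf_header(vcf_text) == contigs_from_dict(dict_text)

# Toy inputs (invented for this example).
dict_text = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:900\n"
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "##contig=<ID=chr1,length=1000>\n"
    "##contig=<ID=chr2,length=900>\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)
print(header_matches_dict(vcf_text, dict_text))
```

If the check fails, printing both lists side by side usually shows at once where the orders diverge.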

In GatherVcfs, the VCFs should be given with the chromosomes in the same order as in the reference.dict file; with one file per chromosome, that is simply the order in which you list the input files. The exception is if you use the option REORDER_INPUT_BY_FIRST_VARIANT = true https://gatk.broadinstitute.org/hc/en-us/articles/360056968552-GatherVcfs-Picard#--REORDER_INPUT_BY_FIRST_VARIANT
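With one file per chromosome, a common pitfall is a lexical file listing (for example a shell glob), which puts chr10 before chr2. The following is a minimal sketch of ordering the inputs yourself. The function names and file names are hypothetical, the contig order is assumed to come from your reference.dict, and every file is assumed to contain at least one record on a contig that appears in that order.

```python
def first_contig(vcf_text):
    """Contig of the first record in a VCF, or None if there are no records."""
    for line in vcf_text.splitlines():
        if line and not line.startswith("#"):
            return line.split("\t", 1)[0]
    return None

def sort_by_genomic_order(vcfs, contig_order):
    """Sort file names by the position of their first contig in contig_order.

    vcfs maps a file name to the text of that VCF.
    """
    rank = {name: i for i, name in enumerate(contig_order)}
    return sorted(vcfs, key=lambda name: rank[first_contig(vcfs[name])])

# Toy inputs (invented for this example).
contig_order = ["chr1", "chr2", "chr10"]
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
vcfs = {
    "chr10.vcf": header + "chr10\t5\t.\tA\tG\t.\t.\t.\n",
    "chr2.vcf": header + "chr2\t7\t.\tC\tT\t.\t.\t.\n",
    "chr1.vcf": header + "chr1\t3\t.\tG\tA\t.\t.\t.\n",
}
ordered = sort_by_genomic_order(vcfs, contig_order)
print(ordered)
```

The resulting list is the order in which the files would be passed as inputs to GatherVcfs.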

Notably, I have alternatively tried bcftools to merge my vcf files, but it did not work since I got an error of a duplicated sample name,

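You want bcftools concat, not bcftools merge. merge combines files that contain different samples, which is why it complains about duplicated sample names; concat joins files that contain the same samples but cover different regions.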
