BCFtools equivalent of vcf-merge -R?
8.1 years ago
CuriousGuy ▴ 90

Hi everyone,

I need to merge two VCF files that contain different variants, which I can do with either vcf-merge (VCFtools) or bcftools merge. By default, both tools treat a position that is listed in one file but not in the other as missing data for the samples of the other file, and write those genotypes as a dot (.) in the merged output. However, vcf-merge has an option (-R) to use the REF allele (e.g. 0/0) instead of the default missing genotype. Unfortunately, I have not been able to find an equivalent in bcftools, which is a shame because bcftools is much faster than VCFtools, and I am handling some pretty big VCF files.

That being said, is there any fast and easy way to do the same using bcftools? I put emphasis on the speed because this is precisely the reason why I am interested in bcftools in the first place.

Thanks for your help, it's much appreciated.

vcf bcftools variation genotype • 4.0k views
8.1 years ago
CuriousGuy ▴ 90

I posted the same question on the official development repository and, shortly afterwards, Petr Danecek kindly added an experimental feature to BCFtools that does exactly this. I am leaving the link here for anyone who might be in the same situation:

OK, I added the option -0, --missing-to-ref, it is available from http://pd3.github.io/bcftools/

https://github.com/samtools/bcftools/issues/402
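
For anyone landing here later, this is a minimal sketch of how the option quoted above might be used. The file names (A.vcf.gz, B.vcf.gz, merged.vcf.gz) and the wrapper function name are hypothetical, and it assumes a bcftools build that includes -0/--missing-to-ref and inputs that are bgzip-compressed and indexed.

```shell
# Hypothetical wrapper: merge two indexed, bgzip-compressed VCFs and
# write 0/0 instead of ./. for samples whose file lacks the site.
# Arguments: $1 = first input, $2 = second input, $3 = output file.
merge_missing_as_ref() {
    bcftools merge --missing-to-ref -Oz -o "$3" "$1" "$2"
}

# Example call (assumes these placeholder files exist):
#   merge_missing_as_ref A.vcf.gz B.vcf.gz merged.vcf.gz
#   bcftools index -t merged.vcf.gz
```

As far as I understand it, this only affects the genotypes filled in during the merge, so genotypes that were already missing in the input files should stay as they were, but it is worth checking that on a small test file first.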

nice move, great to know

8.1 years ago

Sure, an option similar to vcf-merge -R built into bcftools merge would be much faster, but in the meantime you can pipe the merged output to the bcftools plugin missing2ref, which replaces missing genotypes with homozygous reference genotypes.

You will need to have the plugins properly installed; otherwise, you can pipe the output through something like sed 's/\t\.\/\./\t0\/0/g'
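
To show what the sed substitution does, here is a small sketch on a made-up, tab-separated VCF record with two samples. The record and the function name are invented for illustration, and the tab is built with printf so the command does not rely on sed understanding \t.

```shell
# Toy VCF data line (hypothetical): sample 1 is missing (./.), sample 2 is 0/1.
tab=$(printf '\t')
record="1${tab}100${tab}.${tab}A${tab}G${tab}.${tab}.${tab}.${tab}GT${tab}./.${tab}0/1"

# Replace every unphased diploid missing genotype (./.) with 0/0.
# Only fields that start with "./." right after a tab are touched,
# so the lone "." in the ID, QUAL, FILTER and INFO columns is left alone.
missing_to_ref() {
    sed "s/${tab}\.\/\./${tab}0\/0/g"
}

printf '%s\n' "$record" | missing_to_ref

# With the plugin installed, the equivalent would be roughly:
#   bcftools +missing2ref merged.vcf.gz
```

Here the two sample columns come out as 0/0 and 0/1. Note that the substitution is applied to every ./. in the stream, whatever its origin.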


Thanks for the suggestion. However, I need to change only those missing genotypes that are added during the merge because that position is not present in one of the two VCF files. Both the pipe and the plugin replace all missing genotypes, even when they were genuinely missing in the original files and I want to keep them that way. It may come in handy in other situations, though, so thanks.

I have posted the solution I received from the developer, for those interested.
