What is a tool to get the genome build of a VCF?
6 months ago
a5864557 • 0

It shouldn't be too hard to create one, but if one exists already that's even better. I need it to be automatable / non-web based (assume no relevant info exists in the header).

bcftools vcf • 550 views

I don't think you can get the genome build unless it's written in the VCF header. You might be able to guess from the contig IDs.
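
A rough sketch of the contig-ID check, in case it helps. It only reports whether the CHROM column uses `chr`-prefixed names (UCSC style, as in hg19/hg38 downloads) or bare names (as in b37). The function name and the labels are made up for illustration, and naming style alone cannot separate hg19 from hg38, so treat the result as a hint at best.

```python
def contig_naming_style(lines):
    """Classify CHROM names in VCF data lines.

    Returns 'chr_prefix', 'no_prefix', 'mixed', or 'unknown' (no data lines).
    Labels are illustrative, not a standard vocabulary.
    """
    styles = set()
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        styles.add("chr_prefix" if chrom.startswith("chr") else "no_prefix")
    if not styles:
        return "unknown"
    if len(styles) > 1:
        return "mixed"
    return styles.pop()
```

It takes any iterable of lines, so an open file handle (or `gzip.open(path, "rt")` for a compressed VCF) can be passed directly.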


First, I would never guess in an analysis. If no code or documentation is available for a file, I would not use it. But if you are absolutely forced to, you could take the positions of common variants (for example, dbSNP entries with a high allele frequency) for hg18, hg19 and hg38, and then intersect each set with your VCF. The correct build should show the best overlap with the VCF at these sites.
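
A minimal sketch of that intersection in plain Python, assuming you already have one VCF-formatted file of common sites per build. The function names are hypothetical, and matching on chromosome, position and REF allele is one reasonable choice rather than the only one. A `chr` prefix is stripped so that naming style does not affect the comparison.

```python
def vcf_sites(lines):
    """Yield (chrom, pos, ref) for each VCF data line, dropping any 'chr' prefix."""
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom = fields[0][3:] if fields[0].startswith("chr") else fields[0]
        yield chrom, int(fields[1]), fields[3].upper()


def overlap_by_build(query_lines, reference_lines_by_build):
    """Count query sites that also appear in each build's reference site list.

    reference_lines_by_build maps a build label (e.g. 'hg19') to an iterable
    of VCF-formatted lines with common variants for that build.
    """
    query = set(vcf_sites(query_lines))
    return {
        build: len(query & set(vcf_sites(ref_lines)))
        for build, ref_lines in reference_lines_by_build.items()
    }
```

For real files, pass open handles (`gzip.open(path, "rt")` for `.vcf.gz`). Holding a whole dbSNP release in a set is memory-hungry, so subsetting the reference to a few thousand high-AF sites first is probably sensible.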


@MatthewP ATpoint It is definitely possible to determine the build deterministically by cross-referencing a resource such as dbSNP, as ATpoint mentioned. If you check a large number of sites against both b37 and b38 and get 1000 matches on b38 and none on b37, you can assign the build to b38. Unless I'm missing something?
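
To make the decision rule concrete, here is a small sketch that takes per-build match counts and only assigns a build when one clearly dominates. The function name and both thresholds (`min_matches`, `min_ratio`) are arbitrary assumptions for illustration, not established cutoffs.

```python
def assign_build(counts, min_matches=100, min_ratio=10.0):
    """Return the best-matching build label, or None if the evidence is ambiguous.

    counts maps build label -> number of matching sites.
    The thresholds are illustrative defaults, not validated values.
    """
    if not counts:
        return None
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    best_build, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # Too few matches overall to trust any call.
    if best < min_matches:
        return None
    # The winner must beat the runner-up by a clear margin.
    if runner_up > 0 and best / runner_up < min_ratio:
        return None
    return best_build
```

With the numbers from this post, `assign_build({"b37": 0, "b38": 1000})` returns `"b38"`, while a near-even split such as `{"b37": 480, "b38": 520}` returns `None` rather than forcing a call.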
