How To Convert The Coordinates When Turning A Vcf File Into A Bed File
7.8 years ago
win ▴ 890

Hello all, I have a gVCF from Illumina and I want to slice out data from it. There is a list of SNPs, some given as rs numbers and others as chromosomal coordinates, e.g. chromosome 11, position 89017961.

I am planning to use Tabix and wanted to know the correct BED format for a SNP, because BED coordinates are 0-based.

Should it be 11 89017961-89017962 or 11 89017960-89017961?

And should it also be chr11 instead of 11?

Thanks in advance.

bed snp vcf • 8.5k views

In addition to the great answers below, you might also find the following tutorial useful: Cheat sheet for one-based vs zero-based coordinate systems

7.8 years ago

In vcf2bed, we convert from 1-based, closed [start, end] Variant Call Format v4 (VCF) to sorted, 0-based, half-open [start-1, end) extended BED data (cite).

For your example, a single-base variant at position 89017961 would map to 89017960-89017961, by default.
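As a rough illustration of that arithmetic (not the vcf2bed implementation itself), here is a minimal sketch using only awk. The function name `pos_to_bed` is a hypothetical helper invented for this example; it takes a chromosome name and a 1-based position and prints the corresponding 0-based, half-open BED interval.

```shell
# Hypothetical helper: turn a 1-based single-base position into a
# 0-based, half-open BED interval (start = pos - 1, end = pos).
pos_to_bed() {
  # $1 = chromosome name, $2 = 1-based position
  awk -v chrom="$1" -v pos="$2" \
    'BEGIN { printf "%s\t%d\t%d\n", chrom, pos - 1, pos }'
}

# The SNP from the question: chromosome 11, position 89017961
pos_to_bed 11 89017961
```

For the position in the question this prints `11`, `89017960` and `89017961` separated by tabs, i.e. the second of the two candidate intervals.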

(Other options are available to handle the coordinates differently; see the documentation for the --snvs, --insertions and --deletions command-line options.)

If you plan to integrate your data with other UCSC-formatted BED datasets, consistently using the chr prefix for chromosome names is a good idea, especially if you use toolkits such as BEDOPS, GROK or Bedtools to process them. Other approaches are possible, depending on the data or on the conventions of the lab or institution you're working with.
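Whichever convention you pick, the chromosome names in the BED file need to match the names in the VCF you query, since region lookups compare names as plain strings. One way to check what the VCF uses is to list its distinct chromosome names. The sketch below assumes an uncompressed VCF on standard input; `vcf_contigs` is a hypothetical helper name, and the three-line VCF is made up for illustration.

```shell
# Hypothetical helper: list the distinct chromosome names in a VCF
# read from standard input (header lines starting with '#' are skipped).
vcf_contigs() {
  grep -v '^#' | cut -f1 | sort -u
}

# Tiny made-up VCF, just enough to exercise the helper
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\n11\t89017961\t.\tA\tG\n' \
  | vcf_contigs
```

For a compressed file, something like `gunzip -c foo.vcf.gz | vcf_contigs` should work the same way. If the file is already tabix-indexed, `tabix -l foo.vcf.gz` lists the indexed sequence names directly.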

You can fix a lot of this with standard UNIX pipes, which are a powerful way to build processing pipelines.

For example, convert to BED and look at the first few lines with head:

$ vcf2bed < foo.vcf | head

Then use awk or other tools of choice to modify fields with prefixes, remove capitalization, etc.

To demonstrate, you can prefix chromosome numbers with chr very easily:

$ vcf2bed < foo.vcf | awk '{ print "chr"$1"\t"$2"\t"$3; }' - > foo.fixed.bed
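Note that the awk command above prints only the first three columns, so any further BED columns are dropped. If you want to keep them, a variant along the following lines may help. This is a sketch, not part of BEDOPS: `add_chr_prefix` is a hypothetical helper name, the input line is made up, and the pattern check is an added assumption so that names already starting with chr are left alone.

```shell
# Hypothetical helper: prefix the first column with "chr" unless it is
# already prefixed, keeping every other tab-separated column unchanged.
add_chr_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }
       { if ($1 !~ /^chr/) $1 = "chr" $1; print }'
}

# Made-up four-column BED line for illustration
printf '11\t89017960\t89017961\texample_id\n' | add_chr_prefix
```

In a pipeline this would take the place of the awk step, e.g. `vcf2bed < foo.vcf | add_chr_prefix > foo.fixed.bed`.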
7.8 years ago
jackuser1979 ▴ 880

There is a toolkit for genome analysis, especially for BED data, called BEDOPS. You can convert VCF to BED using the conversion tools it provides.

