Question

How To Convert The Coordinates When Turning A VCF File Into A BED File

11.6 years ago

win ▴ 990

Hello all, I have a gVCF from Illumina and I want to slice out data from it. There is a list of SNPs, some of which are given as rs# IDs and others as chromosomal coordinates, for example chromosome 11, position 89017961.

I am planning to use Tabix and I wanted to know the correct format of the BED file for a SNP, because BED coordinates are 0-based.

Should it be 11 89017961-89017962 or 11 89017960-89017961?

And should it be chr11 instead of 11?

Thanks in advance.

bed snp vcf • 13k views

In addition to the great answers below, you might also find the following tutorial useful: Cheat sheet for one-based vs zero-based coordinate systems

11.6 years ago by Obi Griffith

score 6 · Answer 1 · 2013-12-02

In vcf2bed, we convert from 1-based, closed [start, end] Variant Call Format v4 (VCF) to sorted, 0-based, half-open [start-1, end) extended BED data (cite).

For your example, a single-base variant at position 89017961 would map to 89017960-89017961, by default.

(Other custom variant process options are available to handle the coordinates in a different fashion; see the documentation for information about the --snvs, --insertions and --deletions command-line options.)
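
As a minimal sketch of the default arithmetic described above (plain awk, not vcf2bed itself), a single-base variant at 1-based position POS becomes the 0-based, half-open interval [POS-1, POS). The function name vcf_pos_to_bed is hypothetical, and the sketch assumes tab-separated input with the chromosome in column 1 and the position in column 2. It does not attempt to handle insertions or deletions.

```shell
# Hypothetical helper (not part of BEDOPS): read VCF-style lines on stdin
# and print a three-column BED interval for each single-base variant.
vcf_pos_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }                # skip VCF header lines
         { print $1, $2 - 1, $2 }'    # BED start = POS - 1, BED end = POS
}

# The position from the question:
printf '11\t89017961\n' | vcf_pos_to_bed
```

For the position in the question this prints 11, 89017960 and 89017961 separated by tabs, which matches the second of the two candidates.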

If you plan to integrate your data with other UCSC-formatted BED datasets, consistently using the chr prefix for chromosome names is a good idea, especially if you process BED data with toolkits like BEDOPS, GROK or Bedtools. There are other approaches you can take, depending on the data or on the lab or institution you're working with.

You can fix a lot of this with standard UNIX pipes, which are a powerful way to build processing pipelines.

For example, convert to BED and look at the first few lines with head:

$ vcf2bed < foo.vcf | head
...

Then use awk or other tools of choice to modify fields with prefixes, remove capitalization, etc.

To demonstrate, you can prefix chromosome numbers with chr very easily:

$ vcf2bed < foo.vcf | awk '{ print "chr"$1"\t"$2"\t"$3; }' - > foo.fixed.bed
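
Note that the one-liner above keeps only the first three columns and prefixes every line unconditionally. A slightly more defensive sketch, assuming tab-separated input, adds chr only where it is missing and leaves any remaining columns intact. The function name add_chr_prefix is hypothetical. Whether you want the prefix at all depends on the names used in the file you are querying, since Tabix matches chromosome names as they appear in the indexed file (tabix -l lists them).

```shell
# Hypothetical helper: prefix column 1 with "chr" unless it already has it,
# keeping all other columns unchanged.
add_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         $1 !~ /^chr/ { $1 = "chr" $1 }   # prefix only when missing
         { print }'
}

# Example with an extra (placeholder) ID column:
printf '11\t89017960\t89017961\texample_id\n' | add_chr_prefix
```

It can be dropped into the same kind of pipeline, reading the converted BED on stdin in place of the awk step.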

score 1 · Answer 2 · 2013-12-02

11.6 years ago

jackuser1979 ▴ 890

BEDOPS is a toolkit for genome analysis, aimed especially at BED-formatted data. You can convert VCF to BED with the conversion tools it provides.
