Question: Converting a VCF with SNPs and indels to BED format
4
gravatar for gareth862
4.8 years ago by
gareth86290
United States
gareth86290 wrote:

Does anyone know of tools or have custom  scripts (aside from "vcf2bed") that are able to convert between a VCF containing both indels and SNPs into BED format? The tricky part is creating the correct BED regions to capture the indel variants, and I haven't been able to find anything on the internet thus far.

 

next-gen • 12k views
ADD COMMENTlink modified 24 months ago by anita0 • written 4.8 years ago by gareth86290
3

Hello gareth862!

It appears that your post has been cross-posted to another site: SEQanswers.

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLYlink modified 4.8 years ago • written 4.8 years ago by Devon Ryan89k

What result are you expecting?

ADD REPLYlink written 4.8 years ago by Alex Reynolds28k
1

We are trying to get the "affected positions" from the vcf. Our plan was to use bed tools to intersect our bed file with the vcf affected positions to produce a bed file of reference ranges

  • If the allele matches the ref then it wouldn't write a range for it
  • If deletion the range would include all deleted positions
  • If a variation the range would be only the changed bases
  • 1 vcf line can generate multiple bed lines

 

 

ADD REPLYlink modified 4.8 years ago • written 4.8 years ago by carterc10
7
gravatar for Alex Reynolds
4.8 years ago by
Alex Reynolds28k
Seattle, WA USA
Alex Reynolds28k wrote:

The vcf2bed script will handle the last three issues:

  • Use --deletions to convert deletions to one set
  • Use --snvs to convert single-nucleotide variants to convert to a second set
  • Conversion of one VCF line will generate multiple BED lines if there are multiple alternate alleles

You can use bedops --everything to union the deletion and single-base variant sets:

$ vcf2bed --deletions < foo.vcf > foo_deletions.bed
$ vcf2bed --snvs < foo.vcf > foo_snvs.bed
$ bedops --everything foo_{deletions,snvs}.bed > foo.bed

Once you have your deletions and SNVs in one file, it's easy to deal with the first issue. The conversion script maps the REF and ALT fields of the VCF input to the sixth and seventh columns of the BED output. So you can filter out lines from the BED output where those columns are equivalent, using a simple awk statement:

$ awk '$6 != $7' foo.bed > foo_filtered.bed
ADD COMMENTlink modified 4.8 years ago • written 4.8 years ago by Alex Reynolds28k

Alex, I get following error with the above command.

Error: Could not allocate space for VCF INFO string

I have updated the bedops to the latest version (2.4.28) as you mentioned here. This update resolves the same error from convert2bed command. But the issue with vcf2bed remains. I would like to use vcf2bed over convert2bed since the deletions and the SNVs can be processed separately using vcf2bed.

Any suggestions?

ADD REPLYlink modified 19 months ago • written 19 months ago by Satyajeet Khare1.3k

vcf2bed is just a wrapper that calls convert2bed. If the issue shows up with vcf2bed and does not happen with convert2bed, I would suggest there's a problem with your installation. I would look in /usr/local/bin and make sure that vcf2bed is pointing to the current convert2bed.

ADD REPLYlink written 19 months ago by Alex Reynolds28k

Oh yes. I am running the latest version locally (through directory). Its probably trying to access the old version which is installed globally.

ADD REPLYlink written 19 months ago by Satyajeet Khare1.3k
1

The wrapper vcf2bed just uses whatever convert2bed it can find first in your environment PATH. You might try editing your PATH so that your local copy comes up first, or just upgrade the global copy.

Otherwise, you can use all the same functions in convert2bed as in vcf2bed, e.g.:

$ convert2bed --input=vcf --insertions < snps.vcf > insertions.bed

Etc.

You can run convert2bed --help-vcf for a listing of VCF-specific options, which should be exactly those offered by the wrapper script.

ADD REPLYlink written 19 months ago by Alex Reynolds28k

I used the same functions in the convert2bed as you mentioned. It works clean. Thanks!

ADD REPLYlink written 19 months ago by Satyajeet Khare1.3k
1

Great to hear! Feel free to follow up if you have any more questions or problems.

ADD REPLYlink written 19 months ago by Alex Reynolds28k

Sorry,

I have some .vcf files contain only snp (I guess because I have separate vcf files for somatic indels and structural variants)

I used this command to convert vcf to bed

vcf2bed --snvs < foo.vcf > foo_snvs.bed

My bed does not have any column named strand

How I can extract the stand column from vcf please?

Thanks

ADD REPLYlink written 10 weeks ago by F3.4k
1

I think, but am not 100% positive, that variant alleles are called against the forward strand-oriented reference base. So you could simply assume a "+".

Verifying this may be worth a separate question on biostars, and there are many people here who would likely be able to confirm this — or tell you that I'm wrong, which happens often. I'd double-check what I'm saying here.

If you must have a strand field in a BED file, and you can assume use of "+" (forward-oriented), and you don't mind losing the existing columns 6 and so on from what comes out of vcf2bed, then you could perhaps do something like the following:

$ cut -f1-5 foo_snvs.bed | awk -v FS="\t" -v OFS="\t" '{ print $1, $2, $3, $4, $5, "+"; }' > foo_snvs.bed6

You could do some more work with awk if you need to insert the strand column and preserve the existing columns. I'll leave that as an exercise.

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by Alex Reynolds28k

Sorry

I don't know I got error

[fi1d18@cyan01 ~]$ module load bedtools/2.21.0
[fi1d18@cyan01 ~]$ $ cut -f1-5 foo_snvs.bed | awk -v FS="\t" -v OFS="\t" '{ print $1, $2, $3, $4, $5, "+"; }' > foo_snvs.bed6
bash: $: command not found
[fi1d18@cyan01 ~]$
ADD REPLYlink written 9 weeks ago by F3.4k
1

Please try removing the leading "$" from:

$ cut -f1-5 foo_snvs.bed | awk -v FS="\t" -v OFS="\t" '{ print $1, $2, $3, $4, $5, "+"; }' > foo_snvs.bed6
ADD REPLYlink written 5 weeks ago by jeb24710
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1659 users visited in the last hour