bedtools intersect bed and vcf, coordinates problem
4.1 years ago
abascalfederico ★ 1.1k

I've found something unexpected; maybe I just didn't understand how the coordinates work.

I am intersecting a bed file with coordinates:

1         100       101

Against a vcf file, which has two sites in it:

1      100     etc etc
1      101     etc etc

The BED coordinates define a single site (100 in zero-based or 101 in one-based coordinates). However, intersectBed reports that this BED line intersects both VCF lines. I thought it should intersect only 1:101, since VCF is 1-based. Isn't the reported result incorrect? What am I missing?
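To make the expectation above concrete, here is a minimal sketch of the coordinate conversion, assuming the usual conventions (BED is 0-based and half-open, VCF POS is 1-based). The function name is hypothetical and only for illustration.

```python
def bed_to_one_based_positions(start, end):
    """List the 1-based positions covered by a 0-based, half-open BED interval."""
    # BED [start, end) covers 0-based positions start .. end-1,
    # which are 1-based positions start+1 .. end.
    return list(range(start + 1, end + 1))


# The BED line "1 100 101" from the question covers one base.
print(bed_to_one_based_positions(100, 101))  # [101]
```

Under these conventions, a single-base VCF record would overlap this BED line only if its POS is 101.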

Thanks!


Hi, could you please try changing your BED line to "1 100 100" and then running: bedtools intersect -a in1 -b in2 -wao


Thanks WouterDeCoster. However, that doesn't answer my question. According to what is explained in that post I should get only one intersection between my 0-based bed and my 1-based vcf.


I didn't state it would answer your question, just that it's useful ;-) But for the rest, I'm as puzzled as you by coordinates.


Can you also post how you ran intersectBed (the command or script)? That helps track down what's happening.


OK, I know what's happening... Those sites are indels, and bedtools (smarter than me) takes that into account! Thanks for your help.

I ran it like this:

intersectBed -a file.bed -b file.vcf

Version v2.22.0

Contents of "file.bed":

1   100 101

Contents of "file.vcf":

##fileformat=VCFv4.1
##contig=<ID=1,length=249250621,assembly=b37>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    
1   100 rs144773400 TA  T   .   PASS    .
1   101 rs143255646 TA  T   .   PASS    .

As a side note for future readers: remember that indels in a VCF file are stored including the last nucleotide before the indel, and the (1-based) position you see is the position of that nucleotide. This can be tricky to reason about when that nucleotide lies exactly at the border of a BED region!
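The result in this thread can be reproduced with a small sketch. It assumes that bedtools treats a VCF record as the 0-based, half-open interval [POS-1, POS-1+len(REF)); that matches the behaviour reported above, but it is an assumption here rather than something confirmed in the thread. The function names are hypothetical.

```python
def vcf_record_to_interval(pos, ref):
    """Convert a 1-based VCF POS and REF allele to a 0-based, half-open interval.

    Assumption: the record spans len(ref) bases starting at POS.
    """
    start = pos - 1
    return start, start + len(ref)


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


bed = (100, 101)  # the BED line "1 100 101"

# The two deletions from file.vcf (REF=TA, ALT=T).
for pos, ref in [(100, "TA"), (101, "TA")]:
    start, end = vcf_record_to_interval(pos, ref)
    print(pos, ref, (start, end), overlaps(*bed, start, end))
# 100 TA (99, 101) True
# 101 TA (100, 102) True

# For comparison, a single-base REF at POS 100 would not overlap.
print(overlaps(*bed, *vcf_record_to_interval(100, "T")))  # False
```

Under this assumption, the record at 1:100 overlaps the BED line because its two-base REF allele extends into 1-based position 101, which is the base the BED line covers.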
