bedtools intersect with BED and VCF: coordinates problem
5.7 years ago
abascalfederico ★ 1.2k

I've found something unexpected; maybe I just didn't understand how the coordinates work.

I am intersecting a bed file with coordinates:

1         100       101


Against a vcf file, which has two sites in it:

1      100     etc etc
1      101     etc etc


The BED coordinates define a single site (100 in zero-based or 101 in one-based coordinates). However, intersectBed reports that this BED line intersects both VCF lines. I thought it should only intersect 1:101 (VCF is 1-based). Isn't that incorrect? What am I missing?

Thanks!

bedtools bed vcf • 8.9k views

Hi, could you please try modifying your BED file to 1 100 100 and then running: bedtools intersect -a in1 -b in2 -wao


Thanks WouterDeCoster. However, that doesn't answer my question. According to what is explained in that post I should get only one intersection between my 0-based bed and my 1-based vcf.


I didn't state it would answer your question, just that it's useful ;-) But for the rest, I'm as puzzled as you by coordinates.


Can you also post how you ran intersectBed (command/script)? That helps track down what's happening.


OK, I know what's happening... Those sites are indels, and bedtools is smartly (unlike me) taking that into account! Thanks for your help.

I ran it like this:

intersectBed -a file.bed -b file.vcf


Version v2.22.0

Contents of "file.bed":

1   100 101


Contents of "file.vcf":

##fileformat=VCFv4.1
##contig=<ID=1,length=249250621,assembly=b37>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1   100 rs144773400 TA  T   .   PASS    .
1   101 rs143255646 TA  T   .   PASS    .


As a side note for future readers: remember that indels in a VCF file are stored including the last nucleotide before the indel, and the (1-based) position you see is the position of that nucleotide. This can be tricky to reason about when that nucleotide lies exactly at the border of a BED region!
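
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the behaviour discussed in this thread. It rests on an assumption, not on the bedtools source: that a VCF record is treated as a 0-based, half-open interval starting at POS - 1 and spanning the length of the REF allele. The helper names vcf_to_interval and overlaps are hypothetical and exist only for this illustration.

```python
def vcf_to_interval(pos, ref):
    """Map a 1-based VCF POS and REF allele to a 0-based, half-open (start, end).

    Assumption for illustration: the record spans len(ref) bases from POS.
    """
    start = pos - 1
    return start, start + len(ref)


def overlaps(a, b):
    """Return True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


# BED line "1 100 101" is already 0-based, half-open: it covers base 101 (1-based).
bed = (100, 101)

# The two deletions from file.vcf, plus a hypothetical SNV at POS 100 for contrast.
records = [(100, "TA"), (101, "TA"), (100, "T")]

for pos, ref in records:
    interval = vcf_to_interval(pos, ref)
    print(pos, ref, interval, overlaps(bed, interval))
```

Under that assumption, the deletion at POS 100 with REF TA covers 1-based bases 100 and 101, so it reaches into the BED region, while a single-base record at POS 100 would not.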