Bedtools intersect not reporting all overlaps
3
1
Entering edit mode
3.9 years ago
jamie.pike ▴ 80

I'm having trouble with Bedtools Intersect. It is not reporting overlap when all of feature A is within feature B.

Command:

bedtools intersect -a File1.bed -b File2.gff -wao -f 0.90 -sorted > Out.file 

Out.file:

VMNF01000005.1  484754  484980  .   .   +   .   .   .   -1  -1  .   .   .   .   0
VMNF01000005.1  484754  484980  .   .   +   .   .   .   -1  -1  .   .   .   .   0
VMNF01000005.1  484754  484980  .   .   +   .   .   .   -1  -1  .   .   .   .   0
VMNF01000005.1  484754  484980  .   .   -   .   .   .   -1  -1  .   .   .   .   0
VMNF01000007.1  6294425 6294650 .   .   -   .   .   .   -1  -1  .   .   .   .   0
VMNF01000007.1  6294425 6294650 .   .   -   .   .   .   -1  -1  .   .   .   .   0
VMNF01000007.1  6294425 6294650 .   .   -   .   .   .   -1  -1  .   .   .   .   0
VMNF01000007.1  6294425 6294650 .   .   +   .   .   .   -1  -1  .   .   .   .   0
VMNF01000008.1  1441418 1441616 .   .   -   .   .   .   -1  -1  .   .   .   .   0

When I search manually, all of feature A sits within feature B.

Image of overlap when viewed in Artemis. Feature A (pink and yellow highlighted region within circle) sits within feature B (red highlighted region).

Can anybody please explain where I have gone wrong?

Bedtools version: bedtools v2.29.2

BED bedtools intersect GFF Artemis • 1.9k views
ADD COMMENT
1
Entering edit mode
3.9 years ago
jamie.pike ▴ 80

I will leave this for those who may be interested - the issue was that the scaffolds/contigs had a slight difference in the names between the .bed file and the .gff file. The .bed file included the suffix ".1". After adding ".1" to the scaffold/contig names in File2.gff, both bedtools intersect and bedmap worked.

E.g.

=> File1.bed <==
VMNF01000005.1 482754 486980 . . +
VMNF01000005.1 482754 486980 . . +

==> File2.gff <==
VMNF01000002 GenBank PROMOTER 1 34 . + 1 ID=FocTR4_00017231;Name=FocTR4_00017231
VMNF01000002 GenBank PROMOTER 4227 5039 . - 1 ID=FocTR4_00017232;Name=FocTR4_00017232
ADD COMMENT
2
Entering edit mode
3.9 years ago

1) Are elements in File2.gff overlapping? If so, you might have an issue where intervals are merged beforehand.

If all of an element in File1.bed is contained inside a merged interval from File2.gff, it is possible that the File1.bed element would not be contained entirely inside the intervals from File2.gff that make up that merge.

Consider the following cartoons. Here's element A:

A:  <-----------[   ]----------->

Now consider two elements from B, B1 and B2, which can be merged internally into something like B* for set operations:

B1: <---------[   ]------------->
B2: <------------[   ]---------->
B*: <---------[      ]---------->

A is contained entirely within B* and would get misreported, even though it is not contained entirely within B1 or B2.

Maybe try a different toolkit:

$ sort-bed File1.bed > File1.sorted.bed
$ gff2bed < File2.gff > File2.sorted.bed
$ bedmap  --echo --fraction-ref 1 File1.sorted.bed File2.sorted.bed > Out.file

This reports (--echo) elements in File1.sorted.bed which are contained entirely (--fraction-ref 1) within elements in File2.sorted.bed.

2) Are contig/chromosome names or name schemes identical in File1.bed and File2.gff? If you have an interval with Chr1 in one file, and an interval with chr1 in another, set operations between the two will treat those as intervals from different chromosomes.

ADD COMMENT
0
Entering edit mode

Thank you for your recommendation - I used Bedmap and had a similar result, I have now realised that there was a small difference in the naming of scaffolds/contigs in the gff compared to the bed file, and both bedtools intersect and Bedmap now work. I'll have a look at results from both and see how they compare.

Thank you once again.

ADD REPLY
0
Entering edit mode
3.9 years ago
Fatima ▴ 1000

I'm not sure, but I think it's because f is 0.9, which restricts the output to when they have at least 90% overlap, in your example I think they only have 10% overlap.

What's -wao? Have you tried -wb or -wa -wb?

ADD COMMENT
0
Entering edit mode

Is it not the case that 100% of feature A is overlapping with B? Or have I misunderstood how the % of overlap is reported? I have set the -f option to 0.1, 0.5, 0.9, and 1.0 with the same result. If I remove the -f option it will report on overlaps of even just 1bp, which is not enough. Ideally, I want at least 90% overlap.

The wao option: Write the original A and B entries plus the number of base pairs of overlap between the two features. However, A features w/o overlap are also reported with a NULL B feature and overlap = 0. Restricted by -f and -r.

I have also tried with the -wb and -wb option, but there is no output at all.

ADD REPLY

Login before adding your answer.

Traffic: 1440 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6