Question: Bedtools intersect not reporting all overlaps
1
gravatar for jamie.pike
4 months ago by
jamie.pike60
jamie.pike60 wrote:

I'm having trouble with Bedtools Intersect. It is not reporting overlap when all of feature A is within feature B.

Command:

bedtools intersect -a File1.bed -b File2.gff -wao -f 0.90 -sorted > Out.file

Out.file:

VMNF01000005.1  484754  484980  .   .   +   .   .   .   -1  -1  .   .   .   .   0
VMNF01000005.1  484754  484980  .   .   +   .   .   .   -1  -1  .   .   .   .   0
VMNF01000005.1  484754  484980  .   .   +   .   .   .   -1  -1  .   .   .   .   0
VMNF01000005.1  484754  484980  .   .   -   .   .   .   -1  -1  .   .   .   .   0
VMNF01000007.1  6294425 6294650 .   .   -   .   .   .   -1  -1  .   .   .   .   0
VMNF01000007.1  6294425 6294650 .   .   -   .   .   .   -1  -1  .   .   .   .   0
VMNF01000007.1  6294425 6294650 .   .   -   .   .   .   -1  -1  .   .   .   .   0
VMNF01000007.1  6294425 6294650 .   .   +   .   .   .   -1  -1  .   .   .   .   0
VMNF01000008.1  1441418 1441616 .   .   -   .   .   .   -1  -1  .   .   .   .   0

When I search manually, all of feature A sits within feature B.

Image of overlap when viewed in Artemis. Feature A (pink and yellow highlighted region within circle) sits within feature B (red highlighted region).

Can anybody please explain where I have gone wrong?

Bedtools version: bedtools v2.29.2

ADD COMMENTlink modified 4 months ago • written 4 months ago by jamie.pike60
1
gravatar for jamie.pike
4 months ago by
jamie.pike60
jamie.pike60 wrote:

I will leave this for those who may be interested - the issue was that the scaffolds/contigs had a slight difference in the names between the .bed file and the .gff file. The .bed file included the suffix ".1". After adding ".1" to the scaffold/contig names in File2.gff, both bedtools intersect and bedmap worked.

E.g.

=> File1.bed <==
VMNF01000005.1 482754 486980 . . +
VMNF01000005.1 482754 486980 . . +

==> File2.gff <==
VMNF01000002 GenBank PROMOTER 1 34 . + 1 ID=FocTR4_00017231;Name=FocTR4_00017231
VMNF01000002 GenBank PROMOTER 4227 5039 . - 1 ID=FocTR4_00017232;Name=FocTR4_00017232
ADD COMMENTlink written 4 months ago by jamie.pike60
2
gravatar for Alex Reynolds
4 months ago by
Alex Reynolds31k
Seattle, WA USA
Alex Reynolds31k wrote:

1) Are elements in File2.gff overlapping? If so, you might have an issue where intervals are merged beforehand.

If all of an element in File1.bed is contained inside a merged interval from File2.gff, it is possible that the File1.bed element would not be contained entirely inside the intervals from File2.gff that make up that merge.

Consider the following cartoons. Here's element A:

A:  <-----------[   ]----------->

Now consider two elements from B, B1 and B2, which can be merged internally into something like B* for set operations:

B1: <---------[   ]------------->
B2: <------------[   ]---------->
B*: <---------[      ]---------->

A is contained entirely within B* and would get misreported, even though it is not contained entirely within B1 or B2.

Maybe try a different toolkit:

$ sort-bed File1.bed > File1.sorted.bed
$ gff2bed < File2.gff > File2.sorted.bed
$ bedmap  --echo --fraction-ref 1 File1.sorted.bed File2.sorted.bed > Out.file

This reports (--echo) elements in File1.sorted.bed which are contained entirely (--fraction-ref 1) within elements in File2.sorted.bed.

2) Are contig/chromosome names or name schemes identical in File1.bed and File2.gff? If you have an interval with Chr1 in one file, and an interval with chr1 in another, set operations between the two will treat those as intervals from different chromosomes.

ADD COMMENTlink modified 4 months ago • written 4 months ago by Alex Reynolds31k

Thank you for your recommendation - I used Bedmap and had a similar result, I have now realised that there was a small difference in the naming of scaffolds/contigs in the gff compared to the bed file, and both bedtools intersect and Bedmap now work. I'll have a look at results from both and see how they compare.

Thank you once again.

ADD REPLYlink written 4 months ago by jamie.pike60
0
gravatar for Fatima
4 months ago by
Fatima750
United states
Fatima750 wrote:

I'm not sure, but I think it's because f is 0.9, which restricts the output to when they have at least 90% overlap, in your example I think they only have 10% overlap.

What's -wao? Have you tried -wb or -wa -wb?

ADD COMMENTlink written 4 months ago by Fatima750

Is it not the case that 100% of feature A is overlapping with B? Or have I misunderstood how the % of overlap is reported? I have set the -f option to 0.1, 0.5, 0.9, and 1.0 with the same result. If I remove the -f option it will report on overlaps of even just 1bp, which is not enough. Ideally, I want at least 90% overlap.

The wao option: Write the original A and B entries plus the number of base pairs of overlap between the two features. However, A features w/o overlap are also reported with a NULL B feature and overlap = 0. Restricted by -f and -r.

I have also tried with the -wb and -wb option, but there is no output at all.

ADD REPLYlink written 4 months ago by jamie.pike60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1995 users visited in the last hour