Question: Bedtools intersect not reporting all overlaps
1
gravatar for jamie.pike
9 weeks ago by
jamie.pike50
jamie.pike50 wrote:

I'm having trouble with Bedtools Intersect. It is not reporting overlap when all of feature A is within feature B.

Command:

bedtools intersect -a File1.bed -b File2.gff -wao -f 0.90 -sorted > Out.file

Out.file:

VMNF01000005.1  484754  484980  .   .   +   .   .   .   -1  -1  .   .   .   .   0
VMNF01000005.1  484754  484980  .   .   +   .   .   .   -1  -1  .   .   .   .   0
VMNF01000005.1  484754  484980  .   .   +   .   .   .   -1  -1  .   .   .   .   0
VMNF01000005.1  484754  484980  .   .   -   .   .   .   -1  -1  .   .   .   .   0
VMNF01000007.1  6294425 6294650 .   .   -   .   .   .   -1  -1  .   .   .   .   0
VMNF01000007.1  6294425 6294650 .   .   -   .   .   .   -1  -1  .   .   .   .   0
VMNF01000007.1  6294425 6294650 .   .   -   .   .   .   -1  -1  .   .   .   .   0
VMNF01000007.1  6294425 6294650 .   .   +   .   .   .   -1  -1  .   .   .   .   0
VMNF01000008.1  1441418 1441616 .   .   -   .   .   .   -1  -1  .   .   .   .   0

When I search manually, all of feature A sits within feature B.

Image of overlap when viewed in Artemis. Feature A (pink and yellow highlighted region within circle) sits within feature B (red highlighted region).

Can anybody please explain where I have gone wrong?

Bedtools version: bedtools v2.29.2

ADD COMMENTlink modified 9 weeks ago • written 9 weeks ago by jamie.pike50
1
gravatar for jamie.pike
9 weeks ago by
jamie.pike50
jamie.pike50 wrote:

I will leave this for those who may be interested - the issue was that the scaffolds/contigs had a slight difference in the names between the .bed file and the .gff file. The .bed file included the suffix ".1". After adding ".1" to the scaffold/contig names in File2.gff, both bedtools intersect and bedmap worked.

E.g.

=> File1.bed <==
VMNF01000005.1 482754 486980 . . +
VMNF01000005.1 482754 486980 . . +

==> File2.gff <==
VMNF01000002 GenBank PROMOTER 1 34 . + 1 ID=FocTR4_00017231;Name=FocTR4_00017231
VMNF01000002 GenBank PROMOTER 4227 5039 . - 1 ID=FocTR4_00017232;Name=FocTR4_00017232
ADD COMMENTlink written 9 weeks ago by jamie.pike50
2
gravatar for Alex Reynolds
9 weeks ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:

1) Are elements in File2.gff overlapping? If so, you might have an issue where intervals are merged beforehand.

If all of an element in File1.bed is contained inside a merged interval from File2.gff, it is possible that the File1.bed element would not be contained entirely inside the intervals from File2.gff that make up that merge.

Consider the following cartoons. Here's element A:

A:  <-----------[   ]----------->

Now consider two elements from B, B1 and B2, which can be merged internally into something like B* for set operations:

B1: <---------[   ]------------->
B2: <------------[   ]---------->
B*: <---------[      ]---------->

A is contained entirely within B* and would get misreported, even though it is not contained entirely within B1 or B2.

Maybe try a different toolkit:

$ sort-bed File1.bed > File1.sorted.bed
$ gff2bed < File2.gff > File2.sorted.bed
$ bedmap  --echo --fraction-ref 1 File1.sorted.bed File2.sorted.bed > Out.file

This reports (--echo) elements in File1.sorted.bed which are contained entirely (--fraction-ref 1) within elements in File2.sorted.bed.

2) Are contig/chromosome names or name schemes identical in File1.bed and File2.gff? If you have an interval with Chr1 in one file, and an interval with chr1 in another, set operations between the two will treat those as intervals from different chromosomes.

ADD COMMENTlink modified 9 weeks ago • written 9 weeks ago by Alex Reynolds30k

Thank you for your recommendation - I used Bedmap and had a similar result, I have now realised that there was a small difference in the naming of scaffolds/contigs in the gff compared to the bed file, and both bedtools intersect and Bedmap now work. I'll have a look at results from both and see how they compare.

Thank you once again.

ADD REPLYlink written 9 weeks ago by jamie.pike50
0
gravatar for Fatima
9 weeks ago by
Fatima590
United states
Fatima590 wrote:

I'm not sure, but I think it's because f is 0.9, which restricts the output to when they have at least 90% overlap, in your example I think they only have 10% overlap.

What's -wao? Have you tried -wb or -wa -wb?

ADD COMMENTlink written 9 weeks ago by Fatima590

Is it not the case that 100% of feature A is overlapping with B? Or have I misunderstood how the % of overlap is reported? I have set the -f option to 0.1, 0.5, 0.9, and 1.0 with the same result. If I remove the -f option it will report on overlaps of even just 1bp, which is not enough. Ideally, I want at least 90% overlap.

The wao option: Write the original A and B entries plus the number of base pairs of overlap between the two features. However, A features w/o overlap are also reported with a NULL B feature and overlap = 0. Restricted by -f and -r.

I have also tried with the -wb and -wb option, but there is no output at all.

ADD REPLYlink written 9 weeks ago by jamie.pike50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1363 users visited in the last hour