Question: Bedtools intersect alternatives available
8 weeks ago by
Jeffin Rockey980
Karimannoor
Jeffin Rockey980 wrote:

What are the alternative tools available which can do what bedtools intersect does?

modified 8 weeks ago by Alex Reynolds26k • written 8 weeks ago by Jeffin Rockey980

And why wouldn't you use "bedtools intersect"?

written 8 weeks ago by Pierre Lindenbaum115k

There are 10+ methylome BED files which hold the positions of methylated bases. They were obtained through bwa-meth, followed by MethylDackel, followed by conversion to BED. Each file has roughly 80,000,000+ entries. I was trying to intersect these with gene features such as exons. The counts came out as expected. The problem is that I have around 50,000 genes. Bedtools intersect takes approximately 5 minutes in total for the exon, intron, and promoter features of a single gene. Extrapolating to 50,000 genes at that rate, the intersection would take many weeks to complete (50,000 × 5 minutes is roughly 174 days if run serially). (I also tried bedops, but the counts differed from those of bedtools.)

I have been an avid bedtools user for a long time. But in this case, even with sorted BED files, I could not achieve the necessary speed.

That is why I asked for other alternatives.

modified 8 weeks ago • written 8 weeks ago by Jeffin Rockey980

That's a better description of the problem you are trying to solve.

Some ideas:

  • Use tabix to index your BED file. This gives you random access to given regions.
  • Consider splitting the regions you want to intersect with and running the pieces with GNU Parallel.

fin swimmer

written 8 weeks ago by finswimmer7.9k
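
The two ideas above (indexed random access and splitting the regions across workers) can be illustrated with a small pure-Python sketch. It is only an illustration under stated assumptions: it uses a plain binary search over sorted positions rather than tabix's actual index, the data are made up, and `positions_in_region` and `chunk` are hypothetical helper names.

```python
from bisect import bisect_left


def positions_in_region(sorted_positions, start, end):
    """Return positions p with start <= p < end (0-based, half-open).

    Binary search means each lookup costs O(log n) instead of a full scan.
    """
    lo = bisect_left(sorted_positions, start)
    hi = bisect_left(sorted_positions, end)
    return sorted_positions[lo:hi]


def chunk(items, n_chunks):
    """Split items round-robin into n_chunks lists, one per parallel worker."""
    return [items[i::n_chunks] for i in range(n_chunks)]


# Made-up data: sorted methylated positions on one chromosome, and exons.
meth = [5, 12, 40, 41, 99, 150]
exons = [(10, 50), (90, 100), (140, 160)]

for start, end in exons:
    print((start, end), positions_in_region(meth, start, end))

# Each chunk of regions could be handed to a separate process.
print(chunk(exons, 2))
```

With real data, the lookup would be done by tabix against a bgzip-compressed, indexed file, and each chunk of regions would go to its own job.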

I see. How about parallelizing things per exon?

written 8 weeks ago by Pierre Lindenbaum115k

When I tried to parallelize bedtools, the individual processes in fact slowed down, effectively nullifying the expected advantage. I checked on different servers (256 GB), but this behaviour recurs; it may have something to do with RAM.

written 8 weeks ago by Jeffin Rockey980

Please confirm whether you are using bedtools intersect -sorted.

written 8 weeks ago by John Marshall1.4k

John,

I believe so. Let me cross-check. I shall confirm this as soon as possible.

written 8 weeks ago by Jeffin Rockey980

John,

Confirmed. Even with -sorted it is slow.

Jeffin

modified 8 weeks ago • written 8 weeks ago by Jeffin Rockey980

It depends on what you want to "intersect".

written 8 weeks ago by finswimmer7.9k

But what is the purpose? For some tasks, the "grep -f" or "join" commands can also be used.

written 8 weeks ago by toralmanvar720

If this is whole-genome sequencing, I recommend first running a basic intersect (either with bedtools or bedops) with just the gene regions, keeping only those bases that overlap with genes for the more detailed annotation tasks looking at exons, introns, etc. Odds are you want to separate the bases into gene-overlapping and non-gene-overlapping anyway. You may also want to consider splitting the intersect by type of annotation, i.e., run a separate process with a BED file containing only exons or only introns, respectively.

written 8 weeks ago by Friederike2.3k
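
The two-stage approach suggested above (first keep only gene-overlapping bases, then annotate that smaller set) can be sketched in pure Python. This is a minimal illustration, not how bedtools or bedops implement it: it assumes single-base sites and 0-based, half-open intervals, both inputs sorted, and `keep_overlapping` is a hypothetical helper with made-up data.

```python
def keep_overlapping(positions, intervals):
    """Return the positions that fall inside at least one interval.

    positions: sorted ints; intervals: (start, end) half-open, sorted by start.
    Both inputs are walked once, which is why sorted input matters.
    """
    kept = []
    i = 0
    max_end = -1  # furthest end among intervals starting at or before p
    for p in positions:
        while i < len(intervals) and intervals[i][0] <= p:
            max_end = max(max_end, intervals[i][1])
            i += 1
        if p < max_end:
            kept.append(p)
    return kept


# Made-up data for one chromosome.
sites = [50, 110, 150, 190, 250, 320, 399, 400]
genes = [(100, 200), (300, 400)]
exons = [(100, 120), (180, 200), (300, 350)]

# Stage 1: discard everything outside genes.
in_genes = keep_overlapping(sites, genes)
# Stage 2: detailed annotation runs on the reduced set only.
in_exons = keep_overlapping(in_genes, exons)
exonic = set(in_exons)
intronic = [p for p in in_genes if p not in exonic]

print(in_genes, in_exons, intronic)
```

The second stage touches far fewer records than the full file would, which is the point of the pre-filter.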
8 weeks ago by
Alex Reynolds26k
Seattle, WA USA
Alex Reynolds26k wrote:

You would need sorted inputs (sorted with sort-bed; I'm not sure what sortBed does), but as faster options, bedops --intersect and bedops --element-of do different kinds of intersections.

If you're counting overlaps of elements by class: bedmap --count and bedmap --faster --count can be useful.

You can also use the --chrom operator with BEDOPS tools to trivially parallelize work by chromosome via GNU Parallel or HPC job schedulers.

modified 8 weeks ago • written 8 weeks ago by Alex Reynolds26k
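
The per-chromosome parallelization mentioned above works because records on different chromosomes never interact, so each chromosome can be counted independently and the results merged. The sketch below shows that partitioning idea in pure Python; it is not BEDOPS code, the data and the names `split_by_chrom`, `count_sites`, and `run_one` are made up, and the thread pool only demonstrates the structure (real speedups would come from separate processes or jobs, as with GNU Parallel).

```python
from bisect import bisect_left
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_chrom(records):
    """Group (chrom, field, ...) tuples by chromosome, dropping the chrom field."""
    groups = defaultdict(list)
    for chrom, *fields in records:
        groups[chrom].append(tuple(fields))
    return groups


def count_sites(features, sites):
    """Count single-base sites inside each half-open feature (start, end, name)."""
    starts = sorted(pos for (pos,) in sites)
    return {
        name: bisect_left(starts, end) - bisect_left(starts, start)
        for start, end, name in features
    }


# Made-up data: methylated sites and named features on several chromosomes.
sites = [("chr1", 5), ("chr1", 12), ("chr1", 40), ("chr2", 7), ("chr2", 8)]
features = [
    ("chr1", 10, 50, "geneA_exon1"),
    ("chr2", 0, 8, "geneB_exon1"),
    ("chr3", 0, 10, "geneC_exon1"),
]

sites_by_chrom = split_by_chrom(sites)
features_by_chrom = split_by_chrom(features)


def run_one(chrom):
    """Work unit for one chromosome; independent of all other chromosomes."""
    return count_sites(features_by_chrom[chrom], sites_by_chrom.get(chrom, []))


counts = {}
with ThreadPoolExecutor() as pool:
    for partial in pool.map(run_one, list(features_by_chrom)):
        counts.update(partial)

print(counts)
```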

Thanks Alex. I shall try with the mentioned options and see if they can be used instead.

written 8 weeks ago by Jeffin Rockey980

If you're not working with nested elements, --faster can speed things up even more than the usual bedops speedup. See the docs for a more detailed explanation of what nested elements are and whether --faster can be used here.

modified 8 weeks ago • written 8 weeks ago by Alex Reynolds26k

Alex, even without --faster, bedops was significantly faster, and that speedup alone was sufficient. But the difference between the results of --element-of 1 and those of bedtools intersect is what held me back from using bedops.

written 8 weeks ago by Jeffin Rockey980

I don't know what overlap criterion bedtools uses by default, but --element-of 1 means one or more bases of overlap. More stringent overlap can be specified with more bases or with a percentage, e.g. --element-of 100% for full enclosure. Also check that the inputs are sorted and that they are provided in the correct order, i.e. bedops -e 1 A B will give a different answer from bedops -e 1 B A.

modified 8 weeks ago • written 8 weeks ago by Alex Reynolds26k
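
To make the overlap criteria and the order dependence above concrete, here is a brute-force Python sketch. It is an assumption-laden simplification, not the BEDOPS implementation: each element of the first set is compared against each element of the second set individually, the percentage is taken relative to the length of the first-set element, and `overlap_bases` and `element_of` are hypothetical names.

```python
def overlap_bases(a, b):
    """Number of bases shared by two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def element_of(first, second, min_bases=1, min_fraction=None):
    """Return elements of `first` that overlap some element of `second`.

    min_bases: required overlap in bases (used when min_fraction is None).
    min_fraction: required overlap as a fraction of the `first` element's length.
    """
    out = []
    for a in first:
        need = min_bases if min_fraction is None else min_fraction * (a[1] - a[0])
        if any(0 < overlap_bases(a, b) >= need for b in second):
            out.append(a)
    return out


A = [(0, 10), (20, 30)]
B = [(5, 25)]

# Same criterion, different argument order, different answers:
print(element_of(A, B))  # elements of A touching B
print(element_of(B, A))  # elements of B touching A

# Full enclosure: no A element lies completely inside (5, 25).
print(element_of(A, B, min_fraction=1.0))
print(element_of(A, [(0, 100)], min_fraction=1.0))
```

The output always consists of elements from the first argument, which is why swapping the inputs changes both the records reported and their count.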

Is there a difference in how bedtools and bedops interpret the intervals (i.e., zero-based half-open vs. 1-based, etc.)?

written 8 weeks ago by Friederike2.3k

Bedops works correctly with half-open, 0-based indexing. Not sure what other tools do.

written 8 weeks ago by Alex Reynolds26k
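
As a small illustration of why the coordinate convention raised above can change counts, the sketch below converts a 1-based, closed position to 0-based, half-open BED coordinates and shows how an off-by-one changes whether a site overlaps a feature. The numbers are made up, `one_based_to_bed` and `overlaps` are hypothetical helpers, and this shows only one possible source of a count difference, not a diagnosis of the case in this thread.

```python
def one_based_to_bed(start, end):
    """Convert a 1-based, closed interval [start, end] to 0-based, half-open."""
    return (start - 1, end)


def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


exon = (100, 200)  # BED: covers 1-based positions 101..200

# A methylated base reported at 1-based position 100:
correct = one_based_to_bed(100, 100)  # proper BED record for that base
shifted = (100, 101)                  # same base if the shift was forgotten

print(correct, overlaps(correct, exon))
print(shifted, overlaps(shifted, exon))
```

Sites at feature boundaries are the ones affected, so a conversion step that skips the shift would change overlap counts slightly rather than drastically.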