Question: Bedtools intersect alternatives available
Jeffin Rockey (Karimannoor) wrote 4 months ago:

What are the alternative tools available which can do what bedtools intersect does?


And why wouldn't you use "bedtools intersect"?

written 4 months ago by Pierre Lindenbaum

There are 10+ methylome BED files that hold the positions of methylated bases. They were produced with bwa-meth, followed by MethylDackel, followed by conversion to BED. Each file has roughly 80,000,000+ entries. I was trying to intersect these with gene features such as exons, and the counts came out as expected. The problem is that I have around 50,000 genes. bedtools intersect takes approximately 5 minutes in total for, say, the exons, introns, and promoter of one gene. Extrapolated to 50,000 genes at that rate, the intersection would take weeks to complete. (I also tried bedops, but the counts differed from those of bedtools.)

I have long been an avid user of bedtools, but in this case, even with sorted BED files, I could not achieve the necessary speed.

That is why I asked for alternatives.

modified 4 months ago • written 4 months ago by Jeffin Rockey

That's a better description of the problem you are trying to solve.

Some ideas:

  • Use tabix to index your BED file. This gives you random access to given regions.
  • Consider splitting the regions you want to intersect with and running the pieces with GNU Parallel.

fin swimmer

written 4 months ago by finswimmer

I see. How about parallelizing things per exon?

written 4 months ago by Pierre Lindenbaum

When I tried to parallelize bedtools, the individual jobs in fact slowed down, effectively nullifying the expected advantage. I checked on different servers (256 GB), but this behaviour recurs; it may have something to do with RAM.

written 4 months ago by Jeffin Rockey

Please confirm whether you are using bedtools intersect -sorted.

written 4 months ago by John Marshall

John,

I believe so. Let me cross-check, and I shall confirm as soon as possible.

written 4 months ago by Jeffin Rockey

John,

Confirmed. It is slow even with -sorted.

Jeffin

modified 4 months ago • written 4 months ago by Jeffin Rockey

It depends on what you want to "intersect".

written 4 months ago by finswimmer

But what is the purpose? For some tasks, the "grep -f" or "join" commands can also be used.

written 4 months ago by toralmanvar

If this is whole-genome sequencing, I recommend first running a basic intersect (with either bedtools or bedops) against just the gene regions, keeping only those bases that overlap genes for the more detailed annotation tasks looking at exons, introns, etc. Odds are you want to separate the bases into gene-overlapping and non-gene-overlapping anyway. You may also want to consider splitting the intersect by annotation type, i.e., run a separate process with a BED file containing only exons or only introns, respectively.

written 4 months ago by Friederike
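
The two-stage idea above (first keep only the bases that overlap genes, then run the detailed annotation on that smaller set) can be sketched in plain Python. This is only an illustration of the logic, not how bedtools or bedops implement it. The function names `merge_intervals` and `keep_in_genes` and the toy coordinates are made up for the example, and intervals are assumed to be 0-based, half-open, and on a single chromosome.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extends the previous interval (overlap or adjacency).
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def keep_in_genes(positions, genes):
    """Return the positions that fall inside any gene interval.

    positions: iterable of 0-based base coordinates on one chromosome.
    genes: list of half-open (start, end) intervals on the same chromosome.
    """
    merged = merge_intervals(genes)
    starts = [s for s, _ in merged]
    kept = []
    for pos in positions:
        # Index of the last merged interval starting at or before pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < merged[i][1]:
            kept.append(pos)
    return kept


# Hypothetical toy data for one chromosome.
genes = [(100, 200), (150, 300), (500, 600)]
positions = [50, 100, 199, 299, 300, 550, 600]
genic = keep_in_genes(positions, genes)
print(genic)
```

Only the positions in `genic` would then need to be compared against exons, introns, and promoters, which is where the saving would come from if most bases lie outside genes.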

Alex Reynolds (Seattle, WA USA) wrote 4 months ago:

You would need sorted inputs (sorted as sort-bed does it; I'm not sure what sortBed does), but as faster options, bedops --intersect and bedops --element-of perform different kinds of intersection.

If you're counting overlaps of elements by class: bedmap --count and bedmap --faster --count can be useful.

You can also use the --chrom operator with BEDOPS tools to trivially parallelize work by chromosome via GNU Parallel or HPC job schedulers.

modified 4 months ago • written 4 months ago by Alex Reynolds
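
As a rough picture of what counting overlaps per element involves, and of why the work splits cleanly by chromosome, here is a minimal Python sketch. It is not the bedmap implementation; `group_by_chrom`, `count_on_chrom`, `count_per_feature`, the tuple layouts, and the toy records are assumptions made for illustration. Coordinates are treated as 0-based and half-open, and methylated positions as single bases.

```python
from bisect import bisect_left
from collections import defaultdict


def group_by_chrom(records):
    """Group (chrom, ...) tuples into a dict keyed by chromosome."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec[1:])
    return groups


def count_on_chrom(positions, features):
    """Count positions inside each half-open feature on one chromosome.

    positions: list of 0-based coordinates (sorted here before searching).
    features: list of (start, end, name) tuples.
    """
    positions = sorted(positions)
    counts = {}
    for start, end, name in features:
        # Positions p with start <= p < end lie between these two indices.
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, end)
        counts[name] = hi - lo
    return counts


def count_per_feature(sites, features):
    """Count sites per feature, handling each chromosome independently."""
    sites_by_chrom = group_by_chrom(sites)
    feats_by_chrom = group_by_chrom(features)
    counts = {}
    for chrom, feats in feats_by_chrom.items():
        pos = [p for (p,) in sites_by_chrom.get(chrom, [])]
        # Each chromosome is self-contained, so this call could be handed
        # to a separate worker process or cluster job.
        counts.update(count_on_chrom(pos, feats))
    return counts


# Hypothetical toy records: (chrom, position) and (chrom, start, end, name).
sites = [("chr1", 10), ("chr1", 25), ("chr1", 30), ("chr2", 5)]
features = [
    ("chr1", 10, 30, "geneA_exon1"),
    ("chr1", 25, 40, "geneA_intron1"),
    ("chr2", 0, 5, "geneB_exon1"),
]
counts = count_per_feature(sites, features)
print(counts)
```

The point of the sketch is that all features are counted in one pass over the data, rather than one run per gene, and that no chromosome needs information from another.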

Thanks Alex. I shall try with the mentioned options and see if they can be used instead.

written 4 months ago by Jeffin Rockey

If you're not working with nested elements, --faster can speed things up even more than the usual bedops speedup. See the docs for a more detailed explanation of what nested elements are and whether --faster can be used here.

modified 4 months ago • written 4 months ago by Alex Reynolds

Alex, even without --faster, bedops was significantly faster, and that speedup alone was sufficient. But the difference between the results of --element-of 1 and those of bedtools intersect is what held me back from using bedops.

written 4 months ago by Jeffin Rockey

I don't know what overlap criterion bedtools uses by default, but --element-of 1 means one or more bases of overlap. More stringent overlap can be specified with more bases or with a percentage, e.g. --element-of 100% for full enclosure. Also check that the inputs are sorted and that they are provided in the correct order: bedops -e 1 A B will give a different answer from bedops -e 1 B A.

modified 4 months ago • written 4 months ago by Alex Reynolds
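
To make the two points above concrete (the overlap threshold and the order of the inputs), here is a small Python sketch of an "element-of" style filter. It only illustrates the idea and is not the bedops implementation. The function name `element_of`, the simple nested loop, and the assumption that a percentage threshold is measured against the length of the element from the first input are choices made for this example.

```python
def overlap_bases(a, b):
    """Number of shared bases between two half-open (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def element_of(first, second, min_bases=1, min_fraction=None):
    """Return elements of `first` that sufficiently overlap any element of `second`.

    min_bases: required overlap in bases (used when min_fraction is None).
    min_fraction: required overlap as a fraction of the `first` element's
        length, e.g. 1.0 for full enclosure (an assumption of this sketch).
    """
    kept = []
    for a in first:
        if min_fraction is None:
            needed = min_bases
        else:
            # Require at least one base even for very small fractions.
            needed = max(1, min_fraction * (a[1] - a[0]))
        if any(overlap_bases(a, b) >= needed for b in second):
            kept.append(a)
    return kept


# Hypothetical toy intervals on one chromosome.
A = [(0, 10), (12, 18), (20, 30), (40, 50)]
B = [(5, 25), (100, 110)]

a_in_b = element_of(A, B)                       # elements of A touching B
b_in_a = element_of(B, A)                       # elements of B touching A
a_enclosed = element_of(A, B, min_fraction=1.0) # elements of A fully inside B
print(a_in_b, b_in_a, a_enclosed)
```

Because the output always consists of elements from the first input, swapping the inputs changes both which elements are reported and how many, which is one plausible reason for counts that disagree between tools or runs.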

Is there a difference in how bedtools and bedops interpret the intervals (i.e., 0-based half-open vs. 1-based, etc.)?

written 4 months ago by Friederike

Bedops works correctly with half-open, 0-based indexing. Not sure what other tools do.

written 4 months ago by Alex Reynolds
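
Coordinate conventions are a common source of count differences, so a short Python sketch of the 0-based, half-open convention may help. The helper names are hypothetical, and nothing here describes how any particular tool parses its input; it only shows the arithmetic.

```python
def overlaps_half_open(a_start, a_end, b_start, b_end):
    """True if 0-based, half-open intervals [a_start, a_end) and
    [b_start, b_end) share at least one base."""
    return a_start < b_end and b_start < a_end


def one_based_closed_to_bed(start, end):
    """Convert a 1-based, closed interval [start, end] to 0-based, half-open."""
    return start - 1, end


# A single base reported as position 100 in 1-based coordinates.
bed_start, bed_end = one_based_closed_to_bed(100, 100)
print(bed_start, bed_end)

# Adjacent half-open intervals share no base.
print(overlaps_half_open(0, 10, 10, 20))
```

If one input were written as 1-based and closed but read as 0-based and half-open, every interval would be shifted by one base at its start, which can change overlap counts at feature boundaries.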