Question: How to keep information from both files when I intersect them?
1
gravatar for Sinji
2.5 years ago by
Sinji2.8k
UT Southwestern Medical Center
Sinji2.8k wrote:

I'm using bedops to intersect bed and bedgraph files and want to keep information in both files. Currently bedops only outputs the information on the first file that it is fed.

bedops --element-of $bed $bdg > $overlapped.bed

File A: BED

chr1 1000 2000 region_1
chr1 2000 3000 region_2

File B: BDG

chr1 1000 1500 100
chr1 2000 3000 200

Wanted output:

chr1 1000 2000 region_1 chr1 1000 1500 100
chr1 2000-3000 region_2 chr1 2000 3000 200

Any ideas?

bedops • 1.1k views
ADD COMMENTlink modified 2.5 years ago by ssv.bio180 • written 2.5 years ago by Sinji2.8k
4

Sure, use bedmap, instead:

$ bedmap --echo --echo-map --delim '\t' A.bed B.bed > answer.bed
  • The --echo operator prints out each element in set A.
  • The --echo-map operator prints any elements in set B ("map set") that overlap set A's element, by one or more bases.
  • The --delim '\t' operator separates set A elements and any overlapping elements by a tab character.

The bedmap tool maps or reports associations between sets of elements, based on how they overlap.

You can report the associations directly as elements, as shown above with --echo-map, or you can even report attributes or summaries of the associations, like statistical or score-based calculations (mean, median, etc.) using other operators (*).

The documentation shows examples of this for various BED inputs.

(*) Note: For score calculations, Bedgraph needs to be converted to BED5, but that's easy to do:

$ awk '{print $1"\t"$2"\t"$3"\t.\t"$4;}' in.bedgraph > out.bed

If you're just looking at mapped elements, you don't need to convert Bedgraph to BED.

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Alex Reynolds28k

Answered everything, even anticipated the question about the scoring calculations I was going to have after running it. Thanks Alex!

ADD REPLYlink written 2.5 years ago by Sinji2.8k

You're welcome! If BEDOPS is useful to you, please consider adding a star to our Github page. This helps us gauge interest and make future updates more easily available.

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Alex Reynolds28k

anyone know how to do this in perl?...just by chance this is very like something i need.

ADD REPLYlink written 2.5 years ago by YaGalbi1.4k
1

You could use system():

system("bedmap --echo --echo-map --delim '\t' A.bed B.bed > answer.bed") == 0 or die "could not run bedmap for some reason\n";
ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Alex Reynolds28k
1
gravatar for Eric Lim
2.5 years ago by
Eric Lim1.3k
Boston
Eric Lim1.3k wrote:

bedtools intersect -b a.bed -a b.bed -wb should do it, though you might need to reorder the columns for your wanted output.

ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by Eric Lim1.3k
1

The re-ordering is kind of why I wanted to avoid bedtools and stick to bedops. I don't want to be that guy that gets a paper retracted due to incorrect data cleaning.

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Sinji2.8k
0
gravatar for ssv.bio
2.5 years ago by
ssv.bio180
ssv.bio180 wrote:

Relying on example only, cbind In R should do the trick.

ADD COMMENTlink written 2.5 years ago by ssv.bio180

cbind assumes that the columns are in some sort of order, which I can't guarantee.

ADD REPLYlink written 2.5 years ago by Sinji2.8k

Then, bed tools is simplest, to use for such intersection. Intersect function in genomicranges package in R, also does this kind of intersection.

ADD REPLYlink written 2.5 years ago by ssv.bio180
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1529 users visited in the last hour