Question: bedtools intersect "complete outer join"
0
gravatar for graeme.thorn
15 months ago by
graeme.thorn30
London, United Kingdom
graeme.thorn30 wrote:

I'm in the process of generating two BED files, and I want to use something like bedtools intersect to join the two together. However, I would like a behaviour like the -loj flag (which reports an empty B record, if no B overlaps a given feature in A), but to report an empty A record if no A feature overlaps B also. i.e. if I have A.bed file

chr1 1 10
chr1 21 30 
chr1 31 40

and B.bed file

chr1 11 20
chr1 21 30

then I would like an output similar to

chr1  1 10 .    -1 -1
.    -1 -1 chr1 11 20
chr1 21 30 chr1 21 30
chr1 31 40 .    -1 -1

so some kind of "complete outer join", being a union of a left outer join (-loj) flag with a "right outer join" (which BEDtools doesn't do)

I suspect I could do this by first doing the left outer joins

bedtools intersect -loj -wa -wb -a A.bed -b B.bed > loj1.bed
bedtools intersect -loj -wa -wb -a B.bed -b A.bed > loj2.bed

then reordering the columns in loj2.bed using "cut" (to put the record from the A file in the first set of columns), followed by

cat loj1.bed loj2_reordered.bed | sort -k1,1 -k2,2n > sorted.bed

merging the two files then

uniq sorted.bed > complete_outer_join.bed

removing duplicate lines, but I wonder if there's a quicker method for doing all this.

P.S. The features in the A and B bed files will be defined on the same coordinates, so the overlap of an A feature on a B feature would be the same as an overlap of a B feature on an A feature.

bedtools • 794 views
ADD COMMENTlink modified 15 months ago by Alex Reynolds27k • written 15 months ago by graeme.thorn30
3
gravatar for Alex Reynolds
15 months ago by
Alex Reynolds27k
Seattle, WA USA
Alex Reynolds27k wrote:

Assumes use of BEDOPS bedops and bedmap, awk and sorted inputs with BEDOPS sort-bed:

$ sort-bed A.unsorted.bed > A.bed
$ sort-bed B.unsorted.bed > B.bed

Make a union of elements with bedops --everything and pipe to awk to get the uniques:

$ bedops --everything A.bed B.bed | awk '!a[$0]++' > unique-union.bed

Map sets to this union, using bedmap --indicator to get a binary 1/0 overlap flag, which is used to conditionally print the element where the flag is 1 (true), or a placeholder where the flag is 0 (false). Use paste to build a BED6-like pairing:

$ paste <(bedmap --indicator --echo unique-union.bed A.bed | awk -v FS='|' '{ if ($1==1) { printf("%s\n", $2); } else { printf(".\t-1\t-1\n"); } }') <(bedmap --indicator --echo unique-union.bed B.bed | awk -v FS='|' '{ if ($1==1) { printf("%s\n", $2); } else { printf(".\t-1\t-1\n"); } }')
chr1    1   10  .   -1  -1
.   -1  -1  chr1    11  20
chr1    21  30  chr1    21  30
chr1    31  40  .   -1  -1

Streams are pretty useful. The only intermediate file here is the unique-union.bed file, so this should run pretty fast.

Because we're also using streams, while this demo shows how to work with two files, this pipeline scales to n inputs pretty easily and quickly -- just add more paste, and you can glue more sets together.

ADD COMMENTlink modified 15 months ago • written 15 months ago by Alex Reynolds27k

And, of course, this would work if the BED files have more than three columns, just change the printf(".\t-1\t-1\n"); to include more fields.

ADD REPLYlink written 15 months ago by graeme.thorn30

Yes, but in the case where there are more than three columns, watch out for how you create the unique-union set. You may need to adjust the awk logic to use only the first three columns as a key, probably, depending on what you would decide to call "unique".

ADD REPLYlink written 15 months ago by Alex Reynolds27k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1232 users visited in the last hour