3.2 years ago
williamtmills
I am trying to use multiIntersectBed to find the overlap between multiple BED files and then use that overlap to filter the original BED files, for example to keep only rows that fall within overlaps found in 2 or more libraries. In the example below, it is easy enough to filter the output to keep only rows where column 4 (num) is >= 2. How would I then remove every row from a.bed that does not fall within these overlapping intervals?
$ multiIntersectBed -header -i a.bed b.bed c.bed
chrom start end num list a.bed b.bed c.bed
chr1 6 8 1 1 1 0 0
chr1 8 12 2 1,3 1 0 1
chr1 12 15 3 1,2,3 1 1 1
chr1 15 20 2 1,2 1 1 0
chr1 20 22 1 2 0 1 0
chr1 22 30 2 1,2 1 1 0
chr1 30 32 1 2 0 1 0
chr1 32 34 1 3 0 0 1
I hope you realize that multiIntersectBed gives you the exact nucleotides that overlap between samples, not the original ranges as a whole. In this situation here:
it would give you a value of 3 (present in all ranges) only for a single nucleotide (the first one of C, which is present in all three). It would not return the intervals of A, B, and C. Just making sure you know that, because I did not realize how multiIntersectBed works for quite a while. Is this really what you want?
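To make the distinction concrete, here is a minimal Python sketch of the filtering step, not a definitive implementation. It takes the multiIntersectBed rows from the question, keeps the sub-intervals with num >= 2, merges the ones that touch, and then shows three possible readings of "keep rows of a.bed within the overlaps". The contents of `A_BED` are an assumption: the question does not show a.bed, so these two intervals are just one set consistent with the a.bed column of the output. The function names are likewise made up for illustration.

```python
# Rows copied from the multiIntersectBed output in the question:
# (chrom, start, end, num). BED coordinates are half-open.
MULTIINTER_ROWS = [
    ("chr1", 6, 8, 1),
    ("chr1", 8, 12, 2),
    ("chr1", 12, 15, 3),
    ("chr1", 15, 20, 2),
    ("chr1", 20, 22, 1),
    ("chr1", 22, 30, 2),
    ("chr1", 30, 32, 1),
    ("chr1", 32, 34, 1),
]

# ASSUMPTION: hypothetical a.bed contents, not shown in the original post.
A_BED = [("chr1", 6, 20), ("chr1", 22, 30)]


def shared_regions(rows, min_files=2):
    """Keep sub-intervals present in >= min_files inputs; merge touching ones."""
    merged = []
    for chrom, start, end, num in sorted(rows):
        if num < min_files:
            continue
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlapping(intervals, regions):
    """Original intervals that share at least one base with a region."""
    return [
        (c, s, e)
        for c, s, e in intervals
        if any(c == rc and s < r_end and r_start < e
               for rc, r_start, r_end in regions)
    ]


def contained(intervals, regions):
    """Original intervals that lie entirely inside a single region."""
    return [
        (c, s, e)
        for c, s, e in intervals
        if any(c == rc and r_start <= s and e <= r_end
               for rc, r_start, r_end in regions)
    ]


def clipped(intervals, regions):
    """Original intervals trimmed down to the bases inside a region."""
    out = []
    for c, s, e in intervals:
        for rc, r_start, r_end in regions:
            if c == rc and s < r_end and r_start < e:
                out.append((c, max(s, r_start), min(e, r_end)))
    return out


shared = shared_regions(MULTIINTER_ROWS)
print("shared:     ", shared)
print("overlapping:", overlapping(A_BED, shared))
print("contained:  ", contained(A_BED, shared))
print("clipped:    ", clipped(A_BED, shared))
```

With the assumed a.bed, the interval chr1 6-20 overlaps a shared region but is not contained in one, because bases 6-8 are covered by a.bed alone. Which of the three behaviours is wanted decides whether that row is kept, dropped, or trimmed to 8-20.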