Question: Define unique intervals from a set of peaks
0
gravatar for fusion.slope
2.7 years ago by
fusion.slope200
fusion.slope200 wrote:

Hello,

i have called peaks from 6 individual bam files and now I would like to merge the bed files in a way in which i will have uniq intervals. (where each uniq interval is present in all the 6 files).

How is it possible to do that? Is there a way with bedtools? I read the tutorial but there is nothing about this. Any idea would be really appreciated.

Cheers

ADD COMMENTlink modified 2.7 years ago by Alex Reynolds29k • written 2.7 years ago by fusion.slope200

Unique would be a set element present in only one of N inputs. Non-unique would mean an element present or overlapping in two or more of N inputs. Which do you mean?

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Alex Reynolds29k

An element overlapping 6 out of 6 Inputs.

ADD REPLYlink written 2.7 years ago by fusion.slope200

The simplest solution would be multiple iterations of BEDTools 'intersect' with each of your six BAMs (intersect bam1 and bam2, intersect that output with bam3, etc).

BTW, you should use the 'Add Comment' button instead of 'Answer' when responding.

ADD REPLYlink written 2.7 years ago by harold.smith.tarheel4.4k
1

Consider the following example:

example of overlaps

A intersects B, and B intersects C. However, A does not intersect C.

If you want intervals overlapping across all sets, you can do pairwise comparison tests between all the input sets, which does not scale well and requires storing intermediate files.

Or you can use the faster approach I describe, which uses sorted BED files with bedops and bedmap, which does not require pairwise comparisons or intermediate files.

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Alex Reynolds29k
2
gravatar for Alex Reynolds
2.7 years ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

Here is an efficient solution with BEDOPS, which avoids approaches with other tools that require M^N pairwise comparisons, for M elements and N sets.

Say you have N files (your specific example uses 6 files, but this shows the generic solution for N files).

You appear to want all elements from the N files, which overlap at least one element from each of all other N-1 sets.

First, convert any non-BED files to sorted BED with the appropriate conversion tools, e.g.: bam2bed from BEDOPS:

$ for inFn in `find . -name "*.bam"`; do echo $inFn; outFn=$inFn.bed; bam2bed < $infn > $outFn; done

Second, label each BED file to note where it is coming from:

$ for inFn in `find . -name "*.bam.bed"`; do echo $inFn; outFn=${inFn%%.*}.labeled.bed; prefix=${inFn%%.*}; echo ${prefix} | cut -f1-3 ${inFn} - > $outFn; done

Set a shell variable with the number N set to your overlap threshold, e.g.:

$ export N=6

Then do the set operations on the labeled files with BEDOPS bedops and bedmap:

$ bedops --everything file1.labeled.bed file2.labeled.bed ... fileN.labeled.bed \
    | bedmap --echo --echo-map-id-uniq --delim '\t' - \
    | cut -f1-3,5 - \
    | awk -vN=${N} '{ n = split($4, a, ";"); if (n==N) { print $0; } }' \
    > intervals_overlapping_all_N_inputs.bed

The file intervals_overlapping_all_N_inputs.bed contains a list of intervals that overlap all N sets.

This may be enough, but perhaps you want more detail. Now you can go back and filter all the converted BAM files for those elements:

$ for inFn in `find . -name "*.bam.bed"`; do echo $inFn; $outFn=${inFn%%.*}.filtered.bed; bedops --element-of -1 $inFn intervals_overlapping_all_N_inputs.bed > $outFn; done

Each *.filtered.bed will contain BED-formatted reads specific to the parent BAM file, which overlap all N sets.

This looks like a lot of work, but this will run much faster than a pairwise approach with other tools.

ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by Alex Reynolds29k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2459 users visited in the last hour