Question: Using BedTools to look at unique bed intervals between two files
0
3 months ago by
a.rex170
a.rex170 wrote:

I have two bed files - they correspond to peak intervals for ATAC-seq of different samples.

I want to parse out the peak intervals unique to one file, as well as quantify the similarities between the two files. How can I do this with BedToools?

bedtools • 177 views
modified 3 months ago by Alex Reynolds27k • written 3 months ago by a.rex170

Hello a.rex ,

what's the problem with bedtools intersect?

fin swimmer

Look at bedtools intersect (similarities) and bedtools subtract (unique peaks).

1
3 months ago by
Alex Reynolds27k
Seattle, WA USA
Alex Reynolds27k wrote:

Via BEDOPS:

``````\$ bedops --not-element-of 100% A.bed B.bed > elements_unique_to_A.bed
\$ bedops --not-element-of 100% B.bed A.bed > elements_unique_to_B.bed
``````

To measure similarity, you could calculate the Jaccard index based on cardinality of intersection and union of sets A and B.

First, calculate cardinalities:

``````\$ AnB=`bedops --element-of 1 A.bed B.bed | wc -l`
\$ BnA=`bedops --element-of 1 B.bed A.bed | wc -l`
\$ AuB=`bedops --everything A.bed B.bed | wc -l`
``````

Then calculate the index:

``````\$ calc "(\${AnB}+\${BnA})/(2*\${AuB))"
``````

You can and might likely want to adjust the overlap criteria passed to `--element-of` to make set membership more stringent than one base of overlap between peaks. For example, you might specify a minimum of `50%` overlap between a pair of peaks to call them "similar". Or you'd calculate some summary statistics about peaks and require a minimum of one SD of overlap (however many bases that is for your datasets), etc.

Content
Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.