Using BedTools to look at unique bed intervals between two files
5.7 years ago
a.rex ▴ 350

I have two bed files - they correspond to peak intervals for ATAC-seq of different samples.

I want to extract the peak intervals unique to one file, as well as quantify the similarity between the two files. How can I do this with BedTools?

bedtools • 2.5k views

Hello a.rex,

What's the problem with bedtools intersect?

fin swimmer


Look at bedtools intersect (similarities) and bedtools subtract (unique peaks).
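
As a rough sketch of what that could look like: the function names below are made up for illustration, while the bedtools subcommands and flags (`intersect -v`, `intersect -u`, `jaccard`) are real. `bedtools subtract -A` should behave much like `intersect -v` here, since it drops any feature with an overlap. A.bed and B.bed stand in for your two peak files.

```shell
# Hypothetical helper names; the bedtools subcommands and flags are real.

# Peaks in the first file with no overlap at all in the second file.
unique_to_first() {
  bedtools intersect -v -a "$1" -b "$2"
}

# Peaks in the first file that overlap at least one peak in the second file,
# each reported once.
shared_with_second() {
  bedtools intersect -u -a "$1" -b "$2"
}

# Base-pair Jaccard statistic; bedtools jaccard expects both inputs sorted
# by chromosome, then by start position.
jaccard_bp() {
  bedtools jaccard -a "$1" -b "$2"
}
```

For example, `unique_to_first A.bed B.bed > unique_to_A.bed`, and the same call with the arguments swapped for peaks unique to B.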

5.7 years ago

Via BEDOPS:

$ bedops --not-element-of 100% A.bed B.bed > elements_unique_to_A.bed
$ bedops --not-element-of 100% B.bed A.bed > elements_unique_to_B.bed

To measure similarity, you could calculate the Jaccard index based on cardinality of intersection and union of sets A and B.

First, calculate cardinalities:

$ AnB=`bedops --element-of 1 A.bed B.bed | wc -l`
$ BnA=`bedops --element-of 1 B.bed A.bed | wc -l`
$ AuB=`bedops --everything A.bed B.bed | wc -l`

Then calculate the index:

$ calc "(${AnB}+${BnA})/(2*${AuB})"
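
If calc is not installed, awk can do the same arithmetic. This is only a sketch: the function name is hypothetical, and it evaluates the formula exactly as written above from the three line counts. One thing to keep in mind is that --everything keeps every element from both files, so with this formula two identical files would score 0.5 rather than 1.

```shell
# Hypothetical helper: same formula as in the post, evaluated with awk.
# Arguments: AnB, BnA, AuB (the three line counts computed earlier).
similarity_from_counts() {
  awk -v anb="$1" -v bna="$2" -v aub="$3" 'BEGIN {
    if (aub == 0) { print "NA"; exit }   # avoid dividing by zero on empty inputs
    printf "%.4f\n", (anb + bna) / (2 * aub)
  }'
}
```

Called as `similarity_from_counts "$AnB" "$BnA" "$AuB"`, it prints the value to four decimal places.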

You will probably want to adjust the overlap criterion passed to --element-of so that set membership is more stringent than a single base of overlap between peaks. For example, you might require a minimum of 50% overlap between a pair of peaks to call them "similar". Or you could calculate summary statistics on your peak widths and require at least one SD of overlap (however many bases that is for your datasets), etc.
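
To make the threshold easy to vary, the counting step could be wrapped in a small function. The name count_overlapping is hypothetical; the threshold is handed straight to --element-of, which accepts either a base count (1) or a percentage (50%). As far as I understand the BEDOPS documentation, a percentage is measured against the length of the element in the first file.

```shell
# Hypothetical helper: count elements of the first file that meet the given
# overlap threshold (e.g. 1 for one base, 50% for half the element) against
# the second file.
count_overlapping() {
  bedops --element-of "$1" "$2" "$3" | wc -l
}
```

For instance, `AnB=$(count_overlapping 50% A.bed B.bed)` and `BnA=$(count_overlapping 50% B.bed A.bed)` would replace the one-base counts used earlier.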
