bp overlap between multiple beds
1
0
Entering edit mode
6.2 years ago
varsha619 ▴ 90

Hello, I was wondering what would be the best tool to identify and plot a venn of number of bp overlap between more than 3 bed files (similar to bedtools intersect -wo for 2 files). I have read all the related threads on multiInter and bedops. I also tried Homer mergePeaks --venn option and Intervene module that uses pyBedtools. Most of the tools seem to output only the number of overlapping features and not number of bp overlap. Please advise, thank you for your help.

bedtools multiinter venn • 2.4k views
ADD COMMENT
1
Entering edit mode
6.2 years ago

I think this should work (replace 123 with however many files you have as input):

$ N=123 
$ bedops --everything A.bed B.bed ... N.bed > union.bed
$ bedops --partition A.bed B.bed ... N.bed > partition.bed
$ bedmap --count --bases-uniq --delim '\t' partition.bed union.bed | awk -vN=$N '($1==N){ s += $2; } END { print s; }'

Test it, of course, but I think the set operations are correct. Since you mention Venns, the ($1==N) test can be adjusted for counting overlaps between N sets, N-1 sets, N-2, and so on, depending on what you're plotting.

Make sure your inputs are sorted with sort-bed before running this. This will be different from Unix sort and other tools.

ADD COMMENT
0
Entering edit mode

Hi Alex, I am a little confused with the output of bedmap. Is the 1st column number of overlaps and 2nd column bp overlap?

2   153

1   70

2   88

Is there a way to also see which files have these overlaps? Thank you!!

ADD REPLY
0
Entering edit mode

Each line in your output represents a partition. By partition, I mean a disjoint subset of the union of all your input BED files. Each disjoint subset is a unique genomic interval.

A partition does not overlap another partition. So you can do counting operations on each partition and know you're getting a unique answer.

Take a look at the --partition operator in the bedops documentation if you want a graphical explanation of what this means.

The first column is the number of files from which overlaps are derived. If you have three files, you could get overlaps from 1, 2 or 3 files. If you have four files, you could get overlaps from 1, 2, 3, or 4 files. And so on.

So the first column tells you how many files. The second column tells you how many unique bases overlap those n files. In your example, for instance, you have two files that overlap for 153+88=241 unique bases. You have one file that has 70 unique bases.

The job of awk is to sum those unique bases for N sets. If you start with three files, awk will sum the number of unique bases where there are overlaps between all three sets.

If you want to see which files provide the overlaps at each partition, add a unique ID value to each of A.bed, B.bed etc. This identifier should go into the ID column (fifth column). That identifier should be unique to that file; you could use A, B, etc. for instance.

Once you have added unique IDs to your inputs, you can add the --echo-map-id-uniq operator to the bedmap command to get those identifiers in a separate column.

ADD REPLY
0
Entering edit mode

Thanks for the explanation Alex, I understand the method now. I have 3 files A.bed B.bed C.bed and when I do the last step -

bedmap --count --bases-uniq --delim '\t' partition.bed union.bed

The resulting file had values of 4 and 5 in column 1, which from your explanation is the number of files with overlaps, which cannot be 4 or 5.

Also the awk command above gives me the error -

awk: cmd. line:1: fatal: cannot open file `($1==N){ s += $2; } END { print s; }' for reading (No such file or directory)

I did replace the N=123 with N=3. Please let me know if I am missing something here. Thank you.

ADD REPLY
0
Entering edit mode

Can you post your files somewhere? I'd be happy to take a look.

You might also check that your inputs are sorted per sort-bed, as non-sort-bed sorted inputs could give odd answers for the partition or union.

ADD REPLY
0
Entering edit mode

Thank you for your help Alex! I got the bedops and awk commands to work, as well as ended up using the 'intervene pairwise' module which gave me significance of overlap between bed files.

ADD REPLY
0
Entering edit mode

Glad it helped, and thanks for using BEDOPS!

ADD REPLY

Login before adding your answer.

Traffic: 2870 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6