Question: bp overlap between multiple beds
0
gravatar for varsha619
22 months ago by
varsha61980
varsha61980 wrote:

Hello, I was wondering what would be the best tool to identify and plot a venn of number of bp overlap between more than 3 bed files (similar to bedtools intersect -wo for 2 files). I have read all the related threads on multiInter and bedops. I also tried Homer mergePeaks --venn option and Intervene module that uses pyBedtools. Most of the tools seem to output only the number of overlapping features and not number of bp overlap. Please advise, thank you for your help.

multiinter venn bedtools • 798 views
ADD COMMENTlink modified 22 months ago by Alex Reynolds29k • written 22 months ago by varsha61980
1
gravatar for Alex Reynolds
22 months ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

I think this should work (replace 123 with however many files you have as input):

$ N=123 
$ bedops --everything A.bed B.bed ... N.bed > union.bed
$ bedops --partition A.bed B.bed ... N.bed > partition.bed
$ bedmap --count --bases-uniq --delim '\t' partition.bed union.bed | awk -vN=$N '($1==N){ s += $2; } END { print s; }'

Test it, of course, but I think the set operations are correct. Since you mention Venns, the ($1==N) test can be adjusted for counting overlaps between N sets, N-1 sets, N-2, and so on, depending on what you're plotting.

Make sure your inputs are sorted with sort-bed before running this. This will be different from Unix sort and other tools.

ADD COMMENTlink modified 22 months ago • written 22 months ago by Alex Reynolds29k

Hi Alex, I am a little confused with the output of bedmap. Is the 1st column number of overlaps and 2nd column bp overlap?

2   153

1   70

2   88

Is there a way to also see which files have these overlaps? Thank you!!

ADD REPLYlink modified 22 months ago • written 22 months ago by varsha61980

Each line in your output represents a partition. By partition, I mean a disjoint subset of the union of all your input BED files. Each disjoint subset is a unique genomic interval.

A partition does not overlap another partition. So you can do counting operations on each partition and know you're getting a unique answer.

Take a look at the --partition operator in the bedops documentation if you want a graphical explanation of what this means.

The first column is the number of files from which overlaps are derived. If you have three files, you could get overlaps from 1, 2 or 3 files. If you have four files, you could get overlaps from 1, 2, 3, or 4 files. And so on.

So the first column tells you how many files. The second column tells you how many unique bases overlap those n files. In your example, for instance, you have two files that overlap for 153+88=241 unique bases. You have one file that has 70 unique bases.

The job of awk is to sum those unique bases for N sets. If you start with three files, awk will sum the number of unique bases where there are overlaps between all three sets.

If you want to see which files provide the overlaps at each partition, add a unique ID value to each of A.bed, B.bed etc. This identifier should go into the ID column (fifth column). That identifier should be unique to that file; you could use A, B, etc. for instance.

Once you have added unique IDs to your inputs, you can add the --echo-map-id-uniq operator to the bedmap command to get those identifiers in a separate column.

ADD REPLYlink written 22 months ago by Alex Reynolds29k

Thanks for the explanation Alex, I understand the method now. I have 3 files A.bed B.bed C.bed and when I do the last step -

bedmap --count --bases-uniq --delim '\t' partition.bed union.bed

The resulting file had values of 4 and 5 in column 1, which from your explanation is the number of files with overlaps, which cannot be 4 or 5.

Also the awk command above gives me the error -

awk: cmd. line:1: fatal: cannot open file `($1==N){ s += $2; } END { print s; }' for reading (No such file or directory)

I did replace the N=123 with N=3. Please let me know if I am missing something here. Thank you.

ADD REPLYlink modified 22 months ago • written 22 months ago by varsha61980

Can you post your files somewhere? I'd be happy to take a look.

You might also check that your inputs are sorted per sort-bed, as non-sort-bed sorted inputs could give odd answers for the partition or union.

ADD REPLYlink modified 22 months ago • written 22 months ago by Alex Reynolds29k

Thank you for your help Alex! I got the bedops and awk commands to work, as well as ended up using the 'intervene pairwise' module which gave me significance of overlap between bed files.

ADD REPLYlink written 22 months ago by varsha61980

Glad it helped, and thanks for using BEDOPS!

ADD REPLYlink written 22 months ago by Alex Reynolds29k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1408 users visited in the last hour