Question: How Can I Compare And Merge Bed Files
lyfsa wrote:

I have three BED files with chrNo, start, end position and type. I need to compare each chrNo, start and end position in one file with the other two files and write the common entries to a new file. Can anyone suggest how to do this efficiently? I wrote a simple Perl script, but the files are huge and it takes too long to be feasible. Thanks in advance.

Example files:

file1.bed:

1 20 30

1 100 120

1 200 300

file2.bed:

1 2 5

1 25 34

1 200 300

file3.bed:

1 30 33

1 200 300

1 500 600

common.bed:

1 30 34 --> coordinates overlapping within 5 bp are treated as the same region, but the outermost coordinates of the three are written to the common file

1 200 300

bedtools bed • 14k views
modified 5.7 years ago by beary.pooh • written 6.5 years ago by lyfsa

It'd be nice if you changed the tags to something appropriate for your post, like bedtools, mergebed.

written 6.5 years ago by Arun

The example files given above are BED files with chrNo, start and end position, with 3 lines in each file. I did not know how to post a separate example box in this post.

modified 6.5 years ago • written 6.5 years ago by lyfsa

Sukhdeep Singh wrote:

Don't merge them. You need multiIntersectBed, a tool in the BEDTools suite that finds the overlap common to more than two files.

Check this post for usage examples. I have asked Aaron about specifying a minimum overlap threshold for calling something an overlap when using multiIntersectBed. For your second question (which you have actually posted as an answer), you can specify the distance between reads/peaks at which merging happens using the -d parameter. From the manual:

Controlling how close two features must be in order to merge (-d): By default, only overlapping or book-ended features are combined into a new feature. However, one can force mergeBed to combine more distant features with the -d option. For example, were one to set -d to 1000, any features that overlap or are within 1000 base pairs of one another will be combined.

For example:
$ cat A.bed
chr1 100 200
chr1 501 1000

$ mergeBed -i A.bed
chr1 100 200
chr1 501 1000

$ mergeBed -i A.bed -d 1000
chr1 100 1000
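
As a rough illustration of the -d rule quoted above, here is a minimal Python sketch. It is not bedtools' actual implementation; the function name `merge_intervals` and the parameter `max_gap` are made up for this example, and the rule "merge when the gap to the previous feature is at most `max_gap` bases" is an assumption based on the manual text.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge (chrom, start, end) tuples whose gap is at most max_gap bases.

    Hypothetical helper: with max_gap=0, only overlapping or
    book-ended features are combined.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        prev = merged[-1] if merged else None
        if prev and prev[0] == chrom and start - prev[2] <= max_gap:
            # Close enough: extend the current feature to the outermost end.
            prev[2] = max(prev[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


a_bed = [("chr1", 100, 200), ("chr1", 501, 1000)]
print(merge_intervals(a_bed))                # the two features stay separate
print(merge_intervals(a_bed, max_gap=1000))  # the two features are combined
```

Sorting first keeps the pass linear after the sort, which is the main reason this kind of merge scales to large files.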

Cheers

modified 6.5 years ago • written 6.5 years ago by Sukhdeep Singh

Hi! I tried to merge with multiIntersectBed, but the result I get is not what I want. I looked at the usage link you posted. There you also suggested this approach:

intersectBed -a 2 -b 3 > 23
intersectBed -a 1 -b 3 > 13
intersectBed -a 1 -b 2 > 12

intersectBed -a 1 -b 23 -f 0.50|sort > 231
intersectBed -a 2 -b 13 -f 0.50|sort > 132
intersectBed -a 3 -b 12 -f 0.50|sort > 12_3

comm -1 -2 231 132 > test
comm -1 -2 test 1_3 > final result

Will this work if I need the common start and end positions found in all three files, allowing for an overlap of 50 bp?

written 6.5 years ago by lyfsa

In your question, when you say common between all files, do you mean exactly the same chr, start and end positions in all 3 files? Can you edit your post to include 2 files with an example case? If so, then it's relatively easy to do this using Unix commands.

written 6.5 years ago by Arun

I have now edited my post with examples. I didn't know how to use the separate box for examples, so my post is not that clear. My files are BED files with chrNo, start and end position, with 3 lines in each file. Hope you will get it :)

modified 6.5 years ago • written 6.5 years ago by lyfsa

This command will give you the overlap between all three files:

multiIntersectBed -i a.bed b.bed c.bed | awk '$4==3'

For the first overlap, however, you should use ChIPpeakAnno; check the maxgap parameter.
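
To make the idea of "keep only the pieces covered by all three files" concrete, here is a small sweep-line sketch in Python. It is only an illustration of the counting logic, not how multiIntersectBed is implemented; `regions_in_all` is a hypothetical name, and the sketch assumes half-open BED coordinates (start included, end excluded).

```python
from collections import defaultdict


def regions_in_all(files):
    """Return (chrom, start, end) pieces covered by every interval list in files."""
    events = defaultdict(list)
    for file_id, intervals in enumerate(files):
        for chrom, start, end in intervals:
            events[chrom].append((start, 1, file_id))   # an interval opens
            events[chrom].append((end, -1, file_id))    # an interval closes
    result = []
    for chrom in sorted(events):
        depth = defaultdict(int)  # number of open intervals per file
        prev = None
        for pos, delta, file_id in sorted(events[chrom]):
            # Inspect the piece [prev, pos) before applying this event.
            covering = sum(1 for d in depth.values() if d > 0)
            if prev is not None and pos > prev and covering == len(files):
                result.append((chrom, prev, pos))
            depth[file_id] += delta
            prev = pos
    return result


file1 = [("1", 20, 30), ("1", 100, 120), ("1", 200, 300)]
file2 = [("1", 2, 5), ("1", 25, 34), ("1", 200, 300)]
file3 = [("1", 30, 33), ("1", 200, 300), ("1", 500, 600)]
print(regions_in_all([file1, file2, file3]))
```

With the example files from the question, this sketch reports only 200-300: under half-open coordinates, 20-30 in file1 and 30-33 in file3 touch but share no base.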

Cheers

written 6.5 years ago by Sukhdeep Singh

Arun wrote:

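Use mergeBed from bedtools like this (sorting first, since current bedtools versions require sorted input for merging):

cat file1 file2 file3 | sort -k1,1 -k2,2n | mergeBed -i stdin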
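
Concatenating and merging loses track of which file each interval came from. A possible workaround, sketched here in Python rather than taken from bedtools, is to tag each interval with its file, merge, and keep only the merged clusters that contain intervals from every file. The name `common_clusters` and the `max_gap` parameter are hypothetical, and the outermost coordinates of each cluster are reported.

```python
def common_clusters(files, max_gap=0):
    """Merge intervals from all files; keep clusters supported by every file."""
    tagged = sorted(
        (chrom, start, end, file_id)
        for file_id, intervals in enumerate(files)
        for chrom, start, end in intervals
    )
    clusters = []  # each entry: [chrom, start, end, set of contributing file ids]
    for chrom, start, end, file_id in tagged:
        last = clusters[-1] if clusters else None
        if last and last[0] == chrom and start - last[2] <= max_gap:
            last[2] = max(last[2], end)  # extend to the outermost end
            last[3].add(file_id)
        else:
            clusters.append([chrom, start, end, {file_id}])
    return [(c, s, e) for c, s, e, ids in clusters if len(ids) == len(files)]


file1 = [("1", 20, 30), ("1", 100, 120), ("1", 200, 300)]
file2 = [("1", 2, 5), ("1", 25, 34), ("1", 200, 300)]
file3 = [("1", 30, 33), ("1", 200, 300), ("1", 500, 600)]
print(common_clusters([file1, file2, file3]))
```

On the example files this gives 20-34 and 200-300. That differs from the 30-34 in the question, whose derivation is not clear from the stated criteria, so treat the clustering rule as an assumption to adjust.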
written 6.5 years ago by Arun

beary.pooh wrote:

bedtools v2.17.0 provides multiIntersectBed.

Find it in /bin/

Just type multiIntersectBed -i [file1] [file2] ...

P.S. -i should be followed by file names; "*.bam" does not work.

written 5.7 years ago by beary.pooh

*.bed should work, however. BAM is not supported by mIB.

written 5.7 years ago by Aaronquinlan

lyfsa wrote:

Can I also merge overlapping positions, say when the start and end positions are within a range of 0-50?

written 6.5 years ago by lyfsa

Please use comments under answers to ask further questions, rather than posting questions as answers.

written 6.5 years ago by Neilfws

Sorry, I overlooked the "merge overlapping" part in your question. I guess Sukhdeep's reply does exactly what you require.

written 6.5 years ago by Arun

Sandeep wrote:

For people not very comfortable using BEDTools or other command-line methods, an alternative would be to use a Galaxy server. The "Operate on Genomic Intervals" tools will let you merge data sets.

written 6.5 years ago by Sandeep

Alex Reynolds wrote:

You can use BEDOPS and set operations to solve this problem for three generic files. However, it is unclear from your overlap criteria how you are getting the end coordinate of 34. Note that the region chr1:33-34 is not common to all three sample input files, as presented, being found only in file2.bed.

In any case, here is how you can solve this problem:

$ bedops --merge file2.bed | bedmap --echo-map file1.bed - | bedops --intersect - file3.bed
chr1    30  33
chr1    200 300

Let's break down how these three commands work together.

(1) The bedops statement uses the --merge operator to merge elements in file2.bed into contiguous (non-overlapping) regions. This result is piped into the bedmap statement.

(2) The bedmap statement uses the --echo-map operator to report all contiguous regions in the merged file2.bed (the "map" file) which overlap elements in file1.bed (the "reference" file) by one or more bases:

$ bedops --merge file2.bed > file2.merged.bed
$ bedmap --echo-map file1.bed file2.merged.bed
chr1    25  34

chr1    200 300

(The second line is blank, because there is no region in the merged file2.bed which overlaps chr1:100-120 from file1.bed.)

(3) This result is piped into the last bedops statement, which uses the --intersect operator to intersect those two non-empty regions chr1:25-34 and chr1:200-300 with regions in file3.bed.

The final answer consists of bases that are common to file1.bed, file2.bed and file3.bed.
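
For readers who want to trace the three steps by hand, here is a rough Python sketch of the same sequence: merge file2, keep the merged regions that overlap an element of file1, then intersect with file3. It only imitates the logic described above and is not how BEDOPS works internally; the helper names are hypothetical, and the nested loops are written for clarity rather than speed.

```python
def merge(intervals):
    """Flatten overlapping or adjoining (chrom, start, end) regions."""
    out = []
    for chrom, start, end in sorted(intervals):
        if out and out[-1][0] == chrom and start <= out[-1][2]:
            out[-1][2] = max(out[-1][2], end)
        else:
            out.append([chrom, start, end])
    return [tuple(x) for x in out]


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def echo_map(reference, mapped):
    """For each reference element, list the map elements overlapping it."""
    return [[m for m in mapped if overlaps(r, m)] for r in reference]


def intersect(xs, ys):
    """Return the pieces shared between regions in xs and regions in ys."""
    return sorted(
        (x[0], max(x[1], y[1]), min(x[2], y[2]))
        for x in xs for y in ys if overlaps(x, y)
    )


file1 = [("1", 20, 30), ("1", 100, 120), ("1", 200, 300)]
file2 = [("1", 2, 5), ("1", 25, 34), ("1", 200, 300)]
file3 = [("1", 30, 33), ("1", 200, 300), ("1", 500, 600)]

rows = echo_map(file1, merge(file2))        # second row is empty, as in step (2)
hits = sorted({m for row in rows for m in row})
print(intersect(hits, file3))
```

In this sketch the region 30-33 arises because the file2 region 25-34 overlaps 20-30 from file1 and is then intersected with 30-33 from file3.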

Note: The only assumption these tools make is that all input BED files are sorted. This allows BEDOPS apps to run very fast and with a low memory profile, as compared with alternative toolkits which do not require sorted input (or which have only recently added sorting requirements after publication of BEDOPS). For your example inputs, this is not an issue. For the general case, we provide the sort-bed application to prep the BED inputs, if the sort-states of the input BED files are unknown.

modified 5.9 years ago • written 5.9 years ago by Alex Reynolds