Question: bedtools sort and merge
8 weeks ago by
hosein_salehi60 wrote:

Hello there,

I want to find the common regions between the coordinates in two large files, shown below:

file1:                                      file2:
chr1:251790423-251855075    chr1:251391746-251411804
chr1:259520908-259580523    chr1:259687605-259759271
chr1:261294390-261396569    chr1:259815659-259854201
chr2:108327854-108382699    chr2:108327854-108388888
chr20:28151226-28420685     chr20:28141234-28520687
chr3:15673814-15987811      chr3:15673814-15997815
chr10:70552773-71399757     chr10:70782782-71499757

I have tried:

cat file1 file2 | sed "s/[-:]/\t/g" | bedtools sort | bedtools merge > result

But this only merges them. I would be grateful for suggestions on how to find only the common regions (without the extra length on either side) between the two files.

8 weeks ago by
Alex Reynolds31k
Seattle, WA USA
Alex Reynolds31k wrote:

If you use the bash shell, you can combine the BEDOPS tool bedops with process substitution to get an efficient one-line solution:

$ bedops --intersect <(sed "s/[-:]/\t/g" file1 | sort-bed -) <(sed "s/[-:]/\t/g" file2 | sort-bed -) > answer.bed

The result will be in answer.bed.

If you use zsh (as some on macOS do), the same <(...) process substitution syntax should work, so the idea carries over. (On macOS, though, the stock sed does not expand \t in the replacement, so your sed command would need to be a bit different.)
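Where sed does not expand \t, the same conversion can be sketched with awk instead. This is only an illustration: the file names sample_regions.txt and sample_regions.bed are made up, and it assumes chromosome names contain no ':' or '-' characters.

```shell
# Hypothetical sample input in the question's chr:start-end format
printf 'chr1:251790423-251855075\nchr1:259520908-259580523\n' > sample_regions.txt

# Split each line on ':' or '-' and print the three fields separated by tabs
awk -F'[:-]' -v OFS='\t' '{ print $1, $2, $3 }' sample_regions.txt > sample_regions.bed

cat sample_regions.bed
```

The awk command can take the place of sed inside each process substitution, still piped into sort-bed - as before.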

Bonus BEDOPS: One advantage is the ability to specify an arbitrary number of inputs, if you have more than two files:

$ bedops --intersect <(sed "s/[-:]/\t/g" file1 | sort-bed -) <(sed "s/[-:]/\t/g" file2 | sort-bed -) ... <(sed "s/[-:]/\t/g" fileN | sort-bed -) > answer.bed

You can specify as many as you like, up to your operating system's file handle limit (usually 1021, but that can be adjusted).
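To see what the open-file limit is on your own system, a quick check along these lines may help. The variable names are only illustrative, and the values reported will differ between machines.

```shell
# Soft limit: how many files a process started from this shell may hold open
soft_limit=$(ulimit -Sn)
# Hard limit: the ceiling up to which the soft limit can be raised
hard_limit=$(ulimit -Hn)
echo "soft: $soft_limit, hard: $hard_limit"
```

Running ulimit -n with a value raises the soft limit for the current shell, as long as the value does not exceed the hard limit.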


Thanks for sending this information. I installed bedops as a "bin" directory and ran the command both inside that bin directory and from another directory, but I still get these errors:

bedops --intersect <(sed "s/[-:]/\t/g" ourstudy | sort-bed -) <(sed "s/[-:]/\t/g" mastudy | sort-bed -) > answer.bed
bash: sort-bed: command not found...
bash: sort-bed: command not found...
bash: bedops: command not found...
  

Do you know what the problem is?

Reply written 7 weeks ago by hosein_salehi6 (modified 7 weeks ago by h.mon)

You need to either copy the binaries to /usr/local/bin or add the directory containing the binaries to your PATH environment variable.

See: https://bedops.readthedocs.io/en/latest/content/installation.html#linux
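As a rough sketch of the PATH option: the directory $HOME/bedops/bin below is only a guess at where the binaries were unpacked, so substitute the real location.

```shell
# Hypothetical location of the unpacked BEDOPS binaries; adjust to your system
BEDOPS_BIN="$HOME/bedops/bin"

# Put that directory at the front of the search path for this shell session
export PATH="$BEDOPS_BIN:$PATH"

# Report whether the shell can now locate each tool
for tool in bedops sort-bed; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: not found"
    fi
done
```

The export only lasts for the current session; adding the same line to ~/.bashrc makes it persist for new bash shells.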

Reply written 7 weeks ago by Alex Reynolds
8 weeks ago by
h.mon32k
Brazil
h.mon32k wrote:

Have a look at bedtools intersect (or bedops --intersect). Your files are not BED files, so you will need to massage them before proceeding.

edit: you are already massaging your files, I overlooked the sed "s/[-:]/\t/g" part.

8 weeks ago by
MatthewP840
China
MatthewP840 wrote:

What the bedtools merge command does is merge overlapping intervals, so in your case you get the union of the BED intervals. You should use bedtools intersect to get only the common regions, but you need to convert to BED format first.
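A minimal sketch of that approach, using the chr10 and chr2 rows from the question already converted to tab-separated BED. The file names a.bed, b.bed and common.bed are placeholders, and the coordinates are carried over unchanged from the question.

```shell
# Scratch directory for the two tiny example files
workdir=$(mktemp -d)
printf 'chr10\t70552773\t71399757\nchr2\t108327854\t108382699\n' > "$workdir/a.bed"
printf 'chr10\t70782782\t71499757\nchr2\t108327854\t108388888\n' > "$workdir/b.bed"

if command -v bedtools >/dev/null 2>&1; then
    # By default, intersect reports only the overlapping portion of each A/B pair
    bedtools intersect -a "$workdir/a.bed" -b "$workdir/b.bed" > "$workdir/common.bed"
    cat "$workdir/common.bed"
else
    echo "bedtools not found on PATH" >&2
fi
```

For the chr2 row, the overlap is 108327854-108382699, whereas merging the same pair would give the longer span 108327854-108388888.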
