Question: bedops merge .bed without collapsing sites
0
gravatar for fleurg
20 months ago by
fleurg30
fleurg30 wrote:

Hi there, I am currently trying to combine sorted .bed files into a matrix. I am interested in every unique site and do not want to merge locations. Using various commands (--intersect, --merge, --union) in bedops has resulted in the merging of sites. The files have 5 columns> chr/ startbp /endbp /coverage /%methylation

Example- head 1.txt.sorted (first 10 lines)

chr1    10472   10473   4   0
chr1    10479   10480   1   0
chr1    10481   10482   4   0
chr1    10482   10483   1   0
chr1    10485   10486   4   0
chr1    10487   10488   1   0
chr1    10491   10492   1   0
chr1    10495   10496   1   0
chr1    10498   10499   4   0
chr1    10501   10502   1   0

Example- head 2.txt.sorted (first 10 lines)

chr1    10472   10473   0   0
chr1    10479   10480   0   0
chr1    10481   10482   0   0
chr1    10482   10483   0   0
chr1    10485   10486   0   0
chr1    10487   10488   2   0
chr1    10491   10492   2   0
chr1    10495   10496   5   0
chr1    10498   10499   0   0
chr1    10501   10502   7   0

merge .bed 1& 2 Result (incorrect- as doesn't have a line for 10482)

chr1    10472   10473
chr1    10479   10480
chr1    10481   10483 
chr1    10485   10486
chr1    10487   10488
chr1    10491   10492
chr1    10495   10496
chr1    10498   10499

Example of some commands I have tried:

Eg. 1 below --> this merges the bed files (-n), then removes sites that are not common (-u), pastes the 4/5th columns into the matrix .....#issue arises in the merge .bed - as sites that are 1bp are merged together?

bedops -u *.txt.sorted > union.bed
bedops -m *.txt.sorted > merge.bed

for fn in `ls *.txt.sorted`; do echo $fn; bedops -n 1 merge.bed $fn | awk '{ print $0"\tNA\tNA" }' | bedops -u - $fn | cut -f 4-5 > $fn.map; done
paste merge.bed *.txt.sorted.map > merged_noncpg_encode.mtx

Eg. 2

bedops --intersect --range 0 *.bed.txt.sorted > Int.mtx

Would be great if someone could provide some guidance as to how I can avoid merging of sites such as those that are 1bp apart. Or send me to a similar Q&A (all my current searches have been unsuccessful)

Thanks so much!

genome • 870 views
ADD COMMENTlink modified 20 months ago by cpad011213k • written 20 months ago by fleurg30
2

I want to help but I am lost about what you're trying to do. Can you provide a simpler example (and use indents to format code)? What are you really trying to do? What does the output need to look like?

ADD REPLYlink modified 20 months ago • written 20 months ago by Alex Reynolds30k

If you just want single-base elements from N BED files:

$ bedops --merge A.bed B.bed C.bed ... N.bed | bedops --chop 1 - > chopped.bed

You can then map score or ID columns back to the union with bedmap, e.g.:

$ bedmap --echo --echo-map-score chopped.bed union.bed > map.bed

Or:

$ bedmap --echo --echo-map-id-uniq chopped.bed union.bed > map.bed
ADD REPLYlink modified 20 months ago • written 20 months ago by Alex Reynolds30k
1

Thanks for the reply Alex. Sorry for the complicated example. Appreciate your reply- which works. I also used the partition command.

ADD REPLYlink written 20 months ago by fleurg30

Using only those 10 lines, what should your output look like? Do you need this to work with bedops or can you use a combination of standard unix commands?

ADD REPLYlink written 20 months ago by vkkodali2.0k
1
gravatar for fleurg
20 months ago by
fleurg30
fleurg30 wrote:

I managed to find a solution using the --partition command in bedops

bedops --partition *.txt.sorted > sitelist.bed

for fn in `ls *.txt.sorted`; do echo $fn; bedops -n 1 sitelist.bed $fn | awk '{ print $0"\tNA\tNA" }' | bedops -u - $fn | cut -f 4-5 > $fn.map; done

paste sitelist.bed *.txt.sorted.map > merged_noncpg_encode.mtx
ADD COMMENTlink modified 20 months ago by Alex Reynolds30k • written 20 months ago by fleurg30

Nice one. I forgot about --partition but it is definitely a way to go.

ADD REPLYlink written 20 months ago by Alex Reynolds30k
0
gravatar for cpad0112
20 months ago by
cpad011213k
India
cpad011213k wrote:

Non-bedops solution: with ebay tsv-utils:

$  tsv-join -f test1.txt --key-fields 1,2,3 test2.txt  | cut -f1-3

chr1    10472   10473
chr1    10479   10480
chr1    10481   10482
chr1    10482   10483
chr1    10485   10486
chr1    10487   10488
chr1    10491   10492
chr1    10495   10496
chr1    10498   10499
chr1    10501   10502

input:

$  tail -n+1 test*.txt
==> test1.txt <==
chr1    10472   10473   4   0
chr1    10479   10480   1   0
chr1    10481   10482   4   0
chr1    10482   10483   1   0
chr1    10485   10486   4   0
chr1    10487   10488   1   0
chr1    10491   10492   1   0
chr1    10495   10496   1   0
chr1    10498   10499   4   0
chr1    10501   10502   1   0

==> test2.txt <==
chr1    10472   10473   0   0
chr1    10479   10480   0   0
chr1    10481   10482   0   0
chr1    10482   10483   0   0
chr1    10485   10486   0   0
chr1    10487   10488   2   0
chr1    10491   10492   2   0
chr1    10495   10496   5   0
chr1    10498   10499   0   0
chr1    10501   10502   7   0
ADD COMMENTlink modified 20 months ago • written 20 months ago by cpad011213k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1096 users visited in the last hour