bedops merge .bed without collapsing sites
2
0
Entering edit mode
5.5 years ago
fleurg ▴ 30

Hi there, I am currently trying to combine sorted .bed files into a matrix. I am interested in every unique site and do not want to merge locations. Using various commands (--intersect, --merge, --union) in bedops has resulted in the merging of sites. The files have 5 columns> chr/ startbp /endbp /coverage /%methylation

Example- head 1.txt.sorted (first 10 lines)

chr1    10472   10473   4   0
chr1    10479   10480   1   0
chr1    10481   10482   4   0
chr1    10482   10483   1   0
chr1    10485   10486   4   0
chr1    10487   10488   1   0
chr1    10491   10492   1   0
chr1    10495   10496   1   0
chr1    10498   10499   4   0
chr1    10501   10502   1   0

Example- head 2.txt.sorted (first 10 lines)

chr1    10472   10473   0   0
chr1    10479   10480   0   0
chr1    10481   10482   0   0
chr1    10482   10483   0   0
chr1    10485   10486   0   0
chr1    10487   10488   2   0
chr1    10491   10492   2   0
chr1    10495   10496   5   0
chr1    10498   10499   0   0
chr1    10501   10502   7   0

merge .bed 1& 2 Result (incorrect- as doesn't have a line for 10482)

chr1    10472   10473
chr1    10479   10480
chr1    10481   10483 
chr1    10485   10486
chr1    10487   10488
chr1    10491   10492
chr1    10495   10496
chr1    10498   10499

Example of some commands I have tried:

Eg. 1 below --> this merges the bed files (-n), then removes sites that are not common (-u), pastes the 4/5th columns into the matrix .....#issue arises in the merge .bed - as sites that are 1bp are merged together?

bedops -u *.txt.sorted > union.bed
bedops -m *.txt.sorted > merge.bed

for fn in `ls *.txt.sorted`; do echo $fn; bedops -n 1 merge.bed $fn | awk '{ print $0"\tNA\tNA" }' | bedops -u - $fn | cut -f 4-5 > $fn.map; done
paste merge.bed *.txt.sorted.map > merged_noncpg_encode.mtx

Eg. 2

bedops --intersect --range 0 *.bed.txt.sorted > Int.mtx

Would be great if someone could provide some guidance as to how I can avoid merging of sites such as those that are 1bp apart. Or send me to a similar Q&A (all my current searches have been unsuccessful)

Thanks so much!

genome • 3.2k views
ADD COMMENT
2
Entering edit mode

I want to help but I am lost about what you're trying to do. Can you provide a simpler example (and use indents to format code)? What are you really trying to do? What does the output need to look like?

ADD REPLY
0
Entering edit mode

If you just want single-base elements from N BED files:

$ bedops --merge A.bed B.bed C.bed ... N.bed | bedops --chop 1 - > chopped.bed

You can then map score or ID columns back to the union with bedmap, e.g.:

$ bedmap --echo --echo-map-score chopped.bed union.bed > map.bed

Or:

$ bedmap --echo --echo-map-id-uniq chopped.bed union.bed > map.bed
ADD REPLY
1
Entering edit mode

Thanks for the reply Alex. Sorry for the complicated example. Appreciate your reply- which works. I also used the partition command.

ADD REPLY
0
Entering edit mode

Using only those 10 lines, what should your output look like? Do you need this to work with bedops or can you use a combination of standard unix commands?

ADD REPLY
1
Entering edit mode
5.5 years ago
fleurg ▴ 30

I managed to find a solution using the --partition command in bedops

bedops --partition *.txt.sorted > sitelist.bed

for fn in `ls *.txt.sorted`; do echo $fn; bedops -n 1 sitelist.bed $fn | awk '{ print $0"\tNA\tNA" }' | bedops -u - $fn | cut -f 4-5 > $fn.map; done

paste sitelist.bed *.txt.sorted.map > merged_noncpg_encode.mtx
ADD COMMENT
0
Entering edit mode

Nice one. I forgot about --partition but it is definitely a way to go.

ADD REPLY
0
Entering edit mode
5.5 years ago

Non-bedops solution: with ebay tsv-utils:

$  tsv-join -f test1.txt --key-fields 1,2,3 test2.txt  | cut -f1-3

chr1    10472   10473
chr1    10479   10480
chr1    10481   10482
chr1    10482   10483
chr1    10485   10486
chr1    10487   10488
chr1    10491   10492
chr1    10495   10496
chr1    10498   10499
chr1    10501   10502

input:

$  tail -n+1 test*.txt
==> test1.txt <==
chr1    10472   10473   4   0
chr1    10479   10480   1   0
chr1    10481   10482   4   0
chr1    10482   10483   1   0
chr1    10485   10486   4   0
chr1    10487   10488   1   0
chr1    10491   10492   1   0
chr1    10495   10496   1   0
chr1    10498   10499   4   0
chr1    10501   10502   1   0

==> test2.txt <==
chr1    10472   10473   0   0
chr1    10479   10480   0   0
chr1    10481   10482   0   0
chr1    10482   10483   0   0
chr1    10485   10486   0   0
chr1    10487   10488   2   0
chr1    10491   10492   2   0
chr1    10495   10496   5   0
chr1    10498   10499   0   0
chr1    10501   10502   7   0
ADD COMMENT

Login before adding your answer.

Traffic: 2559 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6