Question: comparing two tracks over intervals on the genome
0
gravatar for moxu
3.9 years ago by
moxu460
moxu460 wrote:

Suppose I have two groups: treatment & control. Each group has a value assigned to each interval on the genome. Let me explain with the following example:

File A contains data for the treatment group and has the following coordinate sorted format:

chr1 100 110 3
chr1 200 212 8
chr1 220 240 5
chr1 500 600 1000
...

File B contains data for the control group:

chr1 100 108 7
chr1 150 160 14
chr1 210 230 88
chr1 700 850 3
...

And I would like to get results like the following: chr1 100 110 3 7

chr1 150 160 0 14
chr1 200 240 13 88
chr1 500 600 1000 0
chr1 700 850 0 3
...

The 3rd entry is a merge of two tracks on A sided by an entry on B.

Is there a software product to do so?

Thank you!

ADD COMMENTlink modified 3.9 years ago by Alex Reynolds31k • written 3.9 years ago by moxu460
0
gravatar for moxu
3.9 years ago by
moxu460
moxu460 wrote:

Well, thought "bedtools intersect -c" would work, but in effect it does not -- it serves more like a SQL "left join".

ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by moxu460
0
gravatar for Vitis
3.9 years ago by
Vitis2.4k
New York
Vitis2.4k wrote:

I think this may be able to do it: standardize the windows on the left side, instead of using jumping windows, pick a window size and lay them out across the entire genome (such as chr1 1 10, chr1 11 20, chr1 21, 30...), leaving some windows with zero values. Then it might be easier to merge your data as you like.

ADD COMMENTlink written 3.9 years ago by Vitis2.4k

What a novel idea! Thanks for your input. However, it seems too complicated. What I am planning to do is to do "bedtools intersect -c -a A -b B" & "bedtools intersect -c -a B -b A", and then combine the results somehow.

ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by moxu460

Well, the solution based on bedops below is perfectly fine for this type of problem, I usually create this type of data with seamless chromosome windows for data standardization and documentation purpose. Somewhere down the road, you may repetitively need these operations to intersect new data. The standardized windows would make your life much easier in the future.

ADD REPLYlink written 3.8 years ago by Vitis2.4k
0
gravatar for Alex Reynolds
3.9 years ago by
Alex Reynolds31k
Seattle, WA USA
Alex Reynolds31k wrote:

Via BEDOPS:

Merge sets A and B with bedops:

$ bedops --merge A B > AB.merge

Map set A to the merged set and sum the ID values with bedmap and awk:

$ bedmap --echo-map-id --multidelim '\t' AB.merge A | awk '{if(NF==0){print 0;}else{s=0;for(i=1;i<=NF;i++){s+=$i;}print s;}}' > A.merge.sum

Repeat with set B:

$ bedmap --echo-map-id --multidelim '\t' AB.merge B | awk '{if(NF==0){print 0;}else{s=0;for(i=1;i<=NF;i++){s+=$i;}print s;}}' > B.merge.sum

Paste the AB merged set to the A and B merge-sums with paste:

$ paste AB.merge A.merge.sum B.merge.sum
chr1    100 110 3       7
chr1    150 160 0       14
chr1    200 240 13      88
chr1    500 600 1000    0
chr1    700 850 0       3
ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by Alex Reynolds31k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 942 users visited in the last hour