Dear all,
I am aiming to find sites with at least 70% reciprocal overlap and collapse them into a single list of non-overlapping sites.
However, collapsing the reciprocally overlapping calls into a single site list seems to need some extra processing after they have been found.
I have used bedtools intersect as follows:
Example sites: cnv.bed
chr1    353    6405
chr1    355    6389
chr1    501    6401
chr1    549    6447
chr1    1812    28093
chr1    3286    6382
chr1    3694    6428
chr1    3695    6413
chr1    3729    6677
chr1    4084    6380
/bedtools2/bin/intersectBed -a cnv.bed -b cnv.bed -f 0.7 -r -wa -wb  | head
chr1    353    6405    chr1    353    6405
chr1    353    6405    chr1    355    6389
chr1    353    6405    chr1    501    6401
chr1    353    6405    chr1    549    6447
chr1    355    6389    chr1    353    6405
chr1    355    6389    chr1    355    6389
chr1    355    6389    chr1    501    6401
chr1    355    6389    chr1    549    6447
chr1    501    6401    chr1    353    6405
chr1    501    6401    chr1    355    6389
chr1    501    6401    chr1    501    6401
chr1    501    6401    chr1    549    6447
This lists every pair of sites that overlap each other reciprocally by at least 70%, comparing each site against every other site (including itself).
From here, I need to collapse those overlapping sites into a single site list, i.e. remove the redundant regions and keep only one region that is representative of each group of overlapping regions. Could you suggest how to achieve this?
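To make the -f 0.7 -r criterion concrete, here is a minimal sketch of the check in Python. It assumes half-open BED coordinates and an inclusive threshold (overlap >= 70% of each site's length); the function names overlap_len and reciprocal_overlap are hypothetical and not part of bedtools.

```python
def overlap_len(a, b):
    """Bases shared by two half-open BED intervals given as (chrom, start, end)."""
    if a[0] != b[0]:
        return 0  # different chromosomes never overlap
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def reciprocal_overlap(a, b, f=0.7):
    """True if the overlap covers at least a fraction f of BOTH a and b
    (assumed inclusive threshold, mirroring -f 0.7 -r)."""
    ov = overlap_len(a, b)
    return ov >= f * (a[2] - a[1]) and ov >= f * (b[2] - b[1])


# chr1:353-6405 vs chr1:549-6447 overlap by 5856 bp, about 97% and 99% of each site
print(reciprocal_overlap(("chr1", 353, 6405), ("chr1", 549, 6447)))   # True
# chr1:1812-28093 covers most of chr1:353-6405, but the overlap is only ~17% of its own length
print(reciprocal_overlap(("chr1", 353, 6405), ("chr1", 1812, 28093)))  # False
```

This is why 353-6405 and 1812-28093 do not appear together in the intersect output: the overlap passes the 70% test for the shorter site but fails it for the longer one.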
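One possible way to collapse the pairs, sketched in Python and not the awk solution discussed in this thread, is to treat each reciprocal-overlap pair as an edge, group sites into connected components with a union-find, and keep one site per group. Keeping the longest site is an arbitrary assumption here, and the function name collapse is hypothetical. Because cnv.bed is intersected with itself, every site matches itself, so sites without any partner still form their own one-site group.

```python
from collections import defaultdict


def collapse(pair_lines):
    """Collapse 6-column intersectBed -wa -wb lines into one site per group.

    Sites linked by any reciprocal-overlap pair end up in the same group
    (connected components), and the longest site of each group is kept,
    with ties broken by the smaller start. Returns sorted (chrom, start, end) tuples.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    for line in pair_lines:
        f = line.split()
        if len(f) < 6:
            continue  # skip blank or malformed lines
        a = (f[0], int(f[1]), int(f[2]))
        b = (f[3], int(f[4]), int(f[5]))
        union(a, b)

    groups = defaultdict(list)
    for site in list(parent):
        groups[find(site)].append(site)

    reps = [max(members, key=lambda s: (s[2] - s[1], -s[1]))
            for members in groups.values()]
    return sorted(reps)
```

A caveat of connected components is chaining: if A overlaps B and B overlaps C, all three are merged even when A and C do not overlap each other by 70%. If that matters, a stricter rule (for example, requiring every member to overlap the chosen representative) would be needed.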
It looks like the code takes boogens.txt, which is basically one input file. The text after that says to sort first by fileA and then by fileB. I wonder what the input to the command is. Should the input be:
Sorry, I didn't make that clear.
boogens.txt would be your output from intersectBed, like the second output in your initial question, which has six columns: columns 1-3 are a peak from file 1 (possibly repeated over multiple lines, if there are multiple matches in file 2), and columns 4-6 are the matching peaks in file 2.
I used the output from intersectBed as input to the awk:
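The six-column layout described above can be illustrated with a small Python sketch that sorts the pairs by the file 1 site and then the file 2 site, and groups all matches under each file 1 site. This only illustrates the input structure; it is not the awk script from this thread, and the names parse_pair and group_matches are hypothetical.

```python
from itertools import groupby


def parse_pair(line):
    """Split one 6-column line into (site_from_file1, site_from_file2)."""
    f = line.split()
    return (f[0], int(f[1]), int(f[2])), (f[3], int(f[4]), int(f[5]))


def group_matches(lines):
    """Map each file 1 site (columns 1-3) to its sorted list of file 2 matches (columns 4-6)."""
    # Sorting tuples orders by the file 1 site first, then by the file 2 site,
    # so all lines for the same file 1 site become adjacent for groupby.
    pairs = sorted(parse_pair(l) for l in lines if l.strip())
    return {a: [b for _, b in grp] for a, grp in groupby(pairs, key=lambda p: p[0])}
```

For the first four lines of the intersect output above, the site chr1:353-6405 would map to its four matches: 353-6405 (itself), 355-6389, 501-6401 and 549-6447.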
The output looks essentially the same as the input. Did I use the code incorrectly, or did I modify it wrongly?
My mistake, I made a syntax error and a logic error.
$3==startB needs to be $3==endA. Sorry about that. It seemed to work for me; I get three lines as output:
Is that the output you would expect?
Yes, that seems to be the desired output.