Intersect counts with fragment regions
3
0
Entering edit mode
4.5 years ago
dzisis1986 ▴ 60

I have a fragments file ( fragments_chr.bed ) like this :

 chr1   0   7506
chr1    7512    10943
chr1    10949   13169
chr1    13175   20064
chr1    20070   28272
chr1    28278   29959
chr1    29965   36598
chr1    36604   37512
chr1    37518   40359
chr1    40365   48795
chr1    48801   50660
chr1    50666   52359
chr1    52365   52850
chr1    52856   54447
chr1    54453   54521
chr1    54527   54816
chr1    54822   64009
chr1    64015   66453
chr1    66459   66724
chr1    66730   66943
chr1    66949   70664
chr1    70670   74343
chr1    74349   77084
chr1    77090   83123
chr1    83129   87901
chr1    87907   90967


and the results of my analysis ( counts.bed ) in counts like this

chr1    7506    7512    346
chr1    37537   37543   458
chr1    77108   77115   844
chr1    83121   83129   1629
chr1    83148   83155   1779


I would like to create a new file in order to collect all read counts into my fragments. By using bedtools intersect i ve noticed that i am losing some counts mainly those which are at the begging fragment of each chromosome. When i am running bedtools intersect like this

bedtools intersect -a fragments_chr.bed -b counts.bed -wa  -wb > intersect1_1_w4cseq_test.txt


I have this result :

chr1    37518   40359   chr1    37537   37543   458
chr1    77090   83123   chr1    77108   77115   844
chr1    77090   83123   chr1    83121   83129   1629
chr1    83129   87901   chr1    83148   83155   1779


In that case the count in chr1 7506 7512 346 is missing.

Any suggestion how to solve it or any other way to do it ?

Thank you

Dimitris

bedtools intersect R ranges readcounts • 1.8k views
0
Entering edit mode
4.5 years ago
Yujin Kim ▴ 10

First of all, I believe you forgot the -wa and -wb options that create your final output. Second the chr1 7506 7512 346 coordonates don't intersect with any positions in your fragments_chr.bed file. So it won't appear in your output.

0
Entering edit mode

YEs sorry for the mistake i forgot the -wa -wb i will add it . Ok how can we do it in order to appear ? Because even if this doesnt overlap we have 2 fragments in the fragments_chr.bed file with thoose coordinates.

chr1   0   7506
chr1    7512    10943


and it should be somewhere right ? Otherwise we are loosing counts ..Do you know any other way apart from intersect in order to add my counts in those ranges of chr positions ?

0
Entering edit mode

bed files are zero based.@ OP

0
Entering edit mode

I see no way how this should work. As Yujin Kim says this two fragments have no overlap with your counts.bed.

Take care that the bed file uses half-open intervals, which means that the start position (column 2) is included but the end position (column 3) is not.

Let's have a look at the first line of fragments_chr.bed:

chr1   0   7506


This means, the fragment includes the position 0 - 7505, but not 7506.

And the first line of counts.bed:

chr1    7506    7512    346


This means the region start at 7506 (and so 1 position after your fragment) and includes all positions up to 7511.

fin swimmer

0
Entering edit mode

So this count doesnt belong anywhere ? it cant go either to chr1 7512 10943 or to chr1 0 7506. In this way we can not see with intersect the numbers of reads per fragment. Or the numbers of reads per fragment are only those which means some of your mapped reads are not included in the fragment.

1
Entering edit mode

in case of doubt, add -loj option to bedtools function above. It would print all the positions with and without overlap.

0
Entering edit mode
4.5 years ago

Another approach via BEDOPS bedmap:

$bedmap --echo --echo-map-id --delim '\t' fragments_chr.bed counts.bed > answer.bed  In cases where elements of fragments_chr.bed do not overlap counts.bed, there will be an empty string after the fragment interval. Add --skip-unmapped to this command to filter these results. ADD COMMENT 0 Entering edit mode 4.5 years ago If you want to merge the interval space in counts.bed with fragments_chr.bed, then you could do a symmetric difference operation on the two inputs, union the difference result with fragments_chr.bed, and map that set back to counts.bed to be guaranteed coverage: $ bedops -s fragments_chr.bed counts.bed | bedops -u - fragments_chr.bed | bedmap --echo --echo-map-id --delim '\t' - counts.bed > answer.bed


You're adding disjoint intervals to the fragments_chr.bed file, in this case, but it will "fill in gaps" before mapping, if that is your aim.

It just depends on what you want to do, but Unix streams will help you easily join together a bunch of BEDOPS set operations.

0
Entering edit mode

Thank you Alex this probably helps a lot . I will check it and i will come back with the result !

0
Entering edit mode

What this is doing is to create a new file similar to counts.bed but at the end is pasted the fragments_chr.bed ,maybe i didnt understand exactly what you mean by symmetric difference operation on the 2 inputs.