Merge regions in bedtools genomecov/bedgraph file
0
0
Entering edit mode
2.5 years ago
eoneill627 ▴ 20

I want to calculate the number of reads mapping to regions of the genome, and I want to group these reads into bins of say 1kb. I have used bedtools genomecov to generate a bedgraph file that shows the coverage of reads across the genome.

bedtools genomecov -bg -ibam file.bam >Coverage.bedgraph

I then merged nearby regions of coverage using:

bedtools merge -d 1000 -c 4 -o sum -i Coverage.bedgraph > Coverage.merged.bedgraph

The problem is that the "sum" part adds the coverage from the different regions, so when merged, reads are counted more than once.

Is there a way to merge the regions without counting reads more than once? I think the necessary information from the bam file is probably lost at this point. Is there a way to do this with an option in bedtools genomecov i.e. can you make it report regions of coverage with a specified 'bin' size?

bedgraph genomecov • 1.6k views
ADD COMMENT
1
Entering edit mode

I think deeptools bamCoverage has the option "--binSize", which seems like what you want. But I was still questioning if it can solve the "counting reads several times" problem since if a read span two bins, it will still be counted twice....

ADD REPLY
0
Entering edit mode

It may still count it twice, but I guess the larger the bin, the smaller this issue becomes. Thanks, I'll give this a go.

ADD REPLY
0
Entering edit mode

This is great. The reads are occasionally still counted twice, but the bins are large enough that this isn't a big issue. Thanks!

ADD REPLY
0
Entering edit mode

Just take any other operation than sum that you find appropriate. Mean, median, min, max, choice is yours.

ADD REPLY
0
Entering edit mode

I'm not sure what those operations do. I couldn't really understand from the bedtools documentation.

ADD REPLY
0
Entering edit mode

No magic here, min is the minimum, max the maximum, median the median and so on. So if you bin spans three intervals with coverage 3, 5 and 9 then median would report 5 for the bin, max 9 and min 3.

ADD REPLY

Login before adding your answer.

Traffic: 2186 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6