Question: How would I go about binning read depth information obtained from SamTools?
3 months ago, ges29 wrote:

I've mapped some reads to an assembly using bwa mem and now I'd like to visualise the read depth to see which areas of the genome have the most coverage.

I used samtools depth on the resulting .bam file to pull out the per-base read frequencies and it's created a file with the following format:

Scaffold      Position freq.   
K493scaffold_1  9341    28
K493scaffold_1  9342    28
K493scaffold_1  9343    28
K493scaffold_1  9344    28
K493scaffold_1  9345    28
K493scaffold_1  9346    28
K493scaffold_1  9347    28
K493scaffold_1  9348    28
K493scaffold_1  9349    28
K493scaffold_1  9350    28
K493scaffold_1  9351    28
K493scaffold_1  9352    1
K493scaffold_1  10273   1
K493scaffold_1  10274   188
K493scaffold_1  10275   189
K493scaffold_1  10276   189
K493scaffold_1  10277   189
K493scaffold_1  10278   189
K493scaffold_1  10279   189
K493scaffold_1  10280   189
K493scaffold_1  10281   189
K493scaffold_1  10282   189
K493scaffold_1  10283   189
K493scaffold_1  10284   189

I could try plotting the entire file; however, it's pretty large (most programs wouldn't be able to handle it), and there are 1366 scaffolds, each containing ~30 kb of positions, so it would be a pain to navigate.

So now, for each scaffold, I'd like to bin the base positions into 500 bp sections and take the average frequency for each bin. For example, a desired output would be something like this:

K493scaffold_1 1-500            28
K493scaffold_1 501-1000         71
K493scaffold_1 1001-1500        98
K493scaffold_1 1501-2000        2
K493scaffold_1 2001-2500        17
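
The binning described above can be sketched in a few lines of Python. This is only a minimal sketch under some assumptions: the input is the three-column, whitespace-separated output of samtools depth (scaffold, 1-based position, depth), and the function name `bin_depth` and its parameters are made up for illustration. Note that samtools depth skips zero-coverage positions unless it is run with `-a`, so by default the mean below is taken over the reported positions only; setting `divide_by_bin_size=True` divides by the full bin width instead (which will understate the last, shorter bin of each scaffold).

```python
import sys
from collections import defaultdict


def bin_depth(lines, bin_size=500, divide_by_bin_size=False):
    """Average (scaffold, position, depth) rows into fixed-width bins.

    Returns a list of (scaffold, bin_start, bin_end, mean_depth) tuples,
    in the order the bins first appear in the input.
    """
    totals = defaultdict(int)  # (scaffold, bin index) -> summed depth
    counts = defaultdict(int)  # (scaffold, bin index) -> rows seen in that bin

    for line in lines:
        fields = line.split()
        # Skip blank lines and any header row such as "Scaffold Position freq."
        if len(fields) != 3 or not fields[1].isdigit():
            continue
        scaffold, pos, depth = fields[0], int(fields[1]), int(fields[2])
        key = (scaffold, (pos - 1) // bin_size)  # positions are 1-based
        totals[key] += depth
        counts[key] += 1

    rows = []
    for (scaffold, idx), total in totals.items():
        denominator = bin_size if divide_by_bin_size else counts[(scaffold, idx)]
        start = idx * bin_size + 1
        end = (idx + 1) * bin_size
        rows.append((scaffold, start, end, total / denominator))
    return rows


if __name__ == "__main__" and len(sys.argv) == 2:
    # Stream the depth file line by line; only the per-bin sums are kept in memory.
    with open(sys.argv[1]) as handle:
        for scaffold, start, end, mean in bin_depth(handle):
            print(f"{scaffold}\t{start}-{end}\t{mean:.2f}")
```

Saved as, say, bin_depth.py (a hypothetical file name), it would be run as `python bin_depth.py depth.txt > binned.txt`, giving one line per scaffold and 500 bp bin that is small enough to plot.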

I was wondering if there's any utility out there which can already do what I'm asking before I embark on writing a script myself?

I've already tried the bedtools genomecov BEDGRAPH output, but it's not quite what I'm looking for, as it doesn't sort the data into regular-sized bins.

Thanks in advance for any help anyone can provide!


Hi, short on time, but I think that this may help: C: Bin chromosome every 1kb and get average value

written 3 months ago by Kevin Blighe