Question

Genomecoveragebed - Bedtool For Reporting Per Base Genome Coverage

0

Entering edit mode

12.2 years ago

Varun Gupta ★ 1.3k

Hi Everyone

I would nedd some help on genomeCoverageBed tool. This tools when used for finding per base genome coverage uses an option -d.

I am actually interested in finding read counts for each base within a particular intron of a gene.

I will like to explain you more just to make myself clear. I used IGV to see how my alignments looks and moreover what is the coverage of each base within a particular intron. When I take my cursor in IGV to the area exactly above the base (i am interested in)within the coverage track it gives me such details:

Total Count:6
A:0
C:0
G:6
T:0
N:0

Now this total count is basically the read count for the base G within that intron. This counts says that 6 reads have actually covered this base position(and hence base).

Now when i use this code snippet which is basically finding per base genome coverage

genomeCoverageBed -i 2-B3-1b-D303A_sorted.bed -g pombe.genome -d

this code gives me around 31 as the depth for that base(i.e G in my example). Looking closely in IGV i figured out that this 21 is basically 21 = 6 + 15

where 6 is the actual reads that has covered this base position(hence base) and 15 means that these reads have not covered that base at that position, but since the genomeCoverageBed tool calculates depth of feature coverage it also includes all those reads which skips that particular base.

I would provide you with an image to make it more clear

alt text

I would like to know how can i get only the read counts (here 6) for that particular base rather than getting the entire feature depth. I ran the above command and it gave me the per base genome coverage output for every base. What else should i do next to extract only the rread counts for that particular base. I can view it in IGV but want it in text format.

Any command line or code will be helpful.

Hope to hear from you soon.

Regards Varun

bedtools coverage • 11k views

ADD COMMENT • link updated 10.3 years ago by hygine555 • 0 • written 12.2 years ago by Varun Gupta ★ 1.3k

0

Entering edit mode

What do you mean by reads that skipped the particular base? Do you mean reads that have a mismatch at that base? So 6 reads have a matching G, but 15 have mismatches?

ADD REPLY • link 12.2 years ago by Damian Kao 16k

0

Entering edit mode

Nevermind my last question. I had a brain fart there. You are looking at mate pair reads, so you mean the inter-mate sequences that were skipped.

ADD REPLY • link 12.2 years ago by Damian Kao 16k

0

Entering edit mode

Hi Well if you look at the graph it says read count as 6 but actually genomeCoverageBed reports it as 21. So that actually means that only 6 reads covered that base (G) whereas other 15 were skipped but are always counted in genomeCoverageBed(thin blue lines).

ADD REPLY • link 12.2 years ago by Varun Gupta ★ 1.3k

score 2 · Answer 1 · 2012-02-16

2

Entering edit mode

12.2 years ago

Madelaine Gogol 5.3k

You just need the -split argument, I think.

ADD COMMENT • link 12.2 years ago by Madelaine Gogol 5.3k

0

Entering edit mode

Let me try with split..

thanks

ADD REPLY • link 12.2 years ago by Varun Gupta ★ 1.3k

0

Entering edit mode

Hi I tried using split but my bed file which i produced from the bam only contains 6 coloumns and not 12 as req for split.

Did you got my question as to what am i trying to get.

Regards

ADD REPLY • link 12.2 years ago by Varun Gupta ★ 1.3k

Ram · Answer 2 · 2012-03-02

I am guessing you are viewing your bam in igv. if you are using bamToBed to convert, you can try using the -bedpe option to output bed12: http://code.google.com/p/bedtools/wiki/Usage#bamToBed What I'm guessing is happening is that your 2-B3-1b-D303A_sorted.bed file contains the whole fragment (treats properly paired as one long fragment) and you would either want to use split (mentioned above) or discard the paired end information and convert so that everything is singleton

Btw, genomeCoverageBed can take bam files as input.

score 0 · Answer 3 · 2014-01-18

0

Entering edit mode

10.3 years ago

hygine555 • 0

I have the same question about this, anyone help us?

ADD COMMENT • link 10.3 years ago by hygine555 • 0

Ram · Answer 4 · 2014-01-19

I have fix this problem use igvtools. The command line like this: I use the .wig filetype as the output type, .wig could use txt open, its contain A G C T N number. I think this is your answer.

Usage:
          igvtools count [options] [inputFile] [outputFile] [genome]

Required arguments:
          inputFile    The input file (see supported formats above).
          outputFile   Binary output file.  Must end in ".tdf" or ".wig".  To indicate that you want to
                             output both a .tdf and a .wig file, list both output filenames as a single string,
                             separated by a comma with no other delimiters.  To display feature
                             intensity in IGV, the density must be computed with this command, and the
                             resulting file must be named <feature track filename>.tdf.