Question: Genomecoveragebed - Bedtool For Reporting Per Base Genome Coverage
0
gravatar for Varun Gupta
6.9 years ago by
Varun Gupta1.1k
United States
Varun Gupta1.1k wrote:

Hi Everyone

I would nedd some help on genomeCoverageBed tool. This tools when used for finding per base genome coverage uses an option -d.

I am actually interested in finding read counts for each base within a particular intron of a gene.

I will like to explain you more just to make myself clear. I used IGV to see how my alignments looks and moreover what is the coverage of each base within a particular intron. When I take my cursor in IGV to the area exactly above the base (i am interested in)within the coverage track it gives me such details:

Total Count:6
A:0
C:0
G:6
T:0
N:0

Now this total count is basically the read count for the base G within that intron. This counts says that 6 reads have actually covered this base position(and hence base).

Now when i use this code snippet which is basically finding per base genome coverage

genomeCoverageBed -i 2-B3-1b-D303A_sorted.bed -g pombe.genome -d

this code gives me around 31 as the depth for that base(i.e G in my example). Looking closely in IGV i figured out that this 21 is basically 21 = 6 + 15

where 6 is the actual reads that has covered this base position(hence base) and 15 means that these reads have not covered that base at that position, but since the genomeCoverageBed tool calculates depth of feature coverage it also includes all those reads which skips that particular base.

I would provide you with an image to make it more clear

alt text

I would like to know how can i get only the read counts (here 6) for that particular base rather than getting the entire feature depth. I ran the above command and it gave me the per base genome coverage output for every base. What else should i do next to extract only the rread counts for that particular base. I can view it in IGV but want it in text format.

Any command line or code will be helpful.

Hope to hear from you soon.

Regards Varun

bedtools coverage • 8.9k views
ADD COMMENTlink modified 5.0 years ago by hygine5550 • written 6.9 years ago by Varun Gupta1.1k

What do you mean by reads that skipped the particular base? Do you mean reads that have a mismatch at that base? So 6 reads have a matching G, but 15 have mismatches?

ADD REPLYlink written 6.9 years ago by Damian Kao15k

Nevermind my last question. I had a brain fart there. You are looking at mate pair reads, so you mean the inter-mate sequences that were skipped.

ADD REPLYlink written 6.9 years ago by Damian Kao15k

Hi Well if you look at the graph it says read count as 6 but actually genomeCoverageBed reports it as 21. So that actually means that only 6 reads covered that base (G) whereas other 15 were skipped but are always counted in genomeCoverageBed(thin blue lines).

ADD REPLYlink written 6.9 years ago by Varun Gupta1.1k
2
gravatar for Madelaine Gogol
6.9 years ago by
Madelaine Gogol5.0k
Kansas City
Madelaine Gogol5.0k wrote:

You just need the -split argument, I think.

ADD COMMENTlink written 6.9 years ago by Madelaine Gogol5.0k

Let me try with split..

thanks

ADD REPLYlink written 6.9 years ago by Varun Gupta1.1k

Hi I tried using split but my bed file which i produced from the bam only contains 6 coloumns and not 12 as req for split.

Did you got my question as to what am i trying to get.

Regards

ADD REPLYlink written 6.9 years ago by Varun Gupta1.1k
1
gravatar for Ying W
6.9 years ago by
Ying W3.9k
South San Francisco, CA
Ying W3.9k wrote:

I am guessing you are viewing your bam in igv. if you are using bamToBed to convert, you can try using the -bedpe option to output bed12: http://code.google.com/p/bedtools/wiki/Usage#bamToBed

what im guessing is happening is that your 2-B3-1b-D303A_sorted.bed file contains the whole fragment (treats properly paired as one long fragment) and you would either want to use split (mentioned above) or discard the paired end information and convert so that everything is singleton

btw, genomeCoverageBed can take bam files as input

ADD COMMENTlink written 6.9 years ago by Ying W3.9k
0
gravatar for hygine555
5.0 years ago by
hygine5550
hygine5550 wrote:

I have the same question about this, anyone help us?

ADD COMMENTlink written 5.0 years ago by hygine5550
0
gravatar for hygine555
5.0 years ago by
hygine5550
hygine5550 wrote:

I have fix this problem use igvtools. The command line like this:I use the .wig filetype as the output type, .wig could use txt open, its contain A G C T N number. I think this is your answer.

Usage: igvtools count [options] [inputFile] [outputFile] [genome]

Required arguments: inputFile The input file (see supported formats above). outputFile Binary output file. Must end in ".tdf" or ".wig". To indicate that you want to output both a .tdf and a .wig file, list both output filenames as a single string, separated by a comma with no other delimiters. To display feature intensity in IGV, the density must be computed with this command, and the resulting file must be named <feature track="" filename="">.tdf.

ADD COMMENTlink written 5.0 years ago by hygine5550
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 753 users visited in the last hour