Question: Gene Coverage In MuSiC
8.3 years ago, yliubu wrote:

Hi, all,

I am interested in MuSiC. When I run the CalcWigCovg function, it creates a directory called gene_covg, which contains per-gene covered base counts for each sample. What is the relationship between gene length and the number of covered bases? The gene length is usually larger than the number of covered bases. Does that mean that most of the bases within the gene fail to meet the coverage requirement, and what is that requirement? Why does this number differ between samples? Does it mean that some bases in a specific sample could not be sequenced well? I have read in some papers that people calculate the background mutation rate by dividing the number of mutated bases by the total number of bases (e.g., "Somatic mutations affect key pathways in lung adenocarcinoma", Nature, 2008). Are the "bases" there the same as "coverage" in MuSiC? Also, the total_covgs output lists overall non-overlapping coverages per sample. If I provide, say, 5 genes (5 ROIs) in the roi file, is the sum of the 5 gene_covg values for each sample equal to that sample's total_covg in the total_covgs file? My results suggest it is not. I look forward to your explanation and would appreciate any help.

8.3 years ago, Cyriac Kandoth (Memorial Sloan Kettering, New York, USA) wrote:

Please read the documentation of CalcWigCovg available online. Let me copy over the part that answers most of your questions:

This script counts bases with sufficient coverage in the ROIs of each gene from given wiggle track format files, and categorizes them into - AT, CG (non-CpG), and CpG counts. It also adds up these base-counts across all ROIs of each gene for each sample, but covered bases that lie within overlapping ROIs are not counted more than once towards these total counts.

If you are using wiggle files from Broad Institute's Firehose, their thresholds for "sufficient coverage" are 8x reads in normal, and 14x in the tumor. I presume they require a minimum base quality of 20. If you have access to BAM files, then you can use MuSiC's calc-covg and customize these thresholds.
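As a rough illustration of why the covered-base count is usually smaller than the gene length and differs between samples, here is a minimal sketch. It assumes a base counts as covered only when both the normal and the tumor depth meet their thresholds (8x and 14x, as quoted above for Firehose). The function name and the depth lists are hypothetical and are not part of MuSiC.

```python
def count_covered_bases(normal_depth, tumor_depth, min_normal=8, min_tumor=14):
    """Count positions whose read depth meets both thresholds.

    normal_depth and tumor_depth are per-base read depths over the same
    region (hypothetical inputs for illustration, not a MuSiC file format).
    """
    if len(normal_depth) != len(tumor_depth):
        raise ValueError("depth lists must cover the same positions")
    return sum(
        1
        for n, t in zip(normal_depth, tumor_depth)
        if n >= min_normal and t >= min_tumor
    )


# A 5 bp region: only the first and fourth bases pass both thresholds.
normal = [10, 8, 7, 20, 0]
tumor = [14, 13, 30, 15, 50]
covered = count_covered_bases(normal, tumor)
print(covered, "of", len(normal), "bases covered")
```

Because read depth varies from sample to sample, the same region can yield a different covered-base count in each sample.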

If you have 5 non-overlapping regions in your roi-file, then the "total_covgs" file will list the sum of counts from each region. But genes are not that simple. A gene can have multiple overlapping exons from different isoforms and different reading frames. You can either merge these together into contiguous non-overlapping exonic loci, or you can allow exons to represent the different overlapping reading frames that a variant can be annotated to. The choice of regions that constitute a "gene" is left up to the user, but the number of bps in the "total_covgs" file is always the non-overlapping total bps per sample, i.e. the same loci are never counted twice towards the total. For per-gene coverages, however, the per-ROI coverages are just summed up regardless of any overlapping loci.
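The difference between the two totals can be sketched as follows. This is only an illustration of the arithmetic described above, not MuSiC's code. It assumes every base in every ROI is covered, that all ROIs lie on one chromosome, and that intervals are half-open (start inclusive, end exclusive). The function names and coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def summed_length(intervals):
    """Add up interval lengths as-is, so shared bases are counted repeatedly."""
    return sum(end - start for start, end in intervals)


def non_overlapping_length(intervals):
    """Count each base once by merging overlaps before summing."""
    return summed_length(merge_intervals(intervals))


# Three ROIs; the first two overlap by 50 bp.
rois = [(100, 200), (150, 250), (300, 350)]
print(summed_length(rois))           # per-ROI sum: 250
print(non_overlapping_length(rois))  # non-overlapping total: 200
```

With overlapping ROIs, the simple sum exceeds the non-overlapping total, which is one way the gene_covg values can fail to add up to the total_covgs value for a sample.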


Thanks for the reply, but I am still confused. Why is the coverage of each gene much smaller than the gene length? Does that mean most of the bases cannot be sequenced successfully? Does the gene coverage differ among samples for the same reason?

Reply written 8.3 years ago by yliubu

Also, why do you use covered bases instead of the total number of bases as the denominator of the BMR? How does this number affect the BMR calculation? Thanks.

Reply written 8.3 years ago by yliubu

Yes, sequencing isn't perfect, particularly in older exome-capture projects like TCGA GBM (you indicated that you were working on this, in another thread). If you don't have sufficient read-depth at a locus to call a variant, we don't add it to the denominator when measuring mutation rate. The thresholds for "sufficient" are described in my answer above. I hope this makes it clear.
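To show how the choice of denominator changes the rate, here is a minimal sketch with made-up numbers. It treats the mutation rate as mutations divided by covered bases, which is a simplification: MuSiC also splits counts into categories such as AT, CG and CpG. The function name and all values are hypothetical.

```python
def mutation_rate(num_mutations, num_bases):
    """Mutations per base, given the number of bases used as the denominator."""
    if num_bases <= 0:
        raise ValueError("denominator must be a positive number of bases")
    return num_mutations / num_bases


# Made-up example: 50 mutations called in a 30 Mbp target,
# of which only 20 Mbp had enough read depth to call variants.
mutations = 50
target_bases = 30_000_000
covered_bases = 20_000_000

rate_all = mutation_rate(mutations, target_bases)
rate_covered = mutation_rate(mutations, covered_bases)
print(f"using all target bases: {rate_all:.2e} per bp")
print(f"using covered bases:    {rate_covered:.2e} per bp")
```

Bases without sufficient depth cannot contribute any called mutations, so including them in the denominator would underestimate the rate.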

Reply written 8.3 years ago by Cyriac Kandoth

Thanks a lot for your detailed explanation! That is very helpful!

Reply written 8.3 years ago by yliubu