Gene Coverage In MuSiC
11.6 years ago
yliubu ▴ 30

Hi, all,

I am interested in MuSiC. When I use the CalcWigCovg function, it produces a directory called gene_covg, which contains per-gene covered base counts for each sample. What is the relationship between gene length and the number of covered bases in a gene? The gene length is usually larger than the number of covered bases. Does that mean that most of the bases within the gene fail to meet the coverage requirement, and what is that requirement? Why does the number differ between samples? Does that mean that some bases in a specific sample could not be sequenced well?

I have read in some papers that, when people calculate the background mutation rate, they usually divide the number of mutated bases by the total number of bases (e.g. "Somatic mutations affect key pathways in lung adenocarcinoma", Nature, 2008). Is "bases" there the same as "coverage" in MuSiC?

Also, the total_covgs output lists the overall non-overlapping coverage per sample. If I provide, say, 5 genes (5 ROIs) in the ROI file, does the sum of the 5 gene_covg values for each sample equal the total_covg of that sample in the total_covgs file? My results suggest it does not. I look forward to your explanation and would appreciate your help.

music • 2.9k views
11.6 years ago

Please read the documentation of CalcWigCovg available online. Let me copy over the part that answers most of your questions:

This script counts bases with sufficient coverage in the ROIs of each gene from given wiggle track format files, and categorizes them into - AT, CG (non-CpG), and CpG counts. It also adds up these base-counts across all ROIs of each gene for each sample, but covered bases that lie within overlapping ROIs are not counted more than once towards these total counts.

If you are using wiggle files from Broad Institute's Firehose, their thresholds for "sufficient coverage" are 8x reads in normal, and 14x in the tumor. I presume they require a minimum base quality of 20. If you have access to BAM files, then you can use MuSiC's calc-covg and customize these thresholds.
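To make the idea of "bases with sufficient coverage" concrete, here is a minimal Python sketch. It is not MuSiC's implementation: the function name `count_covered_bases`, the toy reference sequence, and the per-base depth lists are all hypothetical. It only assumes the 8x normal / 14x tumor thresholds mentioned above, ignores base quality, and classifies CpG context using neighbours inside the given sequence only.

```python
def count_covered_bases(ref, normal_depth, tumor_depth, min_normal=8, min_tumor=14):
    """Count sufficiently covered bases, split into AT, CG (non-CpG) and CpG.

    Hypothetical helper: `ref` is a reference sequence for one ROI, and the two
    depth lists hold the read depth at each position in normal and tumor.
    """
    counts = {"AT": 0, "CG": 0, "CpG": 0}
    ref = ref.upper()
    for i, base in enumerate(ref):
        # A base counts as covered only if both samples meet their threshold.
        if normal_depth[i] < min_normal or tumor_depth[i] < min_tumor:
            continue
        if base in "AT":
            counts["AT"] += 1
        elif base == "C":
            # A C followed by G is part of a CpG dinucleotide.
            in_cpg = i + 1 < len(ref) and ref[i + 1] == "G"
            counts["CpG" if in_cpg else "CG"] += 1
        elif base == "G":
            # A G preceded by C is the other half of a CpG dinucleotide.
            in_cpg = i > 0 and ref[i - 1] == "C"
            counts["CpG" if in_cpg else "CG"] += 1
    return counts


# Toy 8 bp region; position 3 falls below the tumor threshold.
ref = "ACGTTGCA"
normal = [10] * 8
tumor = [20, 20, 20, 5, 20, 20, 20, 20]
counts = count_covered_bases(ref, normal, tumor)
covered_total = sum(counts.values())
```

In this toy case the region is 8 bp long but only 7 bases are counted, which is why a gene's covered-base count can be smaller than its length and can differ from sample to sample.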

If you have 5 non-overlapping regions in your roi-file, then the "total_covgs" file will list the sum of counts from each region. But genes are not that simple. A gene can have multiple overlapping exons from different isoforms and different reading frames. You can either merge these together into contiguous non-overlapping exonic loci, or you can allow exons to represent the different overlapping reading frames that a variant can be annotated to. The choice of regions that constitute a "gene" is left up to the user, but the number of bps in the "total_covgs" file is always the non-overlapping total bps per sample, i.e. the same loci are never counted twice towards the total. For per-gene coverages, however, the per-ROI coverages are simply summed up regardless of any overlapping loci.
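The difference between the two totals can be illustrated with a small Python sketch. This is only an illustration of the arithmetic, not MuSiC code: the helper names and the ROI coordinates are made up, intervals are taken to be 1-based and inclusive, and every base in every ROI is assumed to be sufficiently covered.

```python
def summed_length(intervals):
    """Sum of per-ROI lengths; overlapping bases are counted once per ROI."""
    return sum(end - start + 1 for start, end in intervals)


def merged_length(intervals):
    """Number of distinct bases in the union of the ROIs (no double counting)."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # This ROI is disjoint from the running block: close the block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent ROI: extend the running block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


# Hypothetical gene: two overlapping exons from different isoforms, plus one separate exon.
rois = [(100, 199), (150, 249), (400, 449)]
per_gene = summed_length(rois)         # per-ROI counts simply added up
non_overlapping = merged_length(rois)  # each locus counted once
```

Here the per-ROI sum is 250 bp while the non-overlapping total is 200 bp, so summing per-gene values need not reproduce the figure in "total_covgs" when ROIs overlap.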


Thanks for the reply, but I am still confused. Why is the coverage of each gene much smaller than the gene length? Does that mean most of the bases could not be sequenced successfully? Is that also the reason why gene coverage differs among samples?


Also, why do you use covered bases instead of the total number of bases as the denominator of the BMR? How does this number affect the calculation of the BMR? Thanks.


Yes, sequencing isn't perfect, particularly in older exome-capture projects like TCGA GBM (you indicated that you were working on this, in another thread). If you don't have sufficient read-depth at a locus to call a variant, we don't add it to the denominator when measuring mutation rate. The thresholds for "sufficient" are described in my answer above. I hope this makes it clear.
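As a rough illustration of how the denominator affects the rate, here is a short Python sketch. The numbers and the `mutation_rate` helper are invented for the example, and this is a plain mutations-per-base ratio rather than MuSiC's actual BMR calculation.

```python
def mutation_rate(mutations, bases):
    """Mutations per base; `bases` is whichever denominator you choose."""
    if bases <= 0:
        raise ValueError("denominator must be a positive number of bases")
    return mutations / bases


# Hypothetical sample: 1 Mbp of ROIs, of which only 800 kbp had sufficient read depth.
roi_length = 1_000_000
covered_bases = 800_000
mutations = 4

rate_covered = mutation_rate(mutations, covered_bases)  # denominator = callable bases
rate_total = mutation_rate(mutations, roi_length)       # denominator = all ROI bases
```

With these made-up numbers, dividing by covered bases gives about 5 mutations per Mbp, while dividing by the full ROI length gives 4 per Mbp. Using the full length would understate the rate, because no variant could have been called at the 200 kbp without sufficient depth.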


Thanks a lot for your detailed explanation! That is very helpful!
