The MutSigCV instructions say the program can be run without an experiment-specific coverage file by using the supplied exome_full192.coverage.txt file instead. Wouldn't that be unwise? Without accurate coverage information from the actual experiment being analyzed, MutSigCV can't estimate the mutation rate accurately. To construct my own coverage file I need to understand its columns, but the meaning of the "categ" column in the context of coverage confuses me.
The coverage file has columns for gene, effect (silent, non-silent, non-coding), and categ. The categ column contains mutation categories, such as "C(A->G)T". Finally, the coverage file has one column per patient, giving the number of bases sequenced to adequate coverage in that patient for that gene, effect, and mutation category.
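To make the layout concrete, here is a minimal Python sketch of the file as I understand it. Everything in the toy table is made up for illustration: the gene names, the patient column names (patient1, patient2), and the counts are assumptions, and the exact spelling of the effect labels in a real file may differ. The helper names read_coverage and total_by_gene are my own and are not part of MutSigCV.

```python
import csv
import io
from collections import defaultdict

# Toy tab-separated table in the layout described above.
# All gene names, patient IDs and counts are invented for illustration.
TOY_COVERAGE = (
    "gene\teffect\tcateg\tpatient1\tpatient2\n"
    "GENE1\tsilent\tC(A->G)T\t10\t8\n"
    "GENE1\tnon-silent\tC(A->G)T\t25\t20\n"
    "GENE1\tnon-coding\tC(A->G)T\t5\t0\n"
    "GENE2\tsilent\tC(A->G)T\t3\t3\n"
)

FIXED_COLUMNS = ("gene", "effect", "categ")


def read_coverage(text):
    """Parse the table; every column after the fixed three is a patient."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    patients = [c for c in reader.fieldnames if c not in FIXED_COLUMNS]
    rows = []
    for row in reader:
        for p in patients:
            row[p] = int(row[p])  # counts of adequately covered bases
        rows.append(row)
    return patients, rows


def total_by_gene(patients, rows):
    """Sum the per-patient counts over all effect/categ rows of each gene."""
    totals = defaultdict(lambda: dict.fromkeys(patients, 0))
    for row in rows:
        for p in patients:
            totals[row["gene"]][p] += row[p]
    return dict(totals)


patients, rows = read_coverage(TOY_COVERAGE)
totals = total_by_gene(patients, rows)
print(totals)
```

Summing the rows this way gives one number per gene and patient, but whether that sum is a meaningful measure of how well the gene was sequenced depends on what a categ row actually counts, which is exactly what I am unsure about.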
My first question: since there are no categories for non-mutations, does the coverage file only contain information for the bases at the mutations listed in the mutations file? The mutations file does not contain mutations at bases that were not sequenced to adequate depth, right? So does the coverage file only have coverage info for the mutated bases that were sequenced to adequate depth?
My second question: I had expected the coverage file to have information about how well each gene is covered by sequencing. For example, if only half of a gene is sequenced, that changes how many mutations in the gene constitute significant recurrence. Yet the MutSigCV coverage file seems to have no information about bases that are not mutated, since there is no categ value for a non-mutation. Can someone explain this?