6.5 years ago
skr2345+bio
To calculate the mutation rate per Mb for the TCGA SKCM dataset, I am looking for the exact target-area coverage of each sample/patient. This value is called "#BasepairsCovered" in the supplementary table of the 2013 Nature paper "Mutational landscape and significance across 12 major cancer types". Here is the URL for the supplementary data (Table S_3a):
https://images.nature.com/original/nature-assets/nature/journal/v502/n7471/extref/nature12634-s1.zip
Can I find these values in the corresponding BAM files or in their XML files?
I have not come across '#BasepairsCovered' in relation to TCGA before. Do they define it in their methods or in the table legend?
If it's literally the number of reference genome bases that reach a specified minimum coverage, then you can infer this from the BAMs. Here is some code that I wrote to do this: Compute mean depth coverage for exome data with paired end, overlapping, features
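As a rough illustration of that idea (not the code from the linked post, and not necessarily how the paper derived its numbers), here is a minimal Python sketch. It assumes you have already written per-base depths to a text file in the three-column, tab-separated layout that `samtools depth` produces (chromosome, position, depth), restricted to your target regions. The function names and the `min_depth` threshold are hypothetical; pick a threshold that matches whatever definition the paper gives.

```python
from typing import Iterable


def count_covered_bases(depth_lines: Iterable[str], min_depth: int = 1) -> int:
    """Count positions whose depth is at least min_depth.

    Expects tab-separated lines: chromosome, 1-based position, depth.
    min_depth is an assumed threshold, not a value taken from the paper.
    """
    covered = 0
    for line in depth_lines:
        line = line.strip()
        # Skip blank lines and an optional header line starting with '#'.
        if not line or line.startswith("#"):
            continue
        _chrom, _pos, depth = line.split("\t")[:3]
        if int(depth) >= min_depth:
            covered += 1
    return covered


def mutations_per_mb(n_mutations: int, covered_bases: int) -> float:
    """Mutation rate per megabase: mutations divided by covered bases in Mb."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_mutations / (covered_bases / 1e6)


# Tiny made-up example: four positions, two of them at depth >= 10.
example = ["chr1\t100\t0", "chr1\t101\t7", "chr1\t102\t15", "chr1\t103\t22"]
covered = count_covered_bases(example, min_depth=10)
```

For a real sample you would pass an open file handle instead of the `example` list, then feed the resulting count and the sample's mutation count into `mutations_per_mb`.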