TCGA data: where to find the target area/basepairs coverage per sample
6.5 years ago
skr2345+bio ▴ 10

To calculate the mutation rate per Mb for the TCGA SKCM dataset, I am looking for the exact target-area coverage of each sample/patient. This value is called "#BasepairsCovered" in the supplementary table of the 2013 Nature paper "Mutational landscape and significance across 12 major cancer types". Here is the URL for the supplementary data (Table S_3a):

https://images.nature.com/original/nature-assets/nature/journal/v502/n7471/extref/nature12634-s1.zip

Can these values be found in the corresponding BAM files or in their XML files?

sequencing next-gen • 1.9k views

I have not come across '#BasepairsCovered' in relation to the TCGA in the past. Do they define it in their methods or in the table legend?

If it's literally the number of reference genome bases covered at or above some specified depth, then you can infer this from the BAMs. Here is some code that I wrote to do this, in the post "Compute mean depth coverage for exome data with paired end, overlapping, features".
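To make that concrete, here is a minimal Python sketch of the idea, not the method used in the paper. It assumes you have already produced a per-base depth table for the target regions (for example with `samtools depth`, which writes tab-separated chromosome, position, and depth columns) and that "#BasepairsCovered" means the number of positions at or above some minimum depth. The function names and the `min_depth` threshold are hypothetical; the paper's actual cutoff would need to be checked in its methods.

```python
import io
from typing import Iterable


def count_covered_bases(depth_lines: Iterable[str], min_depth: int = 1) -> int:
    """Count positions whose depth is >= min_depth.

    Each line is expected to be tab-separated: chrom, pos, depth
    (the layout written by `samtools depth` for a single BAM).
    min_depth is an assumed threshold, not a value taken from the paper.
    """
    covered = 0
    for line in depth_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and any header/comment lines
        _chrom, _pos, depth = line.split("\t")[:3]
        if int(depth) >= min_depth:
            covered += 1
    return covered


def mutations_per_mb(n_mutations: int, bases_covered: int) -> float:
    """Mutation rate per megabase = mutations / (covered bases / 1e6)."""
    if bases_covered <= 0:
        raise ValueError("bases_covered must be positive")
    return n_mutations / (bases_covered / 1_000_000)


if __name__ == "__main__":
    # Toy depth table standing in for real per-sample output.
    example = "chr1\t100\t0\nchr1\t101\t7\nchr1\t102\t15\nchr1\t103\t22\n"
    covered = count_covered_bases(io.StringIO(example), min_depth=8)
    print("bases covered:", covered)
    print("mutations per Mb:", mutations_per_mb(n_mutations=1, bases_covered=covered))
```

In practice you would pass an open file handle for each sample's depth table instead of the toy string, and take the mutation count for that sample from its MAF.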
