TCGA data looks different on ICGC
6.4 years ago

I'm interested in using TCGA data to look at the frequency of somatic mutations in genes for a particular cancer (Ovarian). Because my scripting skills are "OK" at best and I've struggled to get what I want out of .maf files, I followed advice given in this post (A: Retreiving Data From Tcga Database) and decided to get my data from ICGC's website instead of TCGA. 

I'm confused about the completeness of the TCGA data available through ICGC. On ICGC's website, there are apparently 582 participants in TCGA's Ovarian cohort, and simple somatic mutation data are available for only 88 of them. Contrast that with TCGA's own site, which apparently has 586 participants and... I'm not sure how many of those participants have somatic mutation data.

Are there really only 88 participants out of over 500 that show somatic mutations? 

6.4 years ago
Ying W ★ 4.0k

There are 582 donors, of which 82 have somatic mutation data (meaning both normal and tumor DNA-seq are available and a somatic mutation caller like VarScan was used to identify mutations). It is expensive and takes a lot of tissue to do DNA-seq twice (tumor and matched normal) at roughly 40-80x coverage. More samples will be available from ICGC once data train 2.0 is released (looks like maybe another 50 samples or so).

I'm not sure whether TCGA has WGS data for OV-US.
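
Since the original goal was per-gene somatic mutation frequency, here is a minimal Python sketch of how that could be computed from a MAF-style tab-separated file. It assumes the standard MAF column names `Hugo_Symbol` and `Tumor_Sample_Barcode`; the function name `mutated_fraction_per_gene` and the parameter `n_sequenced` are made up for illustration, and files in another layout would need their columns renamed first. The point it illustrates is the denominator: it should be the number of donors that actually have mutation calls, not the full cohort size.

```python
import csv
from collections import defaultdict


def mutated_fraction_per_gene(maf_lines, n_sequenced=None):
    """Return {gene: fraction of samples with at least one mutation in that gene}.

    maf_lines   -- iterable of text lines from a MAF-style TSV (hypothetical input)
    n_sequenced -- number of samples that have mutation calls; if None, fall back
                   to the number of distinct samples seen in the file (assumption:
                   every sequenced sample has at least one row)
    """
    # MAF files may begin with comment/version lines starting with '#'.
    data_lines = (line for line in maf_lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")

    samples_by_gene = defaultdict(set)
    all_samples = set()
    for row in reader:
        sample = row["Tumor_Sample_Barcode"]
        all_samples.add(sample)
        # A set counts each sample once per gene, however many mutations it has.
        samples_by_gene[row["Hugo_Symbol"]].add(sample)

    denominator = n_sequenced if n_sequenced is not None else len(all_samples)
    if denominator == 0:
        return {}
    return {gene: len(samples) / denominator
            for gene, samples in samples_by_gene.items()}
```

To use it on a real file, something like `with open("my_file.maf") as fh: freqs = mutated_fraction_per_gene(fh, n_sequenced=82)` should work, where the file name is a placeholder and 82 is the number of donors with somatic mutation data mentioned above.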


