Question: Mutation processing by cbioportal
Arnaud Ceol (Milan, Italy) wrote, 2.8 years ago:


I got lost with the TCGA mutations made available from cBioPortal and from the GDC portal, and I have the feeling I'm missing something. My problem is that I find different numbers of mutations, although the source should be the same.

An example: I look for TP53 mutations in ovarian cancer.

  • From cBioPortal, I select Ovarian Serous Cystadenocarcinoma (TCGA, Nature 2011), mutations only, and TP53: I obtain ~300 mutations (I downloaded the results and checked that these are indeed mutations in ~300 samples).
  • From the GDC portal, I download the MAF file for the same ovarian cancer (TCGA.OV.mutect.9579c7c5-e170-4674-97ab-5dbfe73f78d3.somatic.maf.gz; I also looked at tools other than MuTect). I filter it for TP53 and obtain ~70 mutations.
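
For anyone repeating the GDC-side count, here is a minimal sketch of the filtering step in Python. It assumes the file follows the usual MAF layout (tab-separated, `#` comment lines before the header, columns named `Hugo_Symbol`, `Tumor_Sample_Barcode` and `Variant_Classification`); the function names are made up for illustration and are not part of any GDC or cBioPortal tool. It reports both the number of rows and the number of distinct samples, since those two figures can differ.

```python
import csv
import gzip
from collections import Counter


def count_gene_mutations(lines, gene="TP53"):
    """Count MAF rows and distinct tumor samples for one gene.

    `lines` is any iterable of text lines in MAF layout (assumed:
    tab-separated, optional '#' comment lines before the header row).
    """
    # Drop the '#...' comment lines that may precede the header row.
    data_lines = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)

    rows = 0
    samples = set()
    classes = Counter()
    for record in reader:
        if record["Hugo_Symbol"] != gene:
            continue
        rows += 1
        samples.add(record["Tumor_Sample_Barcode"])
        classes[record["Variant_Classification"]] += 1
    return rows, len(samples), classes


def count_in_maf_gz(path, gene="TP53"):
    """Same count, reading directly from a gzip-compressed MAF file."""
    with gzip.open(path, "rt") as handle:
        return count_gene_mutations(handle, gene)
```

Calling `count_in_maf_gz` on the downloaded `.maf.gz` file returns the row count, the distinct-sample count and a per-classification breakdown, which makes it easier to see whether a gap between two sources comes from fewer samples or from fewer variant types.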

I imagine that the processing of the mutations in cBioPortal is different, but I didn't find much documentation. Does anyone have a clue about it?



gdc tcga maf cbioportal
igor (United States) wrote, 2.8 years ago:

There are differences in variant reporting between the old TCGA Data Portal and the new GDC. I think there was a discussion here a few weeks ago about the differences, but I can't find it now.

I believe cBioPortal is still using the old TCGA data and applying their own filters:

"The TCGA provisional datasets are directly from TCGA data center partly via Broad Firehose which are updated regularly.

"We are also actively curating datasets from literature. Studies from literature were curated from the data published with the papers. We sometimes reach out to the investigators to additional data such as clinical attributes. All the mutation data (VCF or MAF) were processed through an internal pipeline to annotate the variant effects in a consistent way across studies."



Thanks, I found the post you mention: "Anyone knows mutation pipeline for cbioportal?". My question is therefore some kind of duplicate. I had also seen the documentation in the cBioPortal FAQ, but unfortunately it doesn't say much about how the data is processed.

My guess is that cbioportal shows all mutations identified, without applying a particular tool for extracting driver mutations only.
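
One way to check a guess like this would be to compare which patients carry a TP53 mutation in each download. The sketch below assumes both sources use TCGA barcodes, whose first three dash-separated fields identify the patient, so a short sample ID and a full aliquot barcode for the same patient reduce to the same key. The function names and the barcodes in the usage example are made up for illustration.

```python
def patient_id(barcode):
    """Return the patient part of a TCGA barcode (its first three fields)."""
    return "-".join(barcode.split("-")[:3])


def compare_patients(barcodes_a, barcodes_b):
    """Split patients into shared / only-in-A / only-in-B sets."""
    patients_a = {patient_id(b) for b in barcodes_a}
    patients_b = {patient_id(b) for b in barcodes_b}
    return {
        "shared": patients_a & patients_b,
        "only_a": patients_a - patients_b,
        "only_b": patients_b - patients_a,
    }


# Usage with made-up barcodes (not real TCGA samples):
source_a_ids = ["TCGA-AA-0001-01", "TCGA-AA-0002-01"]
source_b_ids = ["TCGA-AA-0002-01A-01D-0001-01", "TCGA-AA-0003-01A-01D-0001-01"]
result = compare_patients(source_a_ids, source_b_ids)
```

If most of the patients missing from the smaller set are absent from that file altogether (for any gene), the difference is more likely about which samples were included than about how individual variants were filtered.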

Reply written 2.8 years ago by Arnaud Ceol