Question

Mutation processing by cbioportal

7.0 years ago

Arnaud Ceol ▴ 860

Hi,

I got lost with the TCGA mutations made available by cBioPortal (http://www.cbioportal.org/) and by the GDC portal (http://gdc-portal.nci.nih.gov/), and I have the feeling I'm missing something. My problem is that I find different numbers of mutations, although the source should be the same.

An example: looking for TP53 mutations in ovarian cancer.

From cBioPortal, I select Ovarian Serous Cystadenocarcinoma (TCGA, Nature 2011), mutations only, and TP53: I obtain ~300 mutations (I downloaded the results and checked that these are indeed mutations in ~300 samples).
From the GDC portal, I download the MAF file for the same ovarian cancer (TCGA.OV.mutect.9579c7c5-e170-4674-97ab-5dbfe73f78d3.somatic.maf.gz; I also looked at callers other than MuTect). I filter it for TP53 and obtain ~70 mutations.
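
For reference, the GDC-side filtering can be sketched roughly as below. This is only a minimal sketch, not the exact commands I ran: it assumes the MAF is tab-delimited with `#` comment lines and the standard `Hugo_Symbol` and `Tumor_Sample_Barcode` columns, and the function names and the toy data are made up for illustration.

```python
import csv
import gzip
import io


def read_maf_rows(handle):
    """Yield one dict per variant row from an open MAF text handle.

    Lines starting with '#' are treated as comments and skipped.
    """
    data_lines = (line for line in handle if not line.startswith("#"))
    yield from csv.DictReader(data_lines, delimiter="\t")


def count_gene_mutations(handle, gene):
    """Return (variant rows, distinct tumor samples) for one gene symbol."""
    n_rows = 0
    samples = set()
    for row in read_maf_rows(handle):
        if row["Hugo_Symbol"] == gene:
            n_rows += 1
            samples.add(row["Tumor_Sample_Barcode"])
    return n_rows, len(samples)


def count_in_gzipped_maf(path, gene):
    """Same count for a gzip-compressed MAF on disk (path supplied by the caller)."""
    with gzip.open(path, "rt") as handle:
        return count_gene_mutations(handle, gene)


# Tiny made-up MAF so the sketch runs without downloading anything.
TOY_MAF_TEXT = (
    "# toy comment line\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Classification\n"
    "TP53\tSAMPLE-A\tMissense_Mutation\n"
    "TP53\tSAMPLE-A\tNonsense_Mutation\n"
    "TP53\tSAMPLE-B\tMissense_Mutation\n"
    "BRCA1\tSAMPLE-C\tFrame_Shift_Del\n"
)

rows, samples = count_gene_mutations(io.StringIO(TOY_MAF_TEXT), "TP53")
print(f"TP53: {rows} variant rows in {samples} samples")
```

Counting rows and distinct samples separately matters here, because one sample can carry more than one TP53 variant, so "number of mutations" and "number of mutated samples" are not interchangeable.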

I imagine that the processing of the mutations in cBioPortal is different, but I didn't find much documentation. Does anyone have a clue about it?

thanks,

Arnaud

cbioportal maf gdc tcga • 2.9k views

updated 7.0 years ago by igor 13k • written 7.0 years ago by Arnaud Ceol ▴ 860

score 0 · Answer 1 · 2017-04-24

There are differences in variant reporting between the old TCGA Data Portal and the new GDC. I think there was a discussion here a few weeks ago about the differences, but I can't find it now.

I believe cBioPortal is still using the old TCGA data and applying their own filters:

The TCGA provisional datasets are directly from TCGA data center partly via Broad Firehose which are updated regularly.

We are also actively curating datasets from literature. Studies from literature were curated from the data published with the papers. We sometimes reach out to the investigators to [obtain] additional data such as clinical attributes. All the mutation data (VCF or MAF) were processed through an internal pipeline to annotate the variant effects in a consistent way across studies.

Source: http://www.cbioportal.org/faq.jsp
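
One way to see where the two sources diverge is to compare which samples each one reports as TP53-mutated. The snippet below is a rough sketch under a few assumptions: that the cBioPortal download uses 15-character TCGA sample IDs, that the GDC MAF carries full aliquot barcodes in `Tumor_Sample_Barcode`, and that truncating to the first 15 characters puts both on the same sample level. The function names and the barcodes are invented for illustration.

```python
def to_sample_id(barcode):
    """Truncate a TCGA barcode to its first 15 characters (sample level)."""
    return barcode[:15]


def compare_samples(cbioportal_ids, gdc_barcodes):
    """Split samples into those seen in both sources or in only one of them."""
    cbio = {to_sample_id(b) for b in cbioportal_ids}
    gdc = {to_sample_id(b) for b in gdc_barcodes}
    return {
        "both": sorted(cbio & gdc),
        "only_cbioportal": sorted(cbio - gdc),
        "only_gdc": sorted(gdc - cbio),
    }


# Made-up identifiers in the TCGA barcode layout, not real samples.
toy_cbioportal = ["TCGA-00-0001-01", "TCGA-00-0002-01", "TCGA-00-0003-01"]
toy_gdc = ["TCGA-00-0001-01A-01D-0000-01", "TCGA-00-0004-01A-01D-0000-01"]

for group, ids in compare_samples(toy_cbioportal, toy_gdc).items():
    print(group, len(ids), ids)
```

If most of the samples in the larger set are missing entirely from the other one, the gap probably comes from the underlying data release or filtering rather than from how individual variants are annotated.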