Mutation processing by cbioportal
7.0 years ago
Arnaud Ceol ▴ 860

Hi,

I am confused by the TCGA mutations made available through cBioPortal (http://www.cbioportal.org/) and through the GDC portal (http://gdc-portal.nci.nih.gov/), and I have the feeling I'm missing something. My problem is that I find different numbers of mutations, although the source should be the same.

An example: I look for TP53 mutations in ovarian cancer.

  • From cBioPortal, I select Ovarian Serous Cystadenocarcinoma (TCGA, Nature 2011), mutations only, and TP53: I obtain ~300 mutations (I downloaded the results and checked that these are indeed mutations in ~300 samples).
  • From the GDC portal, I download the MAF file for the same ovarian cancer cohort (TCGA.OV.mutect.9579c7c5-e170-4674-97ab-5dbfe73f78d3.somatic.maf.gz; I also looked at the files from callers other than MuTect). I filter it for TP53 and obtain ~70 mutations.
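For reference, the filtering step on the GDC side can be sketched roughly as follows. This is a minimal illustration, not the exact commands used above: it assumes the file is a tab-separated MAF with leading `#` comment lines and the standard `Hugo_Symbol` and `Tumor_Sample_Barcode` columns, and the helper names `count_gene_mutations` and `count_in_maf` are made up for this example. It reports both the number of mutation rows and the number of distinct samples, since the two can differ.

```python
import csv
import gzip


def count_gene_mutations(lines, gene="TP53"):
    """Return (mutation rows, distinct samples) for one gene in a MAF.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    # Skip the '#' comment lines that may precede the header row.
    data_lines = (line for line in lines if not line.startswith("#"))
    # QUOTE_NONE: treat quote characters in annotation fields literally.
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = 0
    samples = set()
    for record in reader:
        if record["Hugo_Symbol"] == gene:
            rows += 1
            samples.add(record["Tumor_Sample_Barcode"])
    return rows, len(samples)


def count_in_maf(path, gene="TP53"):
    """Same count, reading a plain or gzip-compressed MAF from disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_gene_mutations(handle, gene)
```

Calling `count_in_maf` on the downloaded `.maf.gz` file returns the two counts for TP53; running it on a MAF exported from the other source gives figures that can be compared directly.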

I imagine that cBioPortal processes the mutations differently, but I didn't find much documentation about it. Does anyone have a clue?

thanks,

Arnaud

7.0 years ago
igor 13k

There are differences in variant reporting between the old TCGA Data Portal and the new GDC. I think there was a discussion here a few weeks ago about the differences, but I can't find it now.

I believe cBioPortal is still using the old TCGA data and applying their own filters:

The TCGA provisional datasets are directly from TCGA data center partly via Broad Firehose which are updated regularly.

We are also actively curating datasets from literature. Studies from literature were curated from the data published with the papers. We sometimes reach out to the investigators to additional data such as clinical attributes. All the mutation data (VCF or MAF) were processed through an internal pipeline to annotate the variant effects in a consistent way across studies.

Source: http://www.cbioportal.org/faq.jsp


Thanks, I found the post you mention: "Anyone knows mutation pipeline for cbioportal?". My question is therefore something of a duplicate. I had also seen the documentation in cBioPortal's FAQ, but unfortunately it doesn't say much about how the data is processed.

My guess is that cbioportal shows all mutations identified, without applying a particular tool for extracting driver mutations only.
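One way to probe that guess is to check which patients carry a TP53 mutation in one source but not the other. The sketch below assumes both sources identify samples with TCGA barcodes, whose first three dash-separated fields identify the patient; the function names and the barcodes in the usage example are invented for illustration.

```python
def patient_id(barcode):
    """Reduce a TCGA sample or aliquot barcode to its patient-level prefix."""
    # The first three dash-separated fields identify the patient.
    return "-".join(barcode.split("-")[:3])


def compare_sources(barcodes_a, barcodes_b):
    """Split patients into shared, only in source A, and only in source B."""
    a = {patient_id(b) for b in barcodes_a}
    b = {patient_id(b) for b in barcodes_b}
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical barcodes, not taken from either portal.
portal_a = ["TCGA-00-0001-01", "TCGA-00-0002-01"]
portal_b = ["TCGA-00-0001-01A-01W-0000-08"]
result = compare_sources(portal_a, portal_b)
print(sorted(result["only_a"]))
```

If the patients missing from one source turn out to be absent from that source's whole cohort, the difference comes from which samples were included rather than from variant filtering.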
