Confusion regarding selection of TCGA maf files for analysis
9.1 years ago
monukmr98 ▴ 80

Hi

I am struggling to choose the right MAF files for mutation analysis of TCGA patients. Here is the actual problem.

Say I want to do somatic mutation analysis for GBM patients. There are two sources I can get the data from:

(i) From the TCGA Data Matrix (https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm), I selected 'somatic mutations' under "Data Type", clicked Apply, and followed the emailed link to download the data. After untarring the archive I got a Level_2 MAF file named "broad.mit.edu__Illumina_Genome_Analyzer_DNA_Sequencing_level2.maf" with 21495 lines (the first line being the header).

(ii) From the Broad Institute's "MAF+Dashboard" page (https://confluence.broadinstitute.org/display/GDAC/MAF+Dashboard), I got two files for GBM under the main section of that page (MAFs Available from the DCC as of 26 February 2015).

These two files are:

(a) gbm_liftover.aggregated.capture.tcga.uuid.somatic.maf (22169 lines)

(b) step4_gbm_liftover.aggregated.capture.tcga.uuid.maf2.4.migrated.somatic.maf (22171 lines)

My question is: which file should I use for mutation analysis in GBM?

Thanks in advance

maf data-matrix TCGA dashboard • 4.4k views

Try here. Generally all versions are available, including the obsolete ones. Look at the deploy date to select the most recent one.
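Picking the most recent version by deploy date can be sketched as below. This is only an illustration: it assumes the dates are written as DD-MON-YY (the form quoted in this thread, e.g. 27-JUN-13), that all years are 20YY, and the file names and the first date in `listing` are made-up placeholders, not real entries from the page.

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

def parse_deploy_date(text):
    """Parse a DD-MON-YY string such as '27-JUN-13' (assumes years 20YY)."""
    day, month, year = text.strip().split("-")
    return date(2000 + int(year), MONTHS[month.upper()], int(day))

def most_recent(entries):
    """Return the (file_name, deploy_date) pair with the latest deploy date."""
    return max(entries, key=lambda entry: parse_deploy_date(entry[1]))

# Hypothetical listing: placeholder names and dates for illustration only.
listing = [
    ("older_version.somatic.maf", "14-NOV-12"),
    ("newer_version.somatic.maf", "27-JUN-13"),
]
print(most_recent(listing)[0])
```

Parsing the month by hand rather than with `strptime` avoids depending on the system locale for month abbreviations.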


Thanks poisonAlien

I downloaded the most recent GBM file (i.e. 27-JUN-13; the last link in the GBM section) and found it is identical to file (ii)(b) mentioned in my question. So can I proceed with this file?

My other question is:

The MAF+Dashboard page (https://confluence.broadinstitute.org/display/GDAC/MAF+Dashboard) is divided into two sections:

A) MAFs Ingested into Broad GDAC Firehose as of 05 February 2015

B) MAFs Available from the DCC as of 26 February 2015

So what is the meaning of section A) on that page? Section A) lists gbm_liftover.aggregated.capture.tcga.uuid.somatic.maf.txt, whereas the file we selected as the most recent deployment is step4_gbm_liftover.aggregated.capture.tcga.uuid.maf2.4.migrated.somatic.maf.

Thanks


Both files you mentioned are the same, with the same mutations. The only difference is in the way they are annotated. The first one (gbm_liftover*.maf) is annotated using Oncotator v0.5.25.0, whereas the second one (step4_gbm_*.maf) uses Oncotator v1.0.0.0rc20. Please look at the comment lines at the beginning of each file (the lines beginning with #). Apart from this, both files contain the same mutations (22167). This is the difference between the two versions of Oncotator:

## Oncotator v1.0.0.0rc20| Gaf 3.0 | UniProt_AAxform 2011_09 | COSMIC v62_291112 | dbSNP build 134 | Flat File Reference hg19 | CCLE_By_GP 09292010 | ORegAnno UCSC Track | UniProt_AA 2011_09 | ACHILLES_Lineage_Results 110303 | CGC full_2012-03-15 | CCLE_By_Gene 09292010 | COSMIC_FusionGenes v62_291112 | COSMIC_Tissue 291112 | HumanDNARepairGenes 20110905 | Familial_Cancer_Genes 20110905 | MutSig Published Results 20110905 | RefSeq Feb052012 | UniProt 2011_09 | TCGAScape 110405 | TUMORScape 20100104

Older version:

## Oncotator v0.5.25.0|GAF 2.1 hg19 Jun2011|dbSNP build 134|UniProt Release 2011_09|COSMIC v55| No Poly-Phen2 used|Tumorscape 20100104|TCGAscape 20110405|Achilles Lineage Results|CCLE Jan2011_freeze|CCLE Oncomap|Cancer Gene Consensus 20110322|Familial Cancer Database 20110905|HumanDNARepairGenes (Wood et al.) 20110905|MutSig Published Results 20110905|ORegAnno UCSC Track

See how some of the annotation sources change (for example, the COSMIC version used in each case). More information on Oncotator here.
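To check this yourself, a minimal sketch along the following lines reads the `#` comment lines and compares the mutation calls of two MAF files while ignoring the annotation columns. It rests on assumptions: the files are tab-delimited with a single header row, and the columns listed in `KEY_COLUMNS` (standard MAF column names, matched case-insensitively) are enough to identify a mutation. The records below are made up and stand in for the two real files.

```python
import csv
import io

# Columns assumed to identify one mutation call; compared case-insensitively
# because header capitalisation can differ between MAF versions.
KEY_COLUMNS = ("chromosome", "start_position", "end_position",
               "reference_allele", "tumor_seq_allele2", "tumor_sample_barcode")

def read_maf(handle):
    """Split a MAF stream into '#' comment lines and a list of row dicts."""
    comments, data_lines = [], []
    for line in handle:
        if line.startswith("#"):
            comments.append(line.rstrip("\n"))
        elif line.strip():
            data_lines.append(line)
    reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.lower() for name in next(reader)]
    rows = [dict(zip(header, fields)) for fields in reader]
    return comments, rows

def mutation_keys(rows):
    """Build the set of mutation identifiers, ignoring annotation columns."""
    return {tuple(row[col] for col in KEY_COLUMNS) for row in rows}

# Tiny made-up records standing in for the two real files.
HEADER = ("Hugo_Symbol\tChromosome\tStart_position\tEnd_position\t"
          "Reference_Allele\tTumor_Seq_Allele2\tTumor_Sample_Barcode\tCOSMIC\n")
old_maf = io.StringIO("## Oncotator v0.5.25.0\n" + HEADER +
                      "GENE1\t7\t100\t100\tG\tA\tSAMPLE-01\told_annotation\n")
new_maf = io.StringIO("## Oncotator v1.0.0.0rc20\n" + HEADER +
                      "GENE1\t7\t100\t100\tG\tA\tSAMPLE-01\tnew_annotation\n")

old_comments, old_rows = read_maf(old_maf)
new_comments, new_rows = read_maf(new_maf)
same_mutations = mutation_keys(old_rows) == mutation_keys(new_rows)
print(old_comments[0], "|", new_comments[0], "| same mutations:", same_mutations)
```

For real files, pass an open file handle instead of the `io.StringIO` objects, e.g. `with open(path) as fh: comments, rows = read_maf(fh)`.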


If I scroll further down that page (https://wiki.nci.nih.gov/display/TCGA/TCGA+MAF+Files#TCGAMAFFiles-GBM:Glioblastomamultiforme) to the HNSC section:

HNSC: Head and Neck squamous cell carcinoma

I get 3 files in total for HNSC,

and the latest deploy date is for the following file, which has 279:279 (tumor:normal) samples:

pair_set_279_freeze_Mar262013.aggregated.capture.tcga.uuid.curated.somatic.maf

If I select this file based on deploy date, as you suggested earlier, am I missing information in terms of samples? The second file ("PR_TCGA_HNSC_PAIR_Capture_All_Pairs_QCPASS_v4.aggregated.capture.tcga.uuid.automated.somatic.maf") contains 509:567 samples, but the latter's deploy date is 26-MAR-14.

So how should I go about it?

Thanks
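One way to see what would be lost by choosing one file over the other is to compare the sample barcodes they contain. The sketch below is an illustration under assumptions: both files are tab-delimited MAFs with the standard `Tumor_Sample_Barcode` and `Matched_Norm_Sample_Barcode` columns (matched case-insensitively), and the barcodes shown are made up rather than taken from the HNSC files.

```python
import csv
import io

def sample_sets(handle):
    """Return the sets of tumor and matched-normal barcodes in a MAF stream."""
    data_lines = (line for line in handle
                  if line.strip() and not line.startswith("#"))
    reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.lower() for name in next(reader)]
    tumor_idx = header.index("tumor_sample_barcode")
    normal_idx = header.index("matched_norm_sample_barcode")
    tumors, normals = set(), set()
    for fields in reader:
        tumors.add(fields[tumor_idx])
        normals.add(fields[normal_idx])
    return tumors, normals

# Made-up records: the "curated" file covers one tumor, the "automated" two.
HEADER = "Hugo_Symbol\tTumor_Sample_Barcode\tMatched_Norm_Sample_Barcode\n"
curated = io.StringIO(HEADER + "GENE1\tT1\tN1\n" + "GENE2\tT1\tN1\n")
automated = io.StringIO(HEADER + "GENE1\tT1\tN1\n" + "GENE3\tT2\tN2\n")

curated_tumors, curated_normals = sample_sets(curated)
automated_tumors, automated_normals = sample_sets(automated)
only_in_automated = sorted(automated_tumors - curated_tumors)
print(len(curated_tumors), len(automated_tumors), only_in_automated)
```

Note that a sample with no called mutations does not appear in a MAF at all, so counts obtained this way can be lower than the tumor:normal numbers listed on the page.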


Hi monukmr98, because the TCGA DCC has been retired, I can no longer access the TCGA data (the files in this section) from the Broad Institute's "MAF+Dashboard" page. How can I get these files, other than from GDC (GDC_TCGA)? Thanks in advance.
