Question: TCGA data set with both expression (RNA-seq/microarray) and exome-seq data for the same samples?
4.1 years ago
United States
cafelumiere12 wrote:

I have previously downloaded some TCGA RNA-seq data from the TCGA site, and I have also tried the TCGA-Assembler download route. However, I don't remember being able to do this:

Does anyone have an idea or suggestion for finding data sets for a particular cancer that include both some kind of expression results (RNA-seq or microarray) and exome sequencing results for the same samples?


I was able to use and find the data I needed.

4.1 years ago
National Institutes of Health, Bethesda, MD
Sean Davis wrote:

TCGA has both gene expression and exome/genome sequence data for nearly all samples. To get the actual sequencing data, you will need to apply for access, because sequencing data from human subjects is almost always controlled-access. See here for instructions:

Thank you! After poking around I found

However, when I tried to download the data from the above link (for example, through the COADREAD archives), I downloaded the MAF files and the mRNAseq files and looked for overlapping samples, matching on the project, TSS, and participant fields of the TCGA barcodes. There are only 74 overlapping samples between the mRNAseq data and the Mutation Annotation Format (MAF) files.

I also used the R package cgdsr_1.2.5 to query cBioPortal, and found that in the COADREAD data sets there should be at least 195 cases with complete data (mutation, mRNA, etc.) in one of the studies (Colorectal Adenocarcinoma (TCGA, Nature 2012)). The only problem with using cgdsr to query cBioPortal is that there is no way to do a bulk download; I need to specify particular genes. I'm not sure why I get fewer overlapping cases through the GDAC website.
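A minimal Python sketch of the barcode matching described above, to make the matching rule explicit. It assumes both inputs are plain lists of barcode strings (for example, the MAF `Tumor_Sample_Barcode` column and the column headers of the expression matrix), accepts either dashes or dots as separators, and keeps only tumor samples (sample-type codes 01 to 09) before intersecting on project, TSS, and participant. The function names and example barcodes are hypothetical, not taken from the thread.

```python
import re
from typing import Iterable, List, Optional

# Project, TSS (2 chars), participant (4 chars); "-" or "." as separator,
# since some tools rewrite dashes to dots in column names.
PATIENT_RE = re.compile(r"^(TCGA)[-.]([A-Z0-9]{2})[-.]([A-Z0-9]{4})", re.IGNORECASE)
# Same prefix followed by the two-digit sample-type code (01-09 = tumor).
SAMPLE_RE = re.compile(r"^TCGA[-.][A-Z0-9]{2}[-.][A-Z0-9]{4}[-.](\d{2})", re.IGNORECASE)


def patient_id(barcode: str) -> Optional[str]:
    """Return the normalized 12-character patient barcode, or None."""
    m = PATIENT_RE.match(barcode.strip())
    if m is None:
        return None
    return "-".join(part.upper() for part in m.groups())


def is_tumor(barcode: str) -> bool:
    """True if the barcode carries a tumor sample-type code (01 to 09)."""
    m = SAMPLE_RE.match(barcode.strip())
    return m is not None and 1 <= int(m.group(1)) <= 9


def overlapping_patients(expr_barcodes: Iterable[str],
                         maf_barcodes: Iterable[str]) -> List[str]:
    """Patients with a tumor sample in both the expression data and the MAF."""
    expr = {patient_id(b) for b in expr_barcodes if is_tumor(b)}
    maf = {patient_id(b) for b in maf_barcodes if is_tumor(b)}
    expr.discard(None)
    maf.discard(None)
    return sorted(expr & maf)


if __name__ == "__main__":
    expr = ["TCGA-AA-3555-01A-01R-0821-07", "TCGA-AA-3555-11A-01R-0821-07",
            "tcga.aa.3556.01a", "TCGA-A6-2671-01A"]
    maf = ["TCGA-AA-3555-01A-01W-0831-10", "TCGA-AA-3556-01A-01W-0831-10"]
    print(overlapping_patients(expr, maf))
```

Normalizing case and separators first means a formatting difference between the two files cannot silently shrink the overlap; whether that plays any role in the 74 vs. 195 gap is not established here.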

4.1 years ago
New Zealand
nwon wrote:

All TCGA data has migrated to the Genomic Data Commons (GDC).
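For the GDC route, a rough Python sketch of how one might list the cases in a TCGA project that have both RNA-Seq and WXS (whole-exome) files: query the public GDC `files` endpoint once per experimental strategy and intersect the case IDs. This is based on my understanding of the GDC REST API, and several details are assumptions to check against its documentation: the endpoint URL, the filter field names (`cases.project.project_id`, `experimental_strategy`), the strategy labels `RNA-Seq` and `WXS`, and that a single page of `size` results is enough. File metadata is open, so the query itself should not need a token; downloading controlled-access files still does. The default project `TCGA-COAD` is only an example.

```python
import json
import urllib.parse
import urllib.request

# Assumed public GDC endpoint for file metadata.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_filter(project_id: str, strategy: str) -> dict:
    """GDC-style filter: files in one project with one experimental strategy."""
    return {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id",
                                     "value": [project_id]}},
            {"op": "in", "content": {"field": "experimental_strategy",
                                     "value": [strategy]}},
        ],
    }


def case_ids_from_response(payload: dict) -> set:
    """Collect cases.submitter_id values from a files-endpoint JSON response."""
    ids = set()
    for hit in payload.get("data", {}).get("hits", []):
        for case in hit.get("cases", []):
            if "submitter_id" in case:
                ids.add(case["submitter_id"])
    return ids


def fetch_case_ids(project_id: str, strategy: str, size: int = 10000) -> set:
    """Query the GDC once (no pagination) and return the matching case IDs."""
    params = urllib.parse.urlencode({
        "filters": json.dumps(build_files_filter(project_id, strategy)),
        "fields": "cases.submitter_id",
        "format": "JSON",
        "size": str(size),  # assumption: one page covers all matching files
    })
    with urllib.request.urlopen(f"{GDC_FILES_ENDPOINT}?{params}") as resp:
        return case_ids_from_response(json.load(resp))


def cases_with_both(project_id: str = "TCGA-COAD") -> set:
    """Cases that have at least one RNA-Seq file and at least one WXS file."""
    return fetch_case_ids(project_id, "RNA-Seq") & fetch_case_ids(project_id, "WXS")


if __name__ == "__main__":
    print(len(cases_with_both()))
```

Intersecting two separate queries keeps the "has both data types" logic explicit on the client side instead of relying on how the server combines conditions on nested fields.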

This resource also holds the original TCGA data in its legacy database.

4.1 years ago
pel wrote:

You can find the largest selection of Level 2 and Level 3 data (no human-subjects protocol required) for somatic mutations, CNVs, SNPs, methylation, and RNA-seq and chip-based expression, for each tumor in TCGA across multiple cancer sites, at the PanCancer 12 site.

Recall that, as pointed out above, you cannot get the sequence data without approval. However, the mutation (.maf) files are Level 2 and contain most of the mutation calls.
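Because the Level 2 MAF files are plain tab-delimited text, the list of samples and their mutated genes can be extracted without special tooling. A minimal Python sketch, assuming the standard `Hugo_Symbol`, `Variant_Classification`, and `Tumor_Sample_Barcode` column names and that any header comments start with `#`. Skipping `Silent` variants by default is an illustrative choice, not something suggested in the thread, and the example rows are made up.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def mutated_genes_by_sample(maf_text: str,
                            exclude_classes: Iterable[str] = ("Silent",)
                            ) -> Dict[str, Set[str]]:
    """Map each Tumor_Sample_Barcode to the set of genes mutated in it."""
    excluded = set(exclude_classes)
    # MAF files are tab-delimited; drop blank lines and '#' comment lines
    # (e.g. "#version 2.4") so the first remaining line is the header.
    lines = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    genes: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(lines, delimiter="\t"):
        if row.get("Variant_Classification") in excluded:
            continue
        genes[row["Tumor_Sample_Barcode"]].add(row["Hugo_Symbol"])
    return dict(genes)


if __name__ == "__main__":
    example = (
        "#version 2.4\n"
        "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
        "APC\tNonsense_Mutation\tTCGA-AA-3555-01A\n"
        "TP53\tMissense_Mutation\tTCGA-AA-3555-01A\n"
        "KRAS\tSilent\tTCGA-AA-3556-01A\n"
    )
    print(mutated_genes_by_sample(example))
```

The keys of the returned dictionary are the MAF's sample barcodes, so they can be compared directly with the sample list of an expression matrix.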

