Question: TCGA - finding Normal and Tumor samples for matching protein and rna-seq samples
21 months ago by sid5427 (USA):

Good day all,

I have been trying to retrieve RNA-seq, miRNA-seq and protein expression data from TCGA's GDC data portal. I have read a number of previous posts here on Biostars and on other forums, but I am still having trouble with the new GDC portal.

I have figured out how to get the Primary Tumor and Solid normal tissue RNA-seq and miRNA-seq datasets I need, i.e. TCGA barcodes with sample type code 01 or 11 respectively, as defined by https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/sample-type-codes
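To make the 01/11 filtering concrete, here is a minimal sketch of how the sample type code can be read off a barcode. It assumes the usual dash-separated TCGA barcode layout, where the fourth field starts with the two-digit sample type code; the function names and the example barcodes are made up for illustration.

```python
# Subset of the GDC sample type code table linked above.
SAMPLE_TYPES = {"01": "Primary Solid Tumor", "11": "Solid Tissue Normal"}


def sample_type_code(barcode: str) -> str:
    """Return the two-digit sample type code from a sample-level TCGA barcode."""
    fields = barcode.split("-")
    if len(fields) < 4 or fields[0] != "TCGA" or len(fields[3]) < 2:
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    return fields[3][:2]


def split_tumor_normal(barcodes):
    """Split barcodes into (primary tumor, solid tissue normal) lists."""
    tumor = [b for b in barcodes if sample_type_code(b) == "01"]
    normal = [b for b in barcodes if sample_type_code(b) == "11"]
    return tumor, normal


# Hypothetical barcodes, not real TCGA samples.
barcodes = [
    "TCGA-AA-0001-01A-01R-0001-07",
    "TCGA-AA-0001-11A-01R-0001-07",
    "TCGA-AA-0002-01A-01R-0001-07",
]
tumor, normal = split_tumor_normal(barcodes)
```

Barcodes with any other sample type code (for example metastatic or blood-derived normal) are simply left out of both lists.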

For example, the TCGA-COAD project has 456 RNA-seq files and 444 miRNA-seq files.

However, I am unable to find the corresponding protein expression datasets, especially Solid normal tissue datasets (11 code or any other). I have looked at TCPA (https://tcpaportal.org/tcpa/) and only found Primary tumor protein expression (01 code) for the TCGA projects I need. I have also looked through the legacy GDC portal but was unable to find any way to match the file names to the TCGA barcodes. In the new GDC portal I can put files in my cart and download the "sample sheet", which gives me the mapping of actual file names, MD5 hashes, TCGA barcodes, etc. I honestly was under the impression that GDC would have the protein expression data as listed at this link: https://cancergenome.nih.gov/abouttcga/aboutdata/datalevelstypes
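For reference, the file-name-to-barcode mapping from the sample sheet can be loaded with a few lines of Python. This is only a sketch: it assumes the sheet is tab-separated and has columns named `File Name`, `Sample ID` and `Sample Type`, which should be checked against the header of the sheet actually downloaded. The two-row sheet below is made up.

```python
import csv
import io


def load_sample_sheet(handle):
    """Map each file name to (sample barcode, sample type) from a tab-separated sheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Column names are assumptions; adjust them to match the real header row.
    return {row["File Name"]: (row["Sample ID"], row["Sample Type"]) for row in reader}


# Made-up sheet for illustration; a real sheet has more columns and rows.
example = io.StringIO(
    "File Name\tSample ID\tSample Type\n"
    "a.counts.tsv\tTCGA-AA-0001-01A\tPrimary Tumor\n"
    "b.counts.tsv\tTCGA-AA-0001-11A\tSolid Tissue Normal\n"
)
mapping = load_sample_sheet(example)
normals = [name for name, (_, stype) in mapping.items() if stype == "Solid Tissue Normal"]
```

For a downloaded sheet, the same function can be called on an open file handle instead of the `io.StringIO` object.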

Am I missing something here? Are my assumptions wrong that Solid normal tissue datasets (11 code) are included? Any help would be greatly appreciated.

21 months ago by Kevin Blighe:

Edit April 5th, 2020:

Since posting this answer, the NIH has set up the Proteomic Data Commons.

-------------

As far as I am aware, protein expression profiling was not a primary goal of TCGA. So, some cohorts may not even have this type of data. Thus, if you cannot find any normals in your data of interest, that may be because no normals were ever profiled.

I would also take a look at cBioPortal, which has RPPA (reverse phase protein array) data for different datasets, including colon, I see.

Kevin


Thanks Kevin,

I found the RPPA datasets on cBioPortal and was able to match their TCGA IDs to the respective RNA-seq and miRNA-seq data. However, as you mentioned, protein expression profiling was not a primary goal, and going back to the TCGA literature, it seems "normal" protein expression datasets were not generated, at least not for the TCGA projects I am looking at, namely COAD and READ.
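For anyone doing the same matching, here is a minimal sketch of the idea. It assumes the RPPA sample IDs are 15 characters long (ending in the sample type code, e.g. `-01`) and that the RNA-seq and miRNA-seq barcodes are full-length, so everything is truncated to the first 15 characters before comparing. All IDs below are made up.

```python
def sample_key(barcode: str) -> str:
    """Truncate a TCGA barcode to its first 15 characters, e.g. TCGA-AA-0001-01."""
    return barcode[:15]


def matched_samples(rppa_ids, rnaseq_barcodes, mirna_barcodes):
    """Return the sorted sample keys present in all three data types."""
    rppa = {sample_key(b) for b in rppa_ids}
    rna = {sample_key(b) for b in rnaseq_barcodes}
    mirna = {sample_key(b) for b in mirna_barcodes}
    return sorted(rppa & rna & mirna)


# Hypothetical IDs, not real TCGA samples.
rppa_ids = ["TCGA-AA-0001-01", "TCGA-AA-0002-01"]
rnaseq_barcodes = ["TCGA-AA-0001-01A-01R-0001-07", "TCGA-AA-0001-11A-01R-0001-07"]
mirna_barcodes = ["TCGA-AA-0001-01A-01T-0002-13", "TCGA-AA-0002-01A-01T-0002-13"]
shared = matched_samples(rppa_ids, rnaseq_barcodes, mirna_barcodes)
```

Matching on 15 characters keeps tumor and normal samples from the same patient apart; truncating to 12 characters instead would match at the patient level.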

Reply written 21 months ago by sid5427
19 months ago by BrunoGiotti (New York, NY, USA):

Hey, did you check the CPTAC data? It's a consortium that has been profiling the proteome and phosphoproteome of TCGA ovarian, breast and colorectal cancers, and they are continuing to do so for further TCGA studies. Here is the link for the colorectal study: https://cptac-data-portal.georgetown.edu/cptac/s/S016

Cheers

B
