Question: TCGA: Do TCGA cancer studies have mRNA expression data for Control/Normal samples?
komal.rathi (Children's Hospital of Philadelphia, Philadelphia, PA) wrote, 5.2 years ago:

Hi everyone,

I am using the TCGA portal to get mRNA expression data for various cancer studies (e.g. lung, liver, thyroid). We have been on the lookout for control/normal samples for the cancer studies on TCGA. On the website we could find case/tumor samples, but we couldn't find any control samples.

Does anyone know of, or has anyone used, control/normal samples from TCGA and can point me to them? Or do you know of a good resource (preferably with RNASeq V2 RSEM normalized expression values or z-scores) for control/normal samples in tissues like lung, liver, thyroid etc. (basically all the foregut tissues)?

Thanks!

Tags: rnaseq rsem tcga normals controls
(question written 5.2 years ago by komal.rathi; last modified 3.1 years ago by JJ)

You can use TCGA-Assembler for that. There is a Nature Methods paper describing it (see the reference at the link).

When you download the data using the "DownloadRNASeqData" function, you can specify whether you want normal, primary tumor, recurrent tumor or metastatic samples. This downloads RNASeqV1 or V2 level 3 data (RSEM normalized or not). You will have to transform it into z-scores yourself, though.

You can do that by following this thread in Google Groups: match the sample names (for matched samples) or take the average of the normal controls (for non-matched data).

(reply by TriS, 5.1 years ago)
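A minimal sketch of the two options TriS describes, for a single gene, assuming you already have RSEM-normalized values per sample. The log2(x + 1) transform, the use of the sample standard deviation, the per-patient log ratio for the matched case and all helper names are my assumptions; they are not part of TCGA-Assembler and may differ from what the Google Groups thread does.

```python
import math
from statistics import mean, stdev


def log2p1(x):
    """log2(x + 1): a common (assumed) transform for RSEM-normalized values."""
    return math.log2(x + 1.0)


def zscore_vs_normals(tumor_values, normal_values):
    """Non-matched case: z-score tumor values of ONE gene against the pooled normals.

    Both arguments are RSEM-normalized values for the same gene. At least two
    normals are needed so that the sample standard deviation is defined.
    """
    normal_logs = [log2p1(v) for v in normal_values]
    mu = mean(normal_logs)
    sd = stdev(normal_logs)  # sample (n - 1) standard deviation
    if sd == 0:
        raise ValueError("normal samples have zero variance for this gene")
    return [(log2p1(v) - mu) / sd for v in tumor_values]


def log_ratio_vs_matched(tumor_by_patient, normal_by_patient):
    """Matched case: log2 ratio of each tumor to the normal from the same patient.

    Keys are patient identifiers (e.g. the first 12 characters of a barcode);
    patients missing either sample are skipped.
    """
    shared = sorted(tumor_by_patient.keys() & normal_by_patient.keys())
    return {p: log2p1(tumor_by_patient[p]) - log2p1(normal_by_patient[p]) for p in shared}


if __name__ == "__main__":
    # Hypothetical values for one gene.
    print(zscore_vs_normals([15.0], [1.0, 3.0, 7.0]))  # [2.0]
    print(log_ratio_vs_matched({"TCGA-AA-0001": 7.0}, {"TCGA-AA-0001": 1.0}))  # {'TCGA-AA-0001': 2.0}
```

Pooling the normals makes every tumor comparable to one reference distribution, while the matched ratio removes patient-level effects but only works for patients that actually have a normal sample.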

Thanks, what russ_hyde said worked for me, but I will definitely give this a try. Looks promising!

(reply by komal.rathi, 5.1 years ago)

TCGA-Assembler is out of service; is there a good alternative?

(reply by arup, 3.0 years ago)

TCGA Firehose

(reply by TriS, 3.0 years ago)

Hi,

Since TCGA data are now on the NCI website, how can I download gene expression data (FPKM) for breast cancer and the associated normal tissue? I cannot find any "normal tissue" option (maybe I missed it...).

For example, here's the selection for breast cancer expression data:

https://gdc-portal.nci.nih.gov/search/s?filters={%22op%22:%22and%22,%22content%22:[{%22op%22:%22in%22,%22content%22:{%22field%22:%22cases.project.primary_site%22,%22value%22:[%22Breast%22]}},{%22op%22:%22in%22,%22content%22:{%22field%22:%22files.data_category%22,%22value%22:[%22Transcriptome%20Profiling%22]}},{%22op%22:%22in%22,%22content%22:{%22field%22:%22files.data_type%22,%22value%22:[%22Gene%20Expression%20Quantification%22]}},{%22op%22:%22in%22,%22content%22:{%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22HTSeq%20-%20FPKM%22]}}]}

(reply by Nicolas Rosewick, 3.1 years ago)
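The `filters` parameter in that portal URL is just URL-encoded JSON. A minimal sketch that rebuilds the same filter in Python and can optionally append a sample-type clause to keep only normal tissue. The field `cases.samples.sample_type` and a value such as "Solid Tissue Normal" are my assumptions about the GDC vocabulary (they do not appear in the post); please check them against the current GDC API documentation before relying on them. The helper names are hypothetical.

```python
import json
from urllib.parse import quote, unquote


def in_clause(field, values):
    """One GDC-style {"op": "in"} filter clause."""
    return {"op": "in", "content": {"field": field, "value": list(values)}}


def breast_fpkm_filters(sample_types=None):
    """Rebuild the breast FPKM filter from the portal URL, optionally restricted by sample type."""
    clauses = [
        in_clause("cases.project.primary_site", ["Breast"]),
        in_clause("files.data_category", ["Transcriptome Profiling"]),
        in_clause("files.data_type", ["Gene Expression Quantification"]),
        in_clause("files.analysis.workflow_type", ["HTSeq - FPKM"]),
    ]
    if sample_types:
        # ASSUMPTION: field name and values (e.g. "Solid Tissue Normal") are
        # not from the original post; verify them in the GDC documentation.
        clauses.append(in_clause("cases.samples.sample_type", sample_types))
    return {"op": "and", "content": clauses}


def encode_filters(filters):
    """Compact JSON, percent-encoded for use as the `filters` URL parameter."""
    return quote(json.dumps(filters, separators=(",", ":")))


def decode_filters(encoded):
    """Inverse of encode_filters: percent-decode and parse the JSON."""
    return json.loads(unquote(encoded))


if __name__ == "__main__":
    print(encode_filters(breast_fpkm_filters(["Solid Tissue Normal"])))
```

Building the filter as a dict rather than editing the encoded string by hand makes it easy to add or swap clauses without breaking the escaping.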

Since this is a separate query, you might consider starting a new question.

(reply by russhh, 3.1 years ago)

There's certainly RNASeq data from matched normal samples (i.e. normal lung tissue from a lung cancer patient) for the lung samples, e.g. TCGA-44-2655-11 here.

(reply by russhh, written and modified 5.2 years ago)

So there are many more TN (tumor samples that have matched normals) than NT (normal samples that have matched tumors). How is this possible? Shouldn't the number of TN be the same as NT?

(reply by komal.rathi, 5.2 years ago)

I don't know what you mean; that's certainly not what I thought I'd said, apologies.

There are very few control samples (i.e. normal lung tissue from individuals who do not have cancer), but for around 20-25% of the lung tumour samples there is an associated matched-normal lung sample.

Hence, there are more tumour samples without a matched-normal sample than tumour samples with one.

(reply by russhh, written and modified 5.2 years ago)

What I meant is this: I referred to this and this, where sample names ending in 01 are tumor and those ending in 11 are normal. When I went to the data matrix on TCGA for LUAD, there were options like Tumor-matched and Normal-matched. Also, according to TCGA's definitions of "Tumor, Matched Normal" vs. "Normal, Matched Tumor":

  • TN (Tumor, matched normal) – Data for a tumor tissue for which matched normal tissue exists.

  • NT (Normal, matched tumor) – Data for normal tissue for which matched tumor tissue exists.

So I am a bit confused: shouldn't there be an equal number of TN and NT samples when you check the data matrix?

(reply by komal.rathi, written and modified 5.2 years ago)
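One way the two counts can diverge, sketched on hypothetical barcodes: if TN and NT are counted per sample rather than per patient, a patient with two tumor samples and one normal contributes 2 to TN but only 1 to NT. Whether the TCGA data matrix actually counts this way is an assumption; the code only illustrates the arithmetic, using the barcode convention from this thread (positions 14-15: 01-09 tumor, 10-19 normal).

```python
from collections import defaultdict


def count_tn_nt(barcodes):
    """Count TN and NT per *sample*, for patients that have both kinds.

    Sample type = barcode positions 14-15 (1-based): 01-09 tumor, 10-19 normal.
    Patient = first 12 characters, e.g. "TCGA-44-2655".
    """
    tumors = defaultdict(set)   # patient -> tumor sample barcodes
    normals = defaultdict(set)  # patient -> normal sample barcodes
    for bc in barcodes:
        patient, code = bc[:12], int(bc[13:15])
        if 1 <= code <= 9:
            tumors[patient].add(bc)
        elif 10 <= code <= 19:
            normals[patient].add(bc)
    matched = tumors.keys() & normals.keys()  # patients with both kinds
    tn = sum(len(tumors[p]) for p in matched)
    nt = sum(len(normals[p]) for p in matched)
    return tn, nt


if __name__ == "__main__":
    # Hypothetical barcodes: patient 0001 has two tumor samples (01, 02) and
    # one normal (11); patient 0002 has a tumor only.
    demo = ["TCGA-AA-0001-01", "TCGA-AA-0001-02", "TCGA-AA-0001-11", "TCGA-AA-0002-01"]
    print(count_tn_nt(demo))  # (2, 1)
```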

Hi komal.rathi, if I want to analyse the TCGA data discussed above with a differential expression test (for paired data), is the TN set too small compared with the NT set for a given cancer type? That might bias the result.

Maybe it would be better to use RNASeq data from normal samples (from people without any cancer) as the control set for the differential analysis against a given cancer? Could you point me to where I could get such an RNASeq dataset, comparable with TCGA?

Thanks!

(reply by Miao Yu, 4.9 years ago)
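For a paired test, only patients that have both a tumor and a normal sample contribute, so the usable sample size is the number of complete pairs, not the size of either set. A minimal per-gene sketch using only the standard library; the log2(x + 1) transform and the function name are assumptions. Real analyses would normally use count-based tools such as DESeq2, edgeR or limma-voom with patient as a blocking factor; this toy statistic computes no p-value and no multiple-testing correction.

```python
import math
from statistics import mean, stdev


def paired_t(tumor, normal):
    """Paired t statistic and degrees of freedom for ONE gene.

    tumor[i] and normal[i] must come from the same patient; values are
    transformed with log2(x + 1) before taking differences.
    """
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need at least two complete tumor/normal pairs")
    diffs = [math.log2(t + 1.0) - math.log2(n + 1.0) for t, n in zip(tumor, normal)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t_stat = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical expression values for three matched pairs.
    print(paired_t([3.0, 7.0, 15.0], [1.0, 1.0, 1.0]))
```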

@komal.rathi

I need to download only the RNA-Seq data (raw read counts for gene quantification) for ovarian cancer patients from TCGA. I am not interested in downloading all the cases present in TCGA. I want a reasonable number of patients with a tumor and its matched normal for which I can retrieve the RNA-Seq raw counts, and I am a bit confused about which selection criteria to use. I have downloaded the 489 OvaCa cases from TCGA with gene expression values, but there is no indication of which are normal and which are tumor. Can you tell me how I should do this from the portal? Correct me if I am wrong: I should first select TN RNA-Seq data for OV (color code blue), which gives batch-wise RNA-Seq V1 data for tumor tissues. Then I should select NT to get the expression data of the normal samples matching the tumors I downloaded, right? Please share your ideas.

(reply by ivivek_ngs, 4.8 years ago)

vchris_ngs

I am assuming you have the barcodes, e.g. TCGA-09-0364-01, for each of your samples. This is the code table I referred to. The last two digits tell you whether it is a tumor or a normal sample. I used TCGA-Assembler to first download everything and then extract the matched tumor and normal samples. When you download from the data matrix, blue is for matched tumor samples and yellow is for matched normal samples.

But I just checked: there is no matched normal sample available for download for ovarian serous cystadenocarcinoma in TCGA. I went to the data matrix portal and selected RNASeq and RNASeqV2 under Data Type, Level 3 under Data Level, and Tumor-matched and Normal-matched under Tumor/Normal. It returned only matched tumor samples, no matched normal samples. I guess they are not available for download yet.

(reply by komal.rathi, written and modified 4.8 years ago)
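The "extract the matched tumor and normal samples" step can be sketched as grouping sample barcodes by patient (the first 12 characters) and keeping patients that have both codes. A minimal sketch: the function name is hypothetical, and the defaults assume 01 = primary solid tumor and 11 = solid tissue normal; pass other codes if your study uses them. Full-length barcodes such as TCGA-AA-0001-01A-... also work, since only the first 15 characters are read.

```python
def matched_pairs(barcodes, tumor_code="01", normal_code="11"):
    """Return (patient, tumor_barcode, normal_barcode) for every patient that has both.

    tumor_code / normal_code are compared with barcode positions 14-15.
    If a patient has several barcodes with the same code, the first one seen is kept.
    """
    tumor, normal = {}, {}
    for bc in barcodes:
        patient, code = bc[:12], bc[13:15]
        if code == tumor_code:
            tumor.setdefault(patient, bc)
        elif code == normal_code:
            normal.setdefault(patient, bc)
    return [(p, tumor[p], normal[p]) for p in sorted(tumor.keys() & normal.keys())]


if __name__ == "__main__":
    # TCGA-AA-0001 is a hypothetical patient with both samples.
    cols = ["TCGA-09-0364-01", "TCGA-AA-0001-01", "TCGA-AA-0001-11"]
    print(matched_pairs(cols))
```

For a study like OV, where no matched normals are available, this simply returns an empty list.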

@komal.rathi

Yes, I could not find the matched normal samples either, for RNASeq or RNASeqV2 at Level 3; it also returned only blue codes, i.e. matched tumor samples. So I guess it will not be possible for me to get a small patient cohort with matched tumor and normal RNA-Seq data. Would it help to download the clinical data from another repository? Any input on that? I have asked a related question, "RNA-Seq data for Ovarian Cancers for a pilot experiment from public database", if you would like to answer.

(reply by ivivek_ngs, written and modified 4.8 years ago)

vchris_ngs, I am not aware of any other repository, but I will try to find one.

(reply by komal.rathi, 4.8 years ago)

Oh, alright! Thanks!

(reply by komal.rathi, 5.2 years ago)

Download the TCGA-Assembler software.

Download the TCGA-Assembler manual: http://www.compgenome.org/TCGA-Assembler/documents/TCGA-Assembler%20User%20Manual.pdf

Refer to the section "ExtractTissueSpecificSamples" on page 27.

(reply by kerem.senses, 4.6 years ago)

JJ wrote, 3.1 years ago:

Hi,

Download the clinical files, e.g. here: http://firebrowse.org

Then look at one of the merged_only_clinical files, e.g. KIRC.merged_only_clinical_clin_format.txt, and check the barcodes (https://wiki.nci.nih.gov/display/TCGA/TCGA+barcode). The two digits at positions 14-15 of the barcode indicate the sample type: tumor types range from 01 to 09, normal types from 10 to 19 and control samples from 20 to 29. So codes starting with 0 are tumors and codes starting with 1 are normals; e.g. 01 is a primary tumour.

Some datasets will contain normals, some only cancer samples.

EDIT: RNASeq V2 RSEM normalized expression values are available from http://firebrowse.org as well.

Best, Julia

(answer by JJ, written and modified 3.1 years ago)
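JJ's rule as code: a minimal sketch that splits a dash-separated barcode into its leading fields and classifies positions 14-15 with the 01-09 / 10-19 / 20-29 ranges above. It also shows why positions 6-7 cannot stand in for the sample type: in the TCGA barcode scheme that second field is the tissue source site code, and a 12-character patient barcode such as TCGA-08-0531 carries no sample type at all. The function name and dictionary keys are hypothetical.

```python
def parse_barcode(barcode):
    """Split a TCGA barcode into project, tissue source site, participant and sample type.

    Positions 6-7 (second field) are the tissue source site; the sample type
    only exists at positions 14-15, i.e. in sample-level barcodes.
    """
    parts = barcode.upper().split("-")
    info = {
        "project": parts[0],
        "tss": parts[1],
        "participant": parts[2],
        "patient": "-".join(parts[:3]),
        "sample_code": None,
        "category": None,  # stays None for 12-character patient barcodes
    }
    if len(parts) > 3 and len(parts[3]) >= 2:
        code = parts[3][:2]
        n = int(code)
        if 1 <= n <= 9:
            category = "tumor"
        elif 10 <= n <= 19:
            category = "normal"
        elif 20 <= n <= 29:
            category = "control"
        else:
            category = "other"
        info["sample_code"] = code
        info["category"] = category
    return info


if __name__ == "__main__":
    print(parse_barcode("TCGA-44-2655-11")["category"])  # normal
    print(parse_barcode("TCGA-08-0531"))  # tss "08", no sample type
```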

OK, thanks. They should add this option to their search tool... It's a bit of a pain in the a#* ;)

(reply by Nicolas Rosewick, 3.1 years ago)

For filenames that don't have positions 14-15, are positions 6-7 equivalent?

e.g. TCGA-08-0531 -> Tumor; TCGA-12-0615 -> Control; TCGA-26-1438 -> Normal?

Thanks for the link to firebrowse Julia. Great resource!

(reply by SplitInf, written and modified 3.1 years ago)

Nope, that is not the same.

(reply by TriS, 2.8 years ago)

Hi Julia,

As bann13 pointed out, I don't see the format you mentioned in the KIRC.merged_only_clinical_clin_format.txt file; instead I see "tcga-3z-a93z", which is missing positions 14-15. I am looking for normal and cancer gene expression data for lung cancer (LUAD) patients. I also checked the LUAD file and found the same format, "tcga-05-4244".

Any help would be appreciated.

(reply by umesh, 2.8 years ago)

In the clinical data you (mostly) won't find information about normal vs. tumor, i.e. positions 14-15, simply because both samples come from the same patient, so the clinical record would just be duplicated.

(reply by TriS, 2.8 years ago)
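Because clinical files are keyed by the 12-character patient barcode (often lower-case, as in "tcga-05-4244"), the tumor/normal distinction has to come from the expression data's sample barcodes. A minimal sketch of joining the two: truncate each sample barcode to 12 characters, normalise case, and look up the patient's clinical record. The function name and the example barcodes and record are hypothetical.

```python
def attach_clinical(sample_barcodes, clinical_by_patient):
    """Map each sample barcode to its patient's clinical record (or None).

    Clinical keys are patient barcodes such as "tcga-aa-0001"; they are
    upper-cased so that they match the first 12 characters of sample barcodes.
    """
    clinical = {k.upper(): v for k, v in clinical_by_patient.items()}
    return {bc: clinical.get(bc[:12].upper()) for bc in sample_barcodes}


if __name__ == "__main__":
    samples = ["TCGA-AA-0001-01", "TCGA-AA-0001-11", "TCGA-AA-0002-01"]
    records = {"tcga-aa-0001": {"note": "hypothetical record"}}
    # Tumor (01) and normal (11) from the same patient share one record.
    print(attach_clinical(samples, records))
```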