Question: Data discrepancy between TCGA-GDC portal and GDC Firehose
David_emir300 wrote:

Hello All,

I am trying to get lung cancer samples from TCGA (LUSC and LUAD) and want to categorize the data by tumor stage. I used the following command to download the clinical file:

library("TCGAbiolinks")

clinical_lusc <- GDCquery_clinic(project = "TCGA-LUSC", type = "clinical")

In GDC Firehose I was looking at the data categorization, and there is a set of barcodes categorized as "Solid Tissue Normal". For example, the sample "TCGA-98-8020" is listed as a normal sample in GDC Firehose, but the same sample in the clinical file (obtained with the TCGAbiolinks command above) appears to be cancerous, i.e. its tumor_stage is "stage iiia". As I understand it, stage III indicates larger cancers or tumors that have grown more deeply into nearby tissue; they may also have spread to lymph nodes but not to other parts of the body.

Now I am confused about how to interpret this. Is my classification of TCGA samples into cancerous vs. normal correct, am I doing something wrong, or are the two sites hosting different data? Please help.

Thanks a lot, Sincerely, Dave.

gdc tcga firehose • 513 views
modified 8 months ago by mbk0asis390 • written 8 months ago by David_emir300

What data are you trying to get? Gene expression, clinical?

written 8 months ago by Sean Davis25k

I am planning to run a differential gene expression analysis on raw count data (RNA-seq), and for that I need to group the samples into various levels.

written 8 months ago by David_emir300
mbk0asis390 wrote:

"TCGA-98-8020" is a participant (patient) ID, not a sample ID.

One participant may have both cancer and normal samples.

The two-digit code after the participant ID indicates whether the sample is tumor or normal (01-09 are tumor types, 10-19 are normal types).

e.g.:

"TCGA-98-8020-01X-XXX-XXXX-XX" (Cancer)

"TCGA-98-8020-11X-XXX-XXXX-XX" (Normal)

You may google "TCGA barcode" for more information.

Check the full sample barcode to separate normal samples from cancer samples.
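
As a minimal sketch of the two-digit rule described above, the sample-type code can be read from characters 14-15 of a full barcode in base R. The helper name `classify_barcode` and the example barcodes are made up for illustration; the code ranges (01-09 tumor, 10-19 normal, 20-29 control) follow the TCGA barcode convention.

```r
# Hypothetical helper: classify TCGA barcodes by their sample-type code.
# Characters 14-15 of a full barcode hold the two-digit sample-type code.
classify_barcode <- function(barcodes) {
  code <- suppressWarnings(as.integer(substr(barcodes, 14, 15)))
  ifelse(is.na(code), NA_character_,
         ifelse(code >= 1 & code <= 9, "Tumor",
                ifelse(code >= 10 & code <= 19, "Normal", "Control")))
}

# Illustrative barcodes (not taken from a real query result)
barcodes <- c("TCGA-98-8020-01A-11R-2247-07",
              "TCGA-98-8020-11A-01R-2247-07",
              "TCGA-98-8020")  # participant ID only: no sample code to read

print(data.frame(barcode = barcodes, group = classify_barcode(barcodes)))
```

A bare participant ID such as "TCGA-98-8020" yields `NA` here, which is a reminder that it cannot be labelled tumor or normal on its own.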

written 8 months ago by mbk0asis390

Thanks a lot for your help. Could you please let me know how to download the full IDs?

written 8 months ago by David_emir300

You are already using TCGAbiolinks, so aren't you also retrieving the full IDs along with the clinical data?

This should help you:

I assume you're starting with:

query <- GDCquery("yourquery")

Then just type:

getResults(query, cols = c("cases", "tissue.definition"))
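
Once the full barcodes are available, one possible way to build groups for differential expression is to derive the participant ID (the first 12 characters) and look up the stage in the clinical table. The sketch below uses small made-up data frames in place of the real `getResults()` and `GDCquery_clinic()` outputs. The column name `submitter_id` is an assumption, and `cases` and `tumor_stage` may be named differently depending on the TCGAbiolinks version.

```r
# Mock stand-ins for the getResults(query) and GDCquery_clinic() outputs;
# the values and column names here are illustrative assumptions.
samples <- data.frame(
  cases = c("TCGA-98-8020-01A-11R-2247-07", "TCGA-98-8020-11A-01R-2247-07"),
  stringsAsFactors = FALSE
)
clinical <- data.frame(
  submitter_id = "TCGA-98-8020",
  tumor_stage  = "stage iiia",
  stringsAsFactors = FALSE
)

# Participant ID = first 12 characters; sample-type code = characters 14-15
samples$participant <- substr(samples$cases, 1, 12)
code <- as.integer(substr(samples$cases, 14, 15))
# Assumes only tumor (01-09) and normal (10-19) codes are present
samples$sample_type <- ifelse(code < 10, "Tumor", "Normal")

# The clinical stage describes the patient, so attach it by participant ID
idx <- match(samples$participant, clinical$submitter_id)
samples$tumor_stage <- clinical$tumor_stage[idx]

# Normal tissue from a staged patient is still a normal sample
samples$group <- ifelse(samples$sample_type == "Tumor",
                        samples$tumor_stage, "normal")
print(samples[, c("cases", "sample_type", "group")])
```

This mirrors the situation in the question: the stage belongs to the patient, while the tumor or normal label belongs to each individual sample from that patient.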
modified 8 months ago • written 8 months ago by mathias.heydt80