I am trying to get lung cancer samples from TCGA (LUSC and LUAD), and I want to categorize the data by tumor stage. I used the following command to download the clinical file:
library("TCGAbiolinks")
clinical_lusc <- GDCquery_clinic(project = "TCGA-LUSC", type = "clinical")
I then looked at how GDC-Firehose categorizes the data. It has a set of barcodes categorized as "Solid Tissue Normal". For example, the sample "TCGA-98-8020" is listed as a normal sample in GDC-Firehose, but the same barcode in the clinical file (obtained with the TCGAbiolinks command above) appears as cancerous: its tumor_stage is "stage iiia". As I understand it, stage III indicates larger cancers or tumors that have grown more deeply into nearby tissue. They may also have spread to lymph nodes, but not to other parts of the body.

Now I am confused about how to interpret this. Is my classification of TCGA samples into cancerous vs. normal correct, am I doing something wrong, or are the two sites hosting different data? Please help.
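A minimal sketch of one way to check this, assuming the standard TCGA barcode layout in which characters 1-12 identify the patient and characters 14-15 hold the sample type code (01 = Primary Solid Tumor, 11 = Solid Tissue Normal). Under that layout, a clinical table keyed by the 12-character barcode describes the patient, so a matched normal sample would share the patient's tumor_stage. The full sample barcodes and the small clinical table below are made up for illustration and are not taken from GDC-Firehose or from the GDCquery_clinic output.

```r
# Sample-level barcodes. The suffixes after the patient ID are hypothetical.
sample_barcodes <- c("TCGA-98-8020-01A", "TCGA-98-8020-11A")

# Characters 1-12 identify the patient; characters 14-15 are the sample type code.
patient_id <- substr(sample_barcodes, 1, 12)
type_code  <- substr(sample_barcodes, 14, 15)

# Only the two codes relevant here; the official code table is longer.
type_labels <- c("01" = "Primary Solid Tumor", "11" = "Solid Tissue Normal")
sample_type <- unname(type_labels[type_code])

samples <- data.frame(barcode = sample_barcodes,
                      patient_id = patient_id,
                      sample_type = sample_type)

# Hypothetical patient-level clinical table (one row per patient),
# using the stage reported in the question.
clinical <- data.frame(patient_id = "TCGA-98-8020",
                       tumor_stage = "stage iiia")

# Joining on the patient ID attaches the same stage to every sample
# from that patient, tumor and normal alike.
merged <- merge(samples, clinical, by = "patient_id")
print(merged)
```

If this reading is right, the stage describes the patient's tumor, and whether a given sample is tumor or normal has to come from the sample-level barcode rather than from the clinical file.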
Thanks a lot, Sincerely, Dave.