TCGA metastasis ID
12 months ago
sabin ▴ 40

Hello everyone,

I am downloading data from TCGA-PAAD, and I'm particularly interested in the primary tumor (pancreatic cancer) versus its metastasis. I downloaded the metadata for PAAD, and I want the sample ID of the primary tissue as well as the sample ID of the metastasis. For instance, a primary pancreatic tumor has sampleID = XXX and metastasized to the liver. That liver metastasis sample has sampleID = YYY. How do I find out this YYY? From the metadata that I downloaded, I know the sample ID of the primary tumor and the site where it metastasized, but can I get the ID of the metastasis itself and then get its data too?

(It is not the sample with sample_type = metastatic, because in that case I guess it means that the pancreatic tumor sample itself is a metastasis.)

Thank you in advance for the help.

tcga metastasis data primary • 545 views
12 months ago
Hamid Ghaedi ★ 1.8k

Please show a snapshot of what you have so that you can get more specific comments and answers. Suppose you are interested in expression data and have already downloaded the expression matrix. If you use a package like TCGAbiolinks to download data from GDC, it downloads the clinical data at the same time. In that table, there is a column named shortLetterCode; in this column, TM marks a metastatic sample.

If all you have is an experimental data table (expression, methylation, ...), the column names look something like this:

[image not preserved]

Source: GDC documentation.

To find the sample types, try:

table(substr(colnames(rna), 14, 15))  # assuming you have an expression matrix named rna

This returns the sample type codes, so you can see how many samples of each type you have. 01 is a primary tumor sample and 11 is normal adjacent tissue. The other codes are listed in the sample type codes table of the GDC documentation.
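As a minimal sketch of the same idea on made-up data: the barcodes below are hypothetical and only follow the TCGA barcode layout, and `sample_type_labels` is an illustrative lookup that covers just the codes mentioned in this thread.

```r
# Hypothetical barcodes that follow the TCGA layout (not real samples).
barcodes <- c(
  "TCGA-AA-0001-01A-11R-A000-07",
  "TCGA-AA-0002-01A-11R-A000-07",
  "TCGA-AA-0002-06A-11R-A000-07",
  "TCGA-AA-0003-11A-11R-A000-07"
)

# Characters 14-15 of a barcode hold the two-digit sample type code.
sample_type_code <- substr(barcodes, 14, 15)

# Illustrative lookup for the codes discussed here only.
sample_type_labels <- c(
  "01" = "Primary Tumor",
  "06" = "Metastatic",
  "07" = "Additional Metastatic",
  "11" = "Solid Tissue Normal"
)

sample_types <- data.frame(
  barcode = barcodes,
  code = sample_type_code,
  label = unname(sample_type_labels[sample_type_code]),
  stringsAsFactors = FALSE
)

# Count how many samples of each type are present.
code_counts <- table(sample_types$code)
print(code_counts)
```

On a real matrix, `colnames(rna)` would take the place of `barcodes`.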


Hi Hamid, thanks a lot for your help.

I downloaded the "phenotype" file from TCGA for the PAAD (pancreatic tumor) project.

I put an example with a few rows of that file here: example data table.

The _PATIENT column lists the patients with pancreatic tumor samples. For these patients I want to know whether they had metastases, so I check the second column.

Then I want to know the site of the metastasis, so I look at the third column. For the same project I downloaded RNAseq data. Now I want to get the data for the sample TCGA-2J-AABO-01 (a pancreatic tumor sample, since its sample_type is Primary Tumor), so I go to the RNAseq matrix and take this column.

Now I also want the data for the corresponding metastasis in that patient, that is, the RNAseq data of the liver metastasis. To get it, however, I need to know the sample ID of the metastasis. How can I get that ID and then the RNAseq data?

I hope my problem is clearer now!


Samples that have 06 or 07 as the sample type code (the last two digits of the sample ID) should be metastatic. Here I used the TCGAbiolinks package to download the TCGA data:

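library(TCGAbiolinks)
library(SummarizedExperiment)

# Query normalized RNA-seq expression data for TCGA-PAAD.
# Note: legacy = TRUE targets the GDC legacy archive, which recent
# TCGAbiolinks releases may no longer support.
query <- GDCquery(project = "TCGA-PAAD",
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  platform = "Illumina HiSeq",
                  file.type = "normalized_results",
                  experimental.strategy = "RNA-Seq",
                  legacy = TRUE)

GDCdownload(query, method = "api")
dat <- GDCprepare(query = query)
# The right-hand side was missing in the original post; assay(dat) is assumed here.
rna <- assay(dat)
# Count samples per sample type code (barcode characters 14-15).
table(substr(colnames(rna), 14, 15))
# 01  06  11 
#178   1   4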

As you can see, there is only one metastatic sample (06) with expression data available.
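To find the metastatic sample ID that belongs to the same patient as a given primary sample, one option is to match barcodes on the patient part (characters 1-12). The following is only a sketch on a toy matrix: `rna_toy` and its barcodes are hypothetical, and it assumes the column names are full TCGA barcodes.

```r
# Hypothetical expression matrix: rows are genes, columns are TCGA-style barcodes.
set.seed(1)
rna_toy <- matrix(
  rpois(8, lambda = 50), nrow = 2,
  dimnames = list(
    c("geneA", "geneB"),
    c("TCGA-AA-0001-01A-11R-A000-07",
      "TCGA-AA-0002-01A-11R-A000-07",
      "TCGA-AA-0002-06A-11R-A000-07",
      "TCGA-AA-0003-11A-11R-A000-07")
  )
)

# Characters 1-12 identify the patient; characters 14-15 give the sample type.
patient_id <- substr(colnames(rna_toy), 1, 12)
type_code <- substr(colnames(rna_toy), 14, 15)

is_primary <- type_code == "01"
is_metastatic <- type_code %in% c("06", "07")

primary <- data.frame(
  patient = patient_id[is_primary],
  primary_barcode = colnames(rna_toy)[is_primary],
  stringsAsFactors = FALSE
)
metastatic <- data.frame(
  patient = patient_id[is_metastatic],
  metastatic_barcode = colnames(rna_toy)[is_metastatic],
  stringsAsFactors = FALSE
)

# Inner join: keeps only patients that have both a primary and a metastatic sample.
pairs <- merge(primary, metastatic, by = "patient")
print(pairs)

# Expression values for the matched primary and metastatic columns.
paired_expr <- rna_toy[, c(pairs$primary_barcode, pairs$metastatic_barcode), drop = FALSE]
```

With a single 06 sample in the PAAD expression data, a join like this can return matches for at most one patient.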


Hi Hamid,

That single metastatic sample should be a pancreatic tumor sample that is itself classified as a metastasis. Indeed, under the sample_type column there is only one "metastatic" entry. What I need, however, is the liver sample classified as "Distant Metastasis" of the pancreatic primary tumor. There should be 46 metastases that developed from PAAD as the primary.

Is there a way to get this information?

