I am downloading data from TCGA-PAAD, and in particular I'm interested in primary pancreatic cancer vs. its metastases. I downloaded the metadata for PAAD, and I want the sample ID of the primary tissue as well as the sample ID of the metastasis. For instance, a primary pancreatic tumor has sample ID = XXX and metastasized to the liver; that liver metastasis sample has sample ID = YYY. How do I find out YYY? From the metadata I downloaded, I know the sample ID of the primary tumor and the site it metastasized to, but can I get the ID of the metastasis and then download its data too?
(It is not sample_type = metastatic, because in that case I guess it means the pancreatic tumor sample itself is a metastasis.)
Please post a snapshot of what you have so we can give more specific comments/answers. Suppose you are interested in expression data and have already downloaded the expression matrix. If you use a package like TCGAbiolinks to download data from GDC, it downloads the clinical/sample data at the same time. In that table there is a column named ShortLetterCode; TM in this column marks a metastatic sample.
If all you have is an experiment data table (expression, methylation, ...), the column names are TCGA barcodes with this structure:

[Figure: structure of a TCGA barcode. Source: GDC documentation.]
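Since the barcode fields are dash-separated, a small base-R helper can split them out. This is a minimal sketch: `parse_tcga_barcode` is a hypothetical helper (it is not part of TCGAbiolinks), and it assumes dash-separated barcodes like the documented example `TCGA-02-0001-01C-01D-0182-01`. The key point for your question is that the first three fields (the first 12 characters) identify the patient, so every sample from the same patient, primary or metastatic, shares that prefix.

```r
# Hypothetical helper: split TCGA barcodes into their fields.
# Assumes dash-separated barcodes such as "TCGA-02-0001-01C-01D-0182-01";
# shorter (patient- or sample-level) barcodes give NA for the missing fields.
parse_tcga_barcode <- function(barcodes) {
  parts <- strsplit(barcodes, "-", fixed = TRUE)
  # Return the i-th field of every barcode, or NA if the barcode is too short
  field <- function(i) {
    vapply(parts, function(p) if (length(p) >= i) p[i] else NA_character_,
           character(1))
  }
  sample_vial     <- field(4)  # e.g. "01C": sample type "01" + vial "C"
  portion_analyte <- field(5)  # e.g. "01D": portion "01" + analyte "D"
  data.frame(
    barcode     = barcodes,
    patient     = paste(field(1), field(2), field(3), sep = "-"),
    tss         = field(2),  # tissue source site
    participant = field(3),
    sample_type = substr(sample_vial, 1, 2),
    vial        = substr(sample_vial, 3, 3),
    portion     = substr(portion_analyte, 1, 2),
    analyte     = substr(portion_analyte, 3, 3),
    plate       = field(6),
    center      = field(7),
    stringsAsFactors = FALSE
  )
}

x <- parse_tcga_barcode("TCGA-02-0001-01C-01D-0182-01")
x$patient      # "TCGA-02-0001"
x$sample_type  # "01"
```

On your own matrix this would be `parse_tcga_barcode(colnames(rna))`, after which you can group columns by `patient`.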
To find the type of each sample, try:
table(substr(colnames(rna), 14, 15))  # assumes your expression matrix is called rna
This returns the sample type codes (characters 14-15 of the barcode) and how many samples of each type you have. 01 is a primary tumor, 06 is a metastatic tumor, and 11 is normal adjacent tissue. Other codes are listed in the TCGA sample type code table in the GDC documentation.
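To get from a primary sample (XXX) to its metastasis (YYY), one approach is to match barcodes on the 12-character patient prefix and keep patients that have both a 01 and a 06 sample. Below is a minimal base-R sketch; `pair_primary_metastasis` is a hypothetical helper, and the barcodes are made up for illustration (they are not real PAAD samples). If a patient has no 06 barcode in your matrix, then no metastasis sample from that patient is present in that dataset, even if the clinical table records a metastasis site.

```r
# Hypothetical helper: pair primary-tumor (code "01") and metastatic
# (code "06") samples that come from the same patient, using only barcodes.
pair_primary_metastasis <- function(barcodes, primary = "01", metastatic = "06") {
  patient <- substr(barcodes, 1, 12)   # patient prefix, e.g. "TCGA-XX-0001"
  code    <- substr(barcodes, 14, 15)  # sample type code
  prim <- data.frame(patient = patient[code == primary],
                     primary = barcodes[code == primary],
                     stringsAsFactors = FALSE)
  mets <- data.frame(patient = patient[code == metastatic],
                     metastasis = barcodes[code == metastatic],
                     stringsAsFactors = FALSE)
  # Inner join: keep only patients that have both sample types
  merge(prim, mets, by = "patient")
}

# Made-up barcodes, for illustration only
bc <- c("TCGA-XX-0001-01A-11R-A000-07",
        "TCGA-XX-0001-06A-11R-A000-07",
        "TCGA-XX-0002-01A-11R-A000-07",
        "TCGA-XX-0003-11A-11R-A000-07")
pairs <- pair_primary_metastasis(bc)
pairs
```

With real data you would call `pair_primary_metastasis(colnames(rna))`. If a patient has several aliquots of the same sample type, `merge` returns one row per combination, so check for duplicates before subsetting the matrix.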