I'm trying to use the Python API from https://portal.gdc.cancer.gov/ to select samples with DNA methylation and/or RNA-seq data from tumour and adjacent normal tissue. However, I can't find a metadata tag that would let me filter samples on whether the data come from tumour or from adjacent normal tissue.
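For concreteness, this is roughly the kind of query I'd like to run. It is a minimal sketch against the GDC REST `/files` endpoint using its JSON `filters` syntax and only the Python standard library. The `cases.samples.sample_type` field and the values `"Primary Tumor"` / `"Solid Tissue Normal"` are my assumptions about what a tumour/normal filter would need, not something I've confirmed in the field list, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_filters(sample_types):
    """Build a GDC filter: RNA-Seq or methylation-array files from the given sample types."""
    return {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "experimental_strategy",
                                     "value": ["RNA-Seq", "Methylation Array"]}},
            # ASSUMPTION: field name and values are exactly what I could not confirm.
            {"op": "in", "content": {"field": "cases.samples.sample_type",
                                     "value": list(sample_types)}},
        ],
    }


def build_query_url(sample_types, size=100):
    """Encode the filters and requested fields as a GET query string."""
    params = {
        "filters": json.dumps(build_filters(sample_types)),
        "fields": "file_id,file_name,experimental_strategy,cases.samples.sample_type",
        "format": "JSON",
        "size": str(size),
    }
    return FILES_ENDPOINT + "?" + urllib.parse.urlencode(params)


def query_files(sample_types=("Primary Tumor", "Solid Tissue Normal"), size=100):
    """Fetch matching file records from the API (performs a network call)."""
    with urllib.request.urlopen(build_query_url(sample_types, size), timeout=60) as resp:
        return json.load(resp)["data"]["hits"]
```

Calling `query_files()` would return a list of file records, each of which could then be grouped by its reported sample type, if that field turns out to exist and be populated.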
The data model does seem to capture quite a lot of information about the samples, but I can't find anything about this criterion in the list of available fields (see https://docs.gdc.cancer.gov/API/Users_Guide/Appendix_A_Available_Fields/). I may have missed it: the list is fairly long, some of its entries are open to interpretation, and there is no longer-form explanation of what they mean. Still, scanning through it and searching for likely keywords didn't turn up anything unambiguous.
As mentioned in the post "TCGA: Does TCGA cancer studies have mRNA expression data for Control/Normal samples?", the codes '01' and '11' in sample filenames can be used to distinguish tumour from adjacent normal tissue data, and this still appears to be the case (take this random example). However, relying on this would mean parsing the filenames and filtering on the result, which is rather inefficient and possibly error-prone.
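For reference, the filename-based workaround would look something like the sketch below. It assumes the name contains a standard TCGA barcode (e.g. `TCGA-02-0001-01C-01D-0182-01`), whose fourth field begins with the two-digit sample type code. Per the TCGA sample type code table, 01-09 are tumour types and 11 is Solid Tissue Normal; everything else (10 blood-derived normal, 20 control, etc.) is lumped into "other". The regex and the function names are my own, hypothetical helpers.

```python
import re
from typing import Optional

# TCGA barcode: TCGA-<TSS>-<participant>-<sample type><vial>-...
# The two digits right after the participant ID are the sample type code.
BARCODE_RE = re.compile(r"TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-(\d{2})")


def sample_type_code(name: str) -> Optional[str]:
    """Return the sample type code of the first TCGA barcode found in name, if any."""
    match = BARCODE_RE.search(name.upper())
    return match.group(1) if match else None


def tissue_class(name: str) -> Optional[str]:
    """Classify a name as 'tumour', 'adjacent_normal' or 'other' via its sample type code."""
    code = sample_type_code(name)
    if code is None:
        return None  # no TCGA barcode in the name
    n = int(code)
    if 1 <= n <= 9:
        return "tumour"  # 01-09: tumour sample types
    if n == 11:
        return "adjacent_normal"  # 11: Solid Tissue Normal
    return "other"  # e.g. 10 blood-derived normal, 20 control analyte
```

Filtering a list of names then becomes `[n for n in names if tissue_class(n) in {"tumour", "adjacent_normal"}]`, which works but depends entirely on the barcode being present and well formed in every name.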
Does anyone here know whether the tumour / adjacent normal status of files is represented in the GDC data portal's data model in a way that can be easily filtered on?