GDC data portal - Tumour and Adjacent normal
5.6 years ago
rjactonspsfcf ▴ 160

Hi,

I'm trying to use the Python API from https://portal.gdc.cancer.gov/ to select samples with DNA methylation data and/or RNA-seq data from tumour and adjacent normal tissue. I can't find a metadata tag that would let me filter the samples by whether the data come from tumour or adjacent normal tissue.

The data model represents quite a lot about the samples, but I can't find anything about this criterion in the list of available fields; see: https://docs.gdc.cancer.gov/API/Users_Guide/Appendix_A_Available_Fields/ . I may have missed it: the list is fairly long, some of the items are open to interpretation, and there is no longer-form description of what each one means. Scanning through and searching for likely keywords didn't yield anything unambiguous.

As mentioned in this post, "TCGA: Does TCGA cancer studies have mRNA expression data for Control/Normal samples?", the '01' / '11' codes in the filenames of samples can be used to differentiate between tumour and adjacent normal tissue data, and this still appears to be the case (take this random example). However, using this would require a rather inefficient and possibly error-prone process of parsing the filenames and filtering on the result.
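To make the filename approach concrete, here is a minimal sketch of the parsing step. It assumes the string contains a standard TCGA barcode and relies on the TCGA convention that sample-type codes 01-09 are tumour, 10-19 are normal and 20-29 are control. The function name `sample_class` and the example barcodes in the usage note are made up for illustration, and filenames that carry no barcode simply return None.

```python
import re

# Matches the start of a TCGA barcode up to the two-digit sample-type code,
# e.g. TCGA-XX-XXXX-01A... captures "01".
BARCODE_RE = re.compile(r"TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-(\d{2})[A-Z]?")


def sample_class(text):
    """Return 'tumour', 'normal', 'control', or None if no barcode is found."""
    match = BARCODE_RE.search(text)
    if match is None:
        return None
    code = int(match.group(1))
    if 1 <= code <= 9:
        return "tumour"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return None
```

For example, `sample_class("TCGA-A1-A0SB-01A-11R-A144-07")` gives "tumour" and `sample_class("TCGA-BH-A0B3-11B-21R-A089-07")` gives "normal".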

Does anyone here know if the tumour / adjacent normal status of files is represented in the GDC data portal's data model in a way that can be easily filtered?

RNA-Seq DNA methylation TCGA GDC • 1.8k views
5.6 years ago

On the GDC, if you configure a search (like you have) and then download the manifest, you can programmatically look up the TCGA barcode (and infer tumour - normal status) by following either of these functions:

Then, just include / exclude what you want from the manifest and download the files via the GDC Data Transfer Tool.
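As a rough illustration of the manifest workflow described above (not the functions referred to earlier, which are not reproduced here), the sketch below looks up the sample barcode and sample type for each file UUID in a manifest via the GDC `/files` endpoint, then writes out only the manifest rows you want. The field names `cases.samples.submitter_id` and `cases.samples.sample_type`, the manifest column name `id`, and all function names are assumptions for the example; check them against the GDC documentation and your own manifest before relying on them.

```python
import csv
import io
import json
import urllib.request

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_payload(file_ids):
    """Build a /files query asking for the sample barcode and type of each file."""
    file_ids = list(file_ids)
    return {
        "filters": {"op": "in", "content": {"field": "file_id", "value": file_ids}},
        # Assumed field names; verify against the GDC available-fields list.
        "fields": "file_id,cases.samples.submitter_id,cases.samples.sample_type",
        "format": "JSON",
        "size": str(len(file_ids)),
    }


def parse_hits(hits):
    """Map file_id -> list of (sample barcode, sample type) pairs."""
    result = {}
    for hit in hits:
        pairs = []
        for case in hit.get("cases", []):
            for sample in case.get("samples", []):
                pairs.append((sample.get("submitter_id"), sample.get("sample_type")))
        result[hit["file_id"]] = pairs
    return result


def lookup_sample_types(file_ids):
    """POST the query to the GDC API (needs network access)."""
    body = json.dumps(build_payload(file_ids)).encode("utf-8")
    request = urllib.request.Request(
        GDC_FILES_ENDPOINT, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return parse_hits(json.load(response)["data"]["hits"])


def filter_manifest(manifest_text, sample_types, wanted):
    """Keep manifest rows whose file has at least one sample of a wanted type."""
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        types = {sample_type for _, sample_type in sample_types.get(row["id"], [])}
        if types & wanted:
            writer.writerow(row)
    return out.getvalue()
```

A typical use would be to read the ids from the downloaded manifest, call `lookup_sample_types` on them, and pass the result to `filter_manifest` with something like `wanted={"Solid Tissue Normal"}` (the exact sample type labels are also an assumption here). The filtered manifest can then be given to the GDC Data Transfer Tool.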

You can also use this method to see the numbers of tumour and normal samples in different TCGA cohorts:

There are various ways of interrogating TCGA data, of course. The data is also hosted on various third-party websites in varying states and forms of processing.

Kevin
