Question: GDC data portal - Tumour and Adjacent normal
2.5 years ago by
rjactonspsfcf130 wrote:


I'm trying to use the Python API to select samples with DNA methylation data and/or RNA-seq data from tumour and adjacent normal tissue. I can't find a metadata tag that I can use to filter the samples by whether the data come from tumour or adjacent normal tissue.

The data model does seem to represent quite a lot about the samples, but I can't find anything about this criterion in the list of available fields. I may have missed it (it's a fairly long list, some of the items on it are open to interpretation, and there is no longer-form description of their meaning), but scanning through it and searching for likely keywords didn't yield anything unambiguous.

As mentioned in the post "TCGA: Does TCGA cancer studies have mRNA expression data for Control/Normal samples?", '01' / '11' in the filenames of samples can be used to differentiate between tumour and adjacent normal tissue data, and this still appears to be the case. However, using this would require a rather inefficient and possibly error-prone process of parsing the filenames and filtering on the result.
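The '01' / '11' convention described above can be sketched in a few lines. This is a minimal illustration, not part of any GDC library: the function name `tumour_normal_from_barcode` is hypothetical, the barcodes in the usage note are made up for illustration, and it assumes the standard TCGA barcode layout in which the fourth hyphen-separated field begins with a two-digit sample type code (01 to 09 tumour, 10 to 19 normal, 20 to 29 control). Note that 11 is solid tissue normal specifically, while 10 is blood-derived normal, so a strict "adjacent normal" filter should test for 11 only.

```python
def tumour_normal_from_barcode(barcode):
    """Classify a TCGA barcode by the two-digit sample type code.

    Assumes the layout TCGA-TSS-Participant-SampleVial-..., where the
    fourth field starts with the sample type code (e.g. '01A', '11A').
    """
    parts = barcode.split("-")
    if len(parts) < 4 or not parts[3][:2].isdigit() or len(parts[3]) < 2:
        raise ValueError(f"no sample type code found in {barcode!r}")

    code = int(parts[3][:2])
    if 1 <= code <= 9:
        return "tumour"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"
```

For example, `tumour_normal_from_barcode("TCGA-XX-0000-01A-11R-0000-07")` falls in the tumour range, and the same barcode with `11A` in the fourth field falls in the normal range.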

Does anyone here know if the tumour / adjacent normal status of files is represented in the GDC data portal's data model in a way that can be easily filtered?

2.5 years ago by
Kevin Blighe70k
Republic of Ireland
Kevin Blighe70k wrote:

On the GDC, if you configure a search (as you have) and then download the manifest, you can programmatically look up the TCGA barcode (and infer tumour / normal status) by following either of these functions:

Then, just include / exclude what you want from the manifest and download the files via the GDC Data Transfer Tool.
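The include / exclude step can be sketched as follows. This is only an illustration under stated assumptions: the helper name `filter_manifest` is hypothetical, the manifest is assumed to be a tab-separated file whose first column is headed `id` (the file UUID), and `barcode_by_file_id` is assumed to be a dictionary you have already built by looking up the barcode for each UUID. It is not a reproduction of the functions referred to above.

```python
import csv
import io


def filter_manifest(manifest_text, barcode_by_file_id, wanted_codes=("01", "11")):
    """Return manifest text containing only rows whose barcode has a wanted
    sample type code (by default '01' tumour and '11' solid tissue normal)."""
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()

    for row in reader:
        # Files with no known barcode are dropped rather than guessed at.
        barcode = barcode_by_file_id.get(row["id"], "")
        parts = barcode.split("-")
        if len(parts) >= 4 and parts[3][:2] in wanted_codes:
            writer.writerow(row)

    return out.getvalue()
```

The returned text can be written to a new manifest file and passed to the GDC Data Transfer Tool in place of the original.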

You can also use this method to see the numbers of tumour and normal samples in different TCGA cohorts:

There are various ways of interrogating TCGA data, of course. The data is also hosted on various third-party websites in varying states / forms of processing.
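As one example of another route, the GDC API accepts JSON filters on its `files` endpoint, so the tumour / normal criterion can in principle be applied at query time rather than after download. The sketch below only builds the query parameters and makes no network call. The helper name `build_files_query` is hypothetical, and the field names (`cases.project.project_id`, `data_category`, `cases.samples.sample_type`) and values such as "Primary Tumor" and "Solid Tissue Normal" are assumptions that should be checked against the current GDC field list before use.

```python
import json

# Assumed public endpoint for file-level queries.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_query(project_id, data_categories, sample_types, size=100):
    """Build GET parameters that restrict files by project, data category
    and sample type. Field names are assumptions; verify before use."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id",
                                     "value": [project_id]}},
            {"op": "in", "content": {"field": "data_category",
                                     "value": list(data_categories)}},
            {"op": "in", "content": {"field": "cases.samples.sample_type",
                                     "value": list(sample_types)}},
        ],
    }
    return {
        "filters": json.dumps(filters),  # GET requests take filters as a JSON string
        "fields": "file_id,file_name,cases.samples.submitter_id,"
                  "cases.samples.sample_type",
        "format": "JSON",
        "size": str(size),
    }
```

The resulting dictionary could be passed as the query parameters of a GET request to `GDC_FILES_ENDPOINT`, for instance with project "TCGA-BRCA", categories "DNA Methylation" and "Transcriptome Profiling", and the two sample types above.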



