Question: GDC data portal - Tumour and Adjacent normal
rjactonspsfcf (Southampton) wrote, 5 months ago:

Hi,

I'm trying to use the Python API from https://portal.gdc.cancer.gov/ to select samples with DNA methylation and/or RNA-seq data from tumour and adjacent normal tissue. I can't find a metadata tag that I can use to filter samples on whether the data come from tumour or adjacent normal tissue.

The data model does represent quite a lot about the samples, but I can't find anything about this criterion in the list of available fields (see https://docs.gdc.cancer.gov/API/Users_Guide/Appendix_A_Available_Fields/). I may have missed it: it's a fairly long list, some of the entries are open to interpretation, and there is no longer-form explanation of what they mean. Scanning through it and searching for likely keywords didn't turn up anything unambiguous.

As mentioned in the post "TCGA: Does TCGA cancer studies have mRNA expression data for Control/Normal samples?", the codes '01' and '11' in sample filenames can be used to distinguish tumour from adjacent normal tissue, and this still appears to be the case (take this random example). However, relying on this would mean a rather inefficient and possibly error-prone process of parsing the filenames and filtering on the result.
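For what it's worth, the '01' / '11' check itself is short to write. Below is a minimal sketch, assuming the standard TCGA barcode layout (TCGA-<TSS>-<participant>-<sample type><vial>-...) and the usual TCGA convention that sample type codes 01-09 are tumour, 10-19 normal and 20-29 control. The function name `sample_category` and the example barcodes are hypothetical.

```python
import re
from typing import Optional

# Matches the first three barcode fields plus the two-digit sample type code,
# e.g. "TCGA-AB-1234-11A-..." gives group(1) == "11".
# Assumes upper-case barcodes, as TCGA uses.
BARCODE_RE = re.compile(r"TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-(\d{2})")


def sample_category(text: str) -> Optional[str]:
    """Classify the first TCGA barcode found in `text` (a barcode or a filename).

    Returns "tumour" for codes 01-09, "normal" for 10-19, "control" for 20-29,
    and None if no barcode is found or the code falls outside those ranges.
    """
    match = BARCODE_RE.search(text)
    if match is None:
        return None
    code = int(match.group(1))
    if 1 <= code <= 9:
        return "tumour"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return None


if __name__ == "__main__":
    # Hypothetical barcodes, used only to illustrate the classification.
    for name in ["TCGA-AB-1234-01A-11R-A00Z-07", "TCGA-AB-1234-11A-01R-A00Z-07"]:
        print(name, sample_category(name))
```

Because the regex uses `search`, the same function works on filenames that embed a barcode somewhere in the middle, not only on bare barcodes.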

Does anyone here know whether the tumour / adjacent normal status of files is represented in the GDC data portal's data model in a way that can be easily filtered?
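For reference, a filtered query against the GDC REST API's `files` endpoint might look like the sketch below. It assumes that the field `cases.samples.sample_type` in the GDC field list (with values such as "Primary Tumor" and "Solid Tissue Normal") carries the tumour/normal distinction, and that `files.data_category` accepts "DNA Methylation" and "Transcriptome Profiling". Check both field names and values against the current Appendix A before relying on this. The helper names `build_files_query` and `query_files` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List, Sequence

# GDC REST API files endpoint (the web portal is built on this API).
FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_query(
    data_categories: Sequence[str],
    sample_types: Sequence[str],
    program: str = "TCGA",
    size: int = 100,
) -> Dict[str, Any]:
    """Build a JSON body for a POST request to the files endpoint.

    Field names are assumptions to verify against the GDC field list.
    """

    def in_filter(field: str, values: Sequence[str]) -> Dict[str, Any]:
        # An "in" filter matches records whose field equals any listed value.
        return {"op": "in", "content": {"field": field, "value": list(values)}}

    return {
        "filters": {
            "op": "and",
            "content": [
                in_filter("cases.project.program.name", [program]),
                in_filter("files.data_category", data_categories),
                in_filter("cases.samples.sample_type", sample_types),
            ],
        },
        "fields": "file_id,file_name,data_category,cases.samples.sample_type",
        "format": "JSON",
        "size": str(size),
    }


def query_files(body: Dict[str, Any]) -> List[Dict[str, Any]]:
    """POST the query and return the list of hits (needs network access)."""
    request = urllib.request.Request(
        FILES_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]["hits"]


if __name__ == "__main__":
    # Only builds and prints the query; no request is sent here.
    body = build_files_query(
        data_categories=["DNA Methylation", "Transcriptome Profiling"],
        sample_types=["Primary Tumor", "Solid Tissue Normal"],
    )
    print(json.dumps(body, indent=2))
```

`query_files(body)` returns at most `size` hits per call and does not page through larger result sets. Each hit's `cases[...].samples[...].sample_type` would still need checking, because the "in" filter matches a file linked to any of the listed sample types.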

Kevin Blighe (Republic of Ireland) wrote, 5 months ago:

On the GDC, if you configure a search (as you have done) and then download the manifest, you can programmatically look up the TCGA barcode of each file (and infer tumour/normal status from it) using either of these functions:
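Independently of those functions, the manifest-to-barcode lookup can be sketched against the GDC API. This sketch makes several assumptions: the manifest is tab-separated with an `id` column holding file UUIDs; the files endpoint accepts a `files.file_id` filter; and each hit nests only the samples that the file derives from under `cases[...].samples[...]`, where `submitter_id` is the TCGA sample barcode (e.g. TCGA-XX-XXXX-01A). The helper names are hypothetical.

```python
import csv
import io
from typing import Any, Dict, List


def manifest_ids(manifest_text: str) -> List[str]:
    """Return the file UUIDs from a tab-separated GDC manifest with an `id` column."""
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    return [row["id"] for row in reader]


def barcode_lookup_body(file_ids: List[str]) -> Dict[str, Any]:
    """JSON body for a POST to https://api.gdc.cancer.gov/files.

    Requests the sample submitter IDs (TCGA sample barcodes) of each file.
    The field names are assumptions to verify against the GDC field list.
    """
    return {
        "filters": {"op": "in", "content": {"field": "files.file_id", "value": file_ids}},
        "fields": "file_id,cases.samples.submitter_id",
        "format": "JSON",
        "size": str(len(file_ids)),
    }


def barcodes_by_file(hits: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map file_id to its sample barcodes, given the response's `data.hits` list."""
    mapping: Dict[str, List[str]] = {}
    for hit in hits:
        mapping[hit["file_id"]] = [
            sample["submitter_id"]
            for case in hit.get("cases", [])
            for sample in case.get("samples", [])
        ]
    return mapping
```

The body can be sent with an ordinary JSON POST. For large manifests, splitting the IDs into batches keeps each request manageable.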

Then, just include / exclude what you want from the manifest and download the files via the GDC Data Transfer Tool.
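Trimming the manifest can be a small step of its own. Below is a minimal sketch, assuming the manifest is tab-separated with a header row and an `id` column. It keeps only the rows whose file IDs you have chosen, such as those whose barcodes you classified as tumour or normal, and preserves the header, so the output can be passed to the Data Transfer Tool, e.g. `gdc-client download -m filtered_manifest.txt`. The name `filter_manifest` is hypothetical.

```python
import csv
import io
from typing import Iterable


def filter_manifest(manifest_text: str, keep_ids: Iterable[str]) -> str:
    """Return a tab-separated manifest containing only rows whose `id` is in keep_ids.

    The header row is written unchanged so the result is still a valid manifest.
    """
    keep = set(keep_ids)
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    out = io.StringIO()
    # reader.fieldnames reads the header row on first access.
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if row["id"] in keep:
            writer.writerow(row)
    return out.getvalue()
```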

You can also use this method to see the numbers of tumour and normal samples in different TCGA cohorts:
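Independently of that method, if file metadata have already been pulled from the API, a per-cohort tally could be sketched as follows. This assumes the hits were requested with `fields="cases.project.project_id,cases.samples.submitter_id,cases.samples.sample_type"` and that each hit nests `project` and `samples` under `cases`. Samples shared by several files (e.g. methylation and RNA-seq) are counted once. The name `tally_sample_types` is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Set, Tuple


def tally_sample_types(hits: List[Dict[str, Any]]) -> Counter:
    """Count distinct samples per (project_id, sample_type) across file hits."""
    seen: Set[Tuple[str, str]] = set()
    counts: Counter = Counter()
    for hit in hits:
        for case in hit.get("cases", []):
            project = case["project"]["project_id"]
            for sample in case.get("samples", []):
                key = (project, sample["submitter_id"])
                if key in seen:
                    continue  # this sample was already counted via another file
                seen.add(key)
                counts[(project, sample["sample_type"])] += 1
    return counts
```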

There are, of course, various ways of interrogating TCGA data. The data are also hosted on various third-party websites in varying states of processing.

Kevin

Powered by Biostar version 2.3.0