GDC WES data ID
7.6 years ago
chipseq33 ▴ 10

Hello everyone,

I am trying to download the TCGA WES BAM files through the GDC portal. With the new UUID identifiers, I cannot figure out which BAM file is from a tumor sample and which is from a normal sample.

Could you please help me with this?

Thanks

GDC • 1.9k views
7.6 years ago
sergeym ▴ 10

You can match files to sample type using the JSON metadata file downloadable from the Cart page. The "sample_type" field carries the information you are looking for. If you would rather decode the TCGA barcode, it is stored in the submitter_id fields of the various case/biospecimen entries (case, sample, aliquot, etc.).
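Here is a minimal Python sketch of both approaches. It assumes the metadata file is JSON with one record per file and that each record has a "file_id" key; the exact nesting is an assumption on my part, so the helper simply searches each record for any "sample_type" key rather than relying on a fixed path. The barcode helper relies on the TCGA convention that the first two digits of the fourth barcode field are the sample code (01-09 tumor, 10-19 normal, 20-29 control). The function names, the file name and the example barcodes are made up for illustration.

```python
import json


def find_values(node, key):
    """Recursively collect every value stored under `key` in nested dicts/lists."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            else:
                found.extend(find_values(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, key))
    return found


def barcode_sample_class(barcode):
    """Classify a TCGA barcode by the two-digit sample code in its fourth field."""
    fields = barcode.split("-")
    if len(fields) < 4 or not fields[3][:2].isdigit():
        raise ValueError(f"no sample code in barcode: {barcode!r}")
    code = int(fields[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


def sample_types_by_file(metadata_path):
    """Map each file_id to the sample_type values found anywhere in its record.

    Assumes (not verified) that the file holds a JSON list of per-file records.
    """
    with open(metadata_path) as handle:
        records = json.load(handle)
    return {rec.get("file_id"): find_values(rec, "sample_type") for rec in records}


if __name__ == "__main__":
    # Illustrative barcodes only, not taken from a real download.
    for barcode in ("TCGA-AA-3489-01A-21D-1835-10", "TCGA-AA-3489-11A-01D-1835-10"):
        print(barcode, barcode_sample_class(barcode))
```

To use it on a real download, call sample_types_by_file("metadata.cart.json") with whatever name your metadata file has, and check the result against a few files you already know.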

If you prefer a TSV, the GDC API can generate one. Here's the API documentation (https://gdc-docs.nci.nih.gov/API/Users_Guide/Search_and_Retrieval/). If you want a shortcut, here is one I like to use:

1) Run a search on the GDC Data Portal whose results include the files you are interested in. Use the search results page, not the Cart page; the Cart page will not work. Copy the URL of the results page. It looks something like this:

https://gdc-portal.nci.nih.gov/search/f?filters=%7B%22op%22:%22and%22,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_format%22,%22value%22:%5B%22BAM%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.primary_site%22,%22value%22:%5B%22Colorectal%22%5D%7D%7D%5D%7D

2) Copy the filters parameter from the URL to your clipboard. You get something like the following (here I'm using the filters parameter from the URL above):

filters=%7B%22op%22:%22and%22,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_format%22,%22value%22:%5B%22BAM%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.primary_site%22,%22value%22:%5B%22Colorectal%22%5D%7D%7D%5D%7D

3) Paste it at the end of the following API call, in place of "PUT_FILTER_PARAMETER_HERE":

https://gdc-api.nci.nih.gov/files?format=TSV&fields=file_id,file_name,cases.submitter_id,cases.case_id,data_category,data_type,cases.samples.tumor_descriptor,cases.samples.tissue_type,cases.samples.sample_type,cases.samples.submitter_id,cases.samples.sample_id,analysis.workflow_type,cases.project.project_id,cases.samples.portions.analytes.aliquots.aliquot_id,cases.samples.portions.analytes.aliquots.submitter_id&size=200000&PUT_FILTER_PARAMETER_HERE

You get an API call that looks like this:

https://gdc-api.nci.nih.gov/files?format=TSV&fields=file_id,file_name,cases.submitter_id,cases.case_id,data_category,data_type,cases.samples.tumor_descriptor,cases.samples.tissue_type,cases.samples.sample_type,cases.samples.submitter_id,cases.samples.sample_id,analysis.workflow_type,cases.project.project_id,cases.samples.portions.analytes.aliquots.aliquot_id,cases.samples.portions.analytes.aliquots.submitter_id&size=200000&filters=%7B%22op%22:%22and%22,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_format%22,%22value%22:%5B%22BAM%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.primary_site%22,%22value%22:%5B%22Colorectal%22%5D%7D%7D%5D%7D

4) When you open the link above, you will get a TSV file with the metadata you're looking for.

Note: If you are looking for files in the GDC Legacy Archive, in step 3 you have to add legacy to the URL: https://gdc-api.nci.nih.gov/legacy/files?....
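If you would rather script steps 2 and 3 than copy and paste, something like the following Python sketch should do it. It only builds the URL and makes no network call. The function names are my own, the host is the one written in this post (it may have changed since), and the field list is the one from step 3. Note that urlencode percent-encodes commas and colons, so the result looks different from the hand-pasted URL but decodes to the same parameters.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Host as written in this post; treat it as an assumption and update if it has moved.
API_HOST = "https://gdc-api.nci.nih.gov"

# Field list copied from step 3.
FIELDS = [
    "file_id", "file_name", "cases.submitter_id", "cases.case_id",
    "data_category", "data_type", "cases.samples.tumor_descriptor",
    "cases.samples.tissue_type", "cases.samples.sample_type",
    "cases.samples.submitter_id", "cases.samples.sample_id",
    "analysis.workflow_type", "cases.project.project_id",
    "cases.samples.portions.analytes.aliquots.aliquot_id",
    "cases.samples.portions.analytes.aliquots.submitter_id",
]


def filters_from_portal_url(portal_url):
    """Step 2: pull the filters parameter out of a portal search URL as a dict."""
    query = parse_qs(urlsplit(portal_url).query)
    if "filters" not in query:
        raise ValueError("URL has no filters parameter")
    return json.loads(query["filters"][0])


def build_files_url(filters, legacy=False, size=200000):
    """Step 3: build the files API call that returns a TSV for the given filters."""
    endpoint = f"{API_HOST}/legacy/files" if legacy else f"{API_HOST}/files"
    params = {
        "format": "TSV",
        "fields": ",".join(FIELDS),
        "size": size,
        "filters": json.dumps(filters, separators=(",", ":")),
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # Shortened version of the portal URL from step 1 (BAM filter only).
    portal_url = (
        "https://gdc-portal.nci.nih.gov/search/f?filters="
        "%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:"
        "%22files.data_format%22,%22value%22:%5B%22BAM%22%5D%7D%7D"
    )
    print(build_files_url(filters_from_portal_url(portal_url)))
```

Pass legacy=True to get the Legacy Archive form of the call described in the note above.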
