Question: GDC WES data ID
chipseq33 wrote (3.9 years ago):

Hello everyone,

I am trying to download the TCGA WES BAM files through the GDC portal. With the new "UUID" identifiers, I cannot figure out which BAM file is from a tumor sample and which is from a normal sample.

Could you please help me with this?

Thanks

gdc • 1.2k views
sergeym wrote (3.9 years ago):

You can match files to sample type using the JSON metadata file downloadable from the Cart page. The "sample_type" field carries the information you are looking for. If you would rather decode the TCGA barcode, it is stored in the submitter_id fields of the various case/biospecimen entries (case, sample, aliquot, etc.).
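For the barcode route, here is a minimal sketch. It assumes the standard TCGA barcode layout, where the fourth dash-separated field starts with a two-digit sample-type code (01 to 09 tumor, 10 to 19 normal, 20 to 29 control). The function name and the example barcodes are illustrative, not something GDC provides:

```python
def sample_class(barcode):
    """Classify a TCGA barcode as 'tumor', 'normal', 'control', or 'unknown'."""
    parts = barcode.split("-")
    # The sample field is the fourth one, e.g. "01A": two digits plus a vial letter.
    if len(parts) < 4 or not parts[3][:2].isdigit():
        raise ValueError(f"no sample-type code in barcode: {barcode!r}")
    code = int(parts[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


# Illustrative barcodes (made up for this example).
print(sample_class("TCGA-AA-0000-01A-01D-0000-10"))  # tumor
print(sample_class("TCGA-AA-0000-10A-01D-0000-10"))  # normal
```

Case-level barcodes (only three fields) carry no sample code, so the sketch raises an error for them rather than guessing.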

If you prefer a TSV, that can be generated by the GDC API. Here's the API documentation ( https://gdc-docs.nci.nih.gov/API/Users_Guide/Search_and_Retrieval/ ). If you want a shortcut, here is one I like to use:

1) Do a search on the GDC Data Portal that produces a list of results covering the files you are interested in. Do not use the Cart page for this; that will not work. Copy the URL of the results page. It looks something like this:

https://gdc-portal.nci.nih.gov/search/f?filters=%7B%22op%22:%22and%22,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_format%22,%22value%22:%5B%22BAM%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.primary_site%22,%22value%22:%5B%22Colorectal%22%5D%7D%7D%5D%7D

2) Copy the filters parameter from the URL into your clipboard. You get something like the following ( here I'm using the filters parameter from the URL above ):

filters=%7B%22op%22:%22and%22,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_format%22,%22value%22:%5B%22BAM%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.primary_site%22,%22value%22:%5B%22Colorectal%22%5D%7D%7D%5D%7D

3) Paste it into the following API call at the end in place of "PUT_FILTER_PARAMETER_HERE":

https://gdc-api.nci.nih.gov/files?format=TSV&fields=file_id,file_name,cases.submitter_id,cases.case_id,data_category,data_type,cases.samples.tumor_descriptor,cases.samples.tissue_type,cases.samples.sample_type,cases.samples.submitter_id,cases.samples.sample_id,analysis.workflow_type,cases.project.project_id,cases.samples.portions.analytes.aliquots.aliquot_id,cases.samples.portions.analytes.aliquots.submitter_id&size=200000&PUT_FILTER_PARAMETER_HERE

You get an API call that looks like this:

https://gdc-api.nci.nih.gov/files?format=TSV&fields=file_id,file_name,cases.submitter_id,cases.case_id,data_category,data_type,cases.samples.tumor_descriptor,cases.samples.tissue_type,cases.samples.sample_type,cases.samples.submitter_id,cases.samples.sample_id,analysis.workflow_type,cases.project.project_id,cases.samples.portions.analytes.aliquots.aliquot_id,cases.samples.portions.analytes.aliquots.submitter_id&size=200000&filters=%7B%22op%22:%22and%22,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_format%22,%22value%22:%5B%22BAM%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.primary_site%22,%22value%22:%5B%22Colorectal%22%5D%7D%7D%5D%7D

4) When you click the link above you will get the TSV file with the metadata you're looking for.
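Once the TSV from step 4 is saved, splitting the files into tumor and normal could look roughly like the sketch below. It assumes the nested fields come back as flattened column names that end in "sample_type" and that the file name column is called "file_name"; check the header of your own TSV. The keyword matching in `classify` is a heuristic of mine, not a GDC rule:

```python
import csv
import io


def classify(sample_type):
    """Map a sample_type string to 'tumor', 'normal', or 'other' (keyword heuristic)."""
    text = sample_type.lower()
    if "normal" in text:
        return "normal"
    if any(word in text for word in ("tumor", "cancer", "metastatic")):
        return "tumor"
    return "other"


def split_by_sample_type(tsv_text):
    """Group file names from the TSV text by tumor/normal status."""
    groups = {"tumor": [], "normal": [], "other": []}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Assumption: match any column whose name ends in "sample_type"
        # instead of relying on one exact flattened column name.
        labels = {
            classify(value)
            for column, value in row.items()
            if column and column.endswith("sample_type") and value
        }
        # Files with no sample type, or with mixed classes, go to "other".
        label = labels.pop() if len(labels) == 1 else "other"
        groups[label].append(row.get("file_name", ""))
    return groups
```

To use it, read the downloaded file into a string (for example `open("gdc_files.tsv").read()`, where the file name is whatever you saved it as) and pass that to `split_by_sample_type`.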

Note: If you are looking for files in the GDC Legacy Archive, add legacy to the URL in step 3: https://gdc-api.nci.nih.gov/legacy/files?....
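Steps 2 and 3, plus the legacy variant from the note, can also be scripted. This is a sketch, not an official client: the function names and the `legacy` flag are mine, the field list and hostnames are copied from the API call above, and the hostnames may have changed since this was written. It only builds the URL and does not contact the API:

```python
import json
from urllib.parse import unquote, urlsplit

# Field list copied from the API call in step 3.
API_FIELDS = [
    "file_id", "file_name", "cases.submitter_id", "cases.case_id",
    "data_category", "data_type", "cases.samples.tumor_descriptor",
    "cases.samples.tissue_type", "cases.samples.sample_type",
    "cases.samples.submitter_id", "cases.samples.sample_id",
    "analysis.workflow_type", "cases.project.project_id",
    "cases.samples.portions.analytes.aliquots.aliquot_id",
    "cases.samples.portions.analytes.aliquots.submitter_id",
]


def extract_filters(portal_url):
    """Return the 'filters=...' parameter of a portal search URL, still URL-encoded."""
    query = urlsplit(portal_url).query
    for param in query.split("&"):
        if param.startswith("filters="):
            return param
    raise ValueError("no filters parameter found in URL")


def decode_filters(portal_url):
    """Decode the filters parameter into a Python dict, handy for checking the search."""
    encoded = extract_filters(portal_url).split("=", 1)[1]
    return json.loads(unquote(encoded))


def build_files_query(portal_url, legacy=False, size=200000):
    """Build the TSV-producing API call from a portal search URL."""
    base = "https://gdc-api.nci.nih.gov"
    endpoint = f"{base}/legacy/files" if legacy else f"{base}/files"
    fields = ",".join(API_FIELDS)
    return f"{endpoint}?format=TSV&fields={fields}&size={size}&{extract_filters(portal_url)}"
```

Passing the search URL from step 1 to `build_files_query` should reproduce the API call shown in step 3; with `legacy=True` it targets the Legacy Archive endpoint instead.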
