Tutorial: TCGA UUIDs to TCGA barcode (SampleID) in R
For those not familiar with the command line or with the JSON query syntax of the GDC API, here is a fairly simple way to map UUIDs to TCGA barcodes using R and one canned command in the terminal.

The first part is in R

1) Extract the file IDs from your manifest file (the one you got from the GDC when you downloaded your data)


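manifest <- "gdc_manifest_20160921_171519.txt"  # name of your manifest file
x <- read.table(manifest, header = TRUE, sep = "\t", stringsAsFactors = FALSE)  # the manifest is tab-separated
manifest_length <- nrow(x)  # number of files, used later as the query size
id <- toString(sprintf('"%s"', x$id))  # quoted, comma-separated file UUIDs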
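
To see what step 1 produces, here is a minimal sketch that runs the same lines on a mock manifest. The two UUIDs, file names and checksums are made up for illustration, and the five column names are an assumption about the manifest layout, so check the header of your own file.

```r
# Mock manifest with two made-up entries. Real manifests are tab-separated;
# the columns id, filename, md5, size, state are assumed here.
mock_manifest <- tempfile(fileext = ".txt")
writeLines(c(
  "id\tfilename\tmd5\tsize\tstate",
  "aaaaaaaa-0000-0000-0000-000000000001\tsample1.htseq.counts.gz\tabc\t100\tvalidated",
  "aaaaaaaa-0000-0000-0000-000000000002\tsample2.htseq.counts.gz\tdef\t200\tvalidated"
), mock_manifest)

x <- read.table(mock_manifest, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
manifest_length <- nrow(x)

# Wrap every UUID in double quotes and join them with ", "
id <- toString(sprintf('"%s"', x$id))

cat(id, "\n")
```

The string in id is what ends up between the square brackets of the "value" list in the query.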

2) Create Payload.txt with the query needed

The query syntax is taken from the GDC website: https://gdc-docs.nci.nih.gov/API/Users_Guide/Search_and_Retrieval/

Part1 <- '{"filters":{"op":"in","content":{"field":"files.file_id","value":[ '

Part2 <- '] }},"format":"TSV","fields":"file_id,file_name,cases.submitter_id,cases.case_id,data_category,data_type,cases.samples.tumor_descriptor,cases.samples.tissue_type,cases.samples.sample_type,cases.samples.submitter_id,cases.samples.sample_id,cases.samples.portions.analytes.aliquots.aliquot_id,cases.samples.portions.analytes.aliquots.submitter_id","size":'

# Always use double quotes around the size: shQuote() is platform dependent
# and gives single quotes on Unix-like systems, which is not valid JSON
Part3 <- paste0('"', manifest_length, '"}')

Sentence <- paste(Part1, id, Part2, Part3)

# Write the query to Payload.txt, the file that curl reads in the second part
writeLines(Sentence, "Payload.txt")
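
As an alternative to pasting the JSON together by hand, the same query can be sketched with the jsonlite package, which takes care of the quoting. This is only a sketch under assumptions: jsonlite has to be installed, build_payload is a hypothetical helper name, the UUIDs are made up, and the field list is shortened for readability.

```r
library(jsonlite)  # assumed to be installed: install.packages("jsonlite")

# Hypothetical helper: builds the query string for a vector of file UUIDs.
build_payload <- function(ids, fields) {
  payload <- list(
    filters = list(
      op = "in",
      content = list(
        field = "files.file_id",
        value = I(ids)  # I() keeps even a single ID as a JSON array
      )
    ),
    format = "TSV",
    fields = paste(fields, collapse = ","),
    size = as.character(length(ids))
  )
  as.character(toJSON(payload, auto_unbox = TRUE))
}

ids <- c("aaaaaaaa-0000-0000-0000-000000000001",
         "aaaaaaaa-0000-0000-0000-000000000002")  # made-up UUIDs
payload <- build_payload(ids, c("file_id", "file_name", "cases.samples.submitter_id"))

# validate() returns TRUE when the string parses as JSON
stopifnot(validate(payload))

# Written to a temporary folder here; point this at your manifest directory instead
writeLines(payload, file.path(tempdir(), "Payload.txt"))
```

Running validate() on the contents of a hand-built Payload.txt is also a quick check before sending it, since the server rejects a payload that is not valid JSON.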


The second part is in the command line (CMD or terminal)

cd C:/Here/your/manifest/directory

curl --request POST --header "Content-Type: application/json" --data @Payload.txt "https://gdc-api.nci.nih.gov/files" > File_metadata.txt

Now you should have a file called File_metadata.txt in your working folder with all the data you need
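
Back in R, the metadata file can be turned into a UUID-to-barcode lookup. The sketch below uses a mock file, because the exact header names in the TSV returned by the GDC are an assumption here (shown as cases.0.samples.0.submitter_id); run names(meta) on your own File_metadata.txt to check them. The UUIDs and barcodes are made up.

```r
# Mock File_metadata.txt; header names and values are for illustration only.
meta_file <- tempfile(fileext = ".txt")
writeLines(c(
  "file_id\tfile_name\tcases.0.samples.0.submitter_id",
  "aaaaaaaa-0000-0000-0000-000000000001\tsample1.htseq.counts.gz\tTCGA-XX-0001-01A",
  "aaaaaaaa-0000-0000-0000-000000000002\tsample2.htseq.counts.gz\tTCGA-XX-0002-01A"
), meta_file)

# check.names = FALSE keeps the header names exactly as they are in the file
meta <- read.delim(meta_file, stringsAsFactors = FALSE, check.names = FALSE)

# Find the sample barcode column by pattern instead of hard-coding its name.
# The pattern skips the longer aliquot submitter_id column if it is present.
barcode_col <- grep("samples[^a-z]*submitter_id$", names(meta), value = TRUE)[1]

# Named vector: file UUID -> sample barcode
uuid_to_barcode <- setNames(meta[[barcode_col]], meta$file_id)

uuid_to_barcode[["aaaaaaaa-0000-0000-0000-000000000002"]]
```

With the real file, replace meta_file with "File_metadata.txt" and look up the values of the id column of your manifest.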

If you get a message:

"'curl' is not recognized as an operable program or batch file."

you need to install cURL on your computer (if you don't know how to do it, follow this link)

10 months ago by martinguerrerog89170

Thank you. It worked for my prostate cancer RNA-seq data.

4 months ago by morovatunc360

thanks for the post, it was very useful ..

9 weeks ago by juanmafernandezm860

Hi, I tried to use this method to retrieve the sample IDs, but it failed. The error in File_metadata.txt is: { "message": "400 Bad Request: The browser (or proxy) sent a request that this server could not understand." }

How can I fix it? Thanks.

5 weeks ago by lin.wang20
The GDC provides an API that can be queried with curl or HTTPie from the command line to retrieve information by UUID.

I wrote a simple Python script for mapping UUIDs to TCGA barcodes (submitter IDs). Just input the manifest file downloaded from the GDC Data Portal. The default is the latest version; you can't convert legacy archive UUIDs through the latest version. You may change the endpoint to your version yourself.

files_endpt = "https://gdc-api.nci.nih.gov/<version>/legacy/<endpoint>"

8 months ago by Chunjie Liu230

Hi, there are some errors when I use it:

Traceback (most recent call last):
  File "m2s.py", line 76, in <module>
  File "m2s.py", line 73, in main
  File "m2s.py", line 69, in run
    gdcAPI(file_ids, manifest)
  File "m2s.py", line 62, in gdcAPI
    response = requests.post(files_endpt, json = params)
  File "//anaconda/lib/python2.7/site-packages/requests/api.py", line 88, in post
    return request('post', url, data=data, **kwargs)
  File "//anaconda/lib/python2.7/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
TypeError: request() got an unexpected keyword argument 'json'


5 weeks ago by lin.wang20

Please try Python 3, and install the requests module.

Or you can use the R version.

5 weeks ago by Chunjie Liu230

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.

5 weeks ago by WouterDeCoster20k
