Tutorial: TCGA UUIDs to TCGA barcodes (SampleID) in R
4 months ago, martinguerrerog89 wrote:

For those not familiar with the command line or with the JSON query language, here is a fairly simple way to map UUIDs to TCGA barcodes using R and one canned command in the terminal.

The first part is in R

1) Extract the file IDs from your manifest file (the one you get from the GDC when you download your data)

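setwd("C:/Here/your/manifest/directory")

manifest <- "gdc_manifest_20160921_171519.txt"  # Manifest file name
x <- read.table(manifest, header = TRUE, stringsAsFactors = FALSE)

# Wrap each file UUID in double quotes and join them with commas: "uuid1", "uuid2", ...
id <- toString(sprintf('"%s"', x$id))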
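To see what the id string from step 1 looks like, here is a minimal sketch that uses a toy manifest built in memory. The two UUIDs and file names are made-up placeholders, not real GDC files; the only thing assumed about a real manifest is that it has an id column, as in the code above.

```r
# Toy stand-in for the manifest (hypothetical UUIDs, for illustration only)
x <- data.frame(
  id = c("00000000-0000-0000-0000-000000000001",
         "00000000-0000-0000-0000-000000000002"),
  filename = c("example_1.txt", "example_2.txt"),
  stringsAsFactors = FALSE
)

# Wrap every UUID in double quotes, then join them with ", "
id <- toString(sprintf('"%s"', x$id))
cat(id, "\n")
```

The result is a single string, which is what lets it be pasted straight into the JSON "value" list of the query.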

2) Create Payload.txt with the query to send

The query below follows the GDC API documentation: https://gdc-docs.nci.nih.gov/API/Users_Guide/Search_and_Retrieval/

Part1 <- '{"filters":{"op":"in","content":{"field":"files.file_id","value":[ '

# "size" caps the number of records returned; raise it if your manifest lists more than 500 files
Part2 <- '] }},"format":"TSV","fields":"file_id,file_name,cases.submitter_id,cases.case_id,data_category,data_type,cases.samples.tumor_descriptor,cases.samples.tissue_type,cases.samples.sample_type,cases.samples.submitter_id,cases.samples.sample_id,cases.samples.portions.analytes.aliquots.aliquot_id,cases.samples.portions.analytes.aliquots.submitter_id","size":"500"} '

# Join the three pieces into the full JSON query that curl will send
Sentence <- paste(Part1, id, Part2)

write.table(Sentence, "Payload.txt", quote = FALSE, col.names = FALSE, row.names = FALSE)
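As a variation on step 2, the payload can be built by a small function that sets "size" to the number of UUIDs, so that no record is cut off for large manifests. This is only a sketch: build_payload is a hypothetical helper (not part of any GDC tooling), the UUIDs are placeholders, and the shortened field list is just for readability.

```r
# Hypothetical helper: build the POST payload for a vector of file UUIDs,
# asking for as many records as there are UUIDs.
build_payload <- function(ids,
                          fields = c("file_id", "file_name",
                                     "cases.samples.submitter_id",
                                     "cases.samples.portions.analytes.aliquots.submitter_id")) {
  quoted <- toString(sprintf('"%s"', ids))
  paste0(
    '{"filters":{"op":"in","content":{"field":"files.file_id","value":[',
    quoted,
    ']}},"format":"TSV","fields":"', paste(fields, collapse = ","),
    '","size":"', length(ids), '"}'
  )
}

# Placeholder UUIDs, for illustration only
payload <- build_payload(c("00000000-0000-0000-0000-000000000001",
                           "00000000-0000-0000-0000-000000000002"))
cat(payload, "\n")
# writeLines(payload, "Payload.txt")  # uncomment to write the file for curl
```

Passing x$id from your own manifest instead of the placeholders gives a payload equivalent to the one written by the code above, apart from the field list and the size.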

The second part is in the command line (CMD or terminal)

cd C:/Here/your/manifest/directory

curl --request POST --header "Content-Type: application/json" --data @Payload.txt "https://gdc-api.nci.nih.gov/files" > File_metadata.txt

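You should now have a file called File_metadata.txt in your working folder. It is a tab-separated table with one row per file and the fields requested in Payload.txt, including the sample and aliquot barcodes (the submitter_id fields).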
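Back in R, the table can be read and reduced to a UUID-to-barcode lookup. The sketch below works on a toy file written to a temporary location, so it runs without any download. The flattened column name and the barcodes in it are assumptions and placeholders, so check the header of your own File_metadata.txt and adjust the pattern if it differs.

```r
# Toy stand-in for File_metadata.txt. The column names are an assumption about
# how nested fields are flattened in the TSV; the values are placeholders.
tsv <- tempfile(fileext = ".txt")
writeLines(c(
  "file_id\tfile_name\tcases.0.samples.0.portions.0.analytes.0.aliquots.0.submitter_id",
  "00000000-0000-0000-0000-000000000001\texample_1.txt\tTCGA-XX-0000-01A-01R-0000-07",
  "00000000-0000-0000-0000-000000000002\texample_2.txt\tTCGA-XX-0001-11A-01R-0000-07"
), tsv)

meta <- read.delim(tsv, stringsAsFactors = FALSE, check.names = FALSE)

# Locate the aliquot barcode column by pattern rather than by exact name
barcode_col <- grep("aliquots.*submitter_id$", names(meta), value = TRUE)[1]

uuid_to_barcode <- data.frame(
  file_id = meta$file_id,
  aliquot_barcode = meta[[barcode_col]],
  # A TCGA sample barcode is the first 16 characters of the aliquot barcode
  sample_barcode = substr(meta[[barcode_col]], 1, 16),
  stringsAsFactors = FALSE
)
print(uuid_to_barcode)
```

For real data, replace tsv with "File_metadata.txt"; the resulting table can then be merged with the manifest on the file UUID.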

If you get the message:

"'curl' is not recognized as an operable program or batch file."

then cURL is not installed (or is not on your PATH) and you need to install it on your computer (if you don't know how to do it, follow this link)
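If you prefer to check for cURL without leaving R, the following sketch uses base R's Sys.which to look for the program on the PATH. The commented-out system2 call is one possible way to run the same request from R; it is an untested assumption that it behaves the same as the terminal command on your system.

```r
# Sys.which() returns an empty string when the program is not on the PATH
curl_path <- unname(Sys.which("curl"))
has_curl <- nzchar(curl_path)

if (has_curl) {
  message("curl found at: ", curl_path)
} else {
  message("curl not found; install it or add it to your PATH before running the request")
}

# Possible way to send the request from R instead of the terminal (not run here):
# system2("curl",
#         c("--request", "POST",
#           "--header", shQuote("Content-Type: application/json"),
#           "--data", "@Payload.txt",
#           shQuote("https://gdc-api.nci.nih.gov/files")),
#         stdout = "File_metadata.txt")
```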

10 weeks ago, Chunjie Liu (US, Houston) wrote:

The GDC API can be queried from the command line with cURL or HTTPie to retrieve information by UUID.

I wrote a simple Python script for mapping UUIDs to TCGA barcodes (submitter ID). Just give it the manifest file downloaded from the GDC Data Portal. By default it queries the latest version of the API, which cannot convert UUIDs from the legacy archive. To convert those, change the endpoint to the version you need:

files_endpt = "https://gdc-api.nci.nih.gov/<version>/legacy/<endpoint>"
