Question: Converting UUID downloaded from GDC to TCGA names
2.1 years ago, emiliano.traini wrote:

Hi! I have downloaded breast cancer transcriptome profiling files with the following Python code:

import requests
import json
import re
url = "https://api.gdc.cancer.gov/files"
filters = {
    "op": "and",
    "content":[
        {
        "op": "in",
        "content":{
            "field": "cases.primary_site",
            "value": ["Breast"]
            }
        },
        {
        "op": "in",
        "content":{
            "field": "files.analysis.workflow_type",
            "value": ["HTSeq - FPKM-UQ"]
            }
        },
        {
        "op": "in",
        "content":{
            "field": "files.data_category",
            "value": ["Transcriptome Profiling"]
            }
        }
    ]
}

params = {
    "filters" : json.dumps(filters), # takes an object (filters) and returns a string
    "fields" : "file_id",
    "format" : "JSON",
    "size" : "2000"
    }

r = requests.get(url, params = params)
file_uuid_list = []
for file_entry in json.loads(r.content.decode("utf-8"))["data"]["hits"]:
    file_uuid_list.append(file_entry["file_id"])

url_data = "https://api.gdc.cancer.gov/data"

params = {"ids": file_uuid_list}

response = requests.post(url_data, data = json.dumps(params), headers = {"Content-Type": "application/json"})

response_head_cd = response.headers["Content-Disposition"]

file_name = re.findall("filename=(.+)", response_head_cd)[0]

with open(file_name, "wb") as output_file:
    output_file.write(response.content)

I can't manage to find the TCGA names of the downloaded files. I have tried modifying the code as follows:

params = {
    "filters" : json.dumps(filters), # takes an object (filters) and returns a string
    "fields" : "file_id, cases.submitter_id, cases.case_id",
    "format" : "JSON",
    "size" : "2000"
    }

But it doesn't work, maybe because the URL is wrong (I get a connection error in requests.get()).


Try this: C: Sample names for TCGA data from GDC-legacy archive

written 2.1 years ago by Kevin Blighe

Thanks, but I would like help in Python. The link you gave me is for R.

written 2.1 years ago by emiliano.traini

I guess you can either do the R->py algorithm conversion yourself, or branch out to R as an intermediate step. I don't think it's fair to expect a solution in your language of choice without good reason why an existing solution is not usable.

written 2.1 years ago by RamRS
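
For anyone who wants to stay in Python, here is a minimal sketch of one way to map the file UUIDs to TCGA identifiers using the same `/files` endpoint and the `filters` dictionary from the question. It is not a verified solution. The field names `cases.submitter_id` and `associated_entities.entity_submitter_id` are assumptions about which GDC fields carry the case ID and the aliquot barcode, so check them against the GDC field listing. The helper names (`build_params`, `uuid_to_barcodes`, `fetch_mapping`) are made up for illustration. The field list is joined with plain commas and no spaces; whether the spaces in the question's field string contribute to the error is not confirmed.

```python
import json

FILES_URL = "https://api.gdc.cancer.gov/files"


def build_params(filters, size=2000):
    """Build query parameters asking for file IDs plus TCGA identifiers."""
    # Assumed field names; verify them against the GDC field listing.
    fields = [
        "file_id",
        "file_name",
        "cases.submitter_id",
        "associated_entities.entity_submitter_id",
    ]
    return {
        "filters": json.dumps(filters),  # the API expects the filter as a JSON string
        "fields": ",".join(fields),      # comma-separated, no spaces
        "format": "JSON",
        "size": str(size),
    }


def uuid_to_barcodes(hits):
    """Turn the 'hits' list of a /files response into {file_id: identifiers}."""
    mapping = {}
    for hit in hits:
        mapping[hit["file_id"]] = {
            "file_name": hit.get("file_name"),
            "case_ids": [c.get("submitter_id") for c in hit.get("cases", [])],
            "barcodes": [
                e.get("entity_submitter_id")
                for e in hit.get("associated_entities", [])
            ],
        }
    return mapping


def fetch_mapping(filters):
    """Query the GDC API and return the UUID-to-identifier mapping."""
    # Imported here so the two helpers above can be used and tested offline.
    import requests

    response = requests.get(FILES_URL, params=build_params(filters), timeout=60)
    response.raise_for_status()  # surface HTTP errors instead of failing later
    return uuid_to_barcodes(response.json()["data"]["hits"])
```

Calling `fetch_mapping(filters)` with the `filters` dictionary from the question should return one entry per file UUID, which can then be matched against the names of the downloaded files. If `requests.get()` still raises a connection error, the cause is more likely on the network side (proxy, firewall) than in the URL, since it is the same endpoint the original download code already uses.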