5.8 years ago
emiliano.traini
Hi!
I have downloaded breast cancer transcriptome profiling files (HTSeq - FPKM-UQ) from the GDC with the following Python code:
import requests
import json
import re

url = "https://api.gdc.cancer.gov/files"

filters = {
    "op": "and",
    "content": [
        {
            "op": "in",
            "content": {
                "field": "cases.primary_site",
                "value": ["Breast"]
            }
        },
        {
            "op": "in",
            "content": {
                "field": "files.analysis.workflow_type",
                "value": ["HTSeq - FPKM-UQ"]
            }
        },
        {
            "op": "in",
            "content": {
                "field": "files.data_category",
                "value": ["Transcriptome Profiling"]
            }
        }
    ]
}

params = {
    "filters": json.dumps(filters),  # takes an object (filters) and returns a string
    "fields": "file_id",
    "format": "JSON",
    "size": "2000"
}

r = requests.get(url, params=params)
file_uuid_list = []
for file_entry in json.loads(r.content.decode("utf-8"))["data"]["hits"]:
    file_uuid_list.append(file_entry["file_id"])

url_data = "https://api.gdc.cancer.gov/data"
params = {"ids": file_uuid_list}
response = requests.post(url_data, data=json.dumps(params), headers={"Content-Type": "application/json"})
response_head_cd = response.headers["Content-Disposition"]
file_name = re.findall("filename=(.+)", response_head_cd)[0]
with open(file_name, "wb") as output_file:
    output_file.write(response.content)
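When several IDs are posted to the /data endpoint, the GDC typically returns one compressed archive rather than one file per ID. The sketch below assumes that archive is a gzipped tar in which each data file sits in a directory named after its file UUID, with top-level files such as MANIFEST.txt alongside. It lists the UUID of each downloaded file so the files can later be joined to case barcodes. The function name `list_file_uuids` is hypothetical.

```python
import io
import tarfile


def list_file_uuids(archive):
    """Return {file_uuid: member_path} for the data files in a GDC tar.gz download.

    `archive` is either a path to the archive or its raw bytes (e.g. response.content).
    Assumes each data file is stored as "<file_uuid>/<file_name>"; top-level
    entries such as MANIFEST.txt are skipped.
    """
    if isinstance(archive, bytes):
        tar = tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz")
    else:
        tar = tarfile.open(archive, mode="r:gz")
    mapping = {}
    with tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            parts = member.name.split("/")
            # The first path component is assumed to be the file UUID
            if len(parts) >= 2:
                mapping[parts[0]] = member.name
    return mapping
```

For example, `list_file_uuids(file_name)` after the download above, or `list_file_uuids(response.content)` without writing to disk. The keys can then be matched against a file_id to barcode mapping obtained from the /files endpoint.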
However, I can't find the TCGA names (barcodes) of the downloaded files. I tried modifying the parameters as follows:
params = {
    "filters": json.dumps(filters),  # takes an object (filters) and returns a string
    "fields": "file_id,cases.submitter_id,cases.case_id",
    "format": "JSON",
    "size": "2000"
}
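With `cases.submitter_id` among the requested fields, each hit in the JSON response should carry a `cases` list next to its `file_id`. The sketch below turns the decoded response (for example `r.json()` or `json.loads(r.content.decode("utf-8"))`) into a file_id to TCGA barcode mapping. The function name and the example payload are hypothetical. Requesting `cases.samples.submitter_id` as well should give sample-level barcodes, but that is an assumption worth checking against the GDC field list.

```python
def file_to_barcodes(payload):
    """Map each file_id to the case barcodes (cases.submitter_id) in a GDC /files response."""
    mapping = {}
    for hit in payload["data"]["hits"]:
        # "cases" is a list because, in general, a file can be linked to several cases
        cases = hit.get("cases", [])
        mapping[hit["file_id"]] = [c.get("submitter_id") for c in cases]
    return mapping


# Hypothetical response shaped like what the fields above would return
example = {
    "data": {
        "hits": [
            {"id": "f1", "file_id": "f1",
             "cases": [{"submitter_id": "TCGA-AA-0001", "case_id": "c1"}]}
        ]
    }
}
print(file_to_barcodes(example))  # {'f1': ['TCGA-AA-0001']}
```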
but it doesn't work; maybe the URL is wrong (I get a connection error in requests.get()).
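The post does not show what causes the connection error, so this is only one thing worth ruling out: a long query string in a GET request. The GDC /files endpoint also accepts a POST with a JSON body, where `filters` stays a dict and `fields` is a comma-separated string without spaces. The sketch below builds such a body and sends it with an explicit timeout. The helper names, the 60-second timeout, and the lazy import are illustrative choices, not part of the original code.

```python
FILES_URL = "https://api.gdc.cancer.gov/files"


def build_files_query(filters, fields, size=2000):
    """Build a JSON body for a POST to the GDC /files endpoint."""
    return {
        "filters": filters,          # a dict; the whole body is serialized once
        "fields": ",".join(fields),  # comma-separated, no spaces
        "format": "JSON",
        "size": str(size),
    }


def query_files(filters, fields, size=2000, timeout=60):
    """POST the query and return the decoded JSON response."""
    import requests  # only needed for the actual network call

    body = build_files_query(filters, fields, size)
    response = requests.post(FILES_URL, json=body, timeout=timeout)
    response.raise_for_status()  # raise on HTTP 4xx/5xx instead of parsing an error page
    return response.json()
```

For example, `query_files(filters, ["file_id", "cases.submitter_id", "cases.case_id"])` reuses the `filters` dict defined above, and its result has the same `data`/`hits` structure as the GET response.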
Try this: C: Sample names for TCGA data from GDC-legacy archive
Thanks, but I would like help in Python. The thread you linked uses R.
I guess you can either convert the R approach to Python yourself, or use R as an intermediate step. I don't think it's fair to expect a solution in your language of choice without a good reason why the existing solution is not usable.