Question: TCGAbiolinks does not download all the data requested
5 months ago, guillepalou4 wrote:

Hi everyone,

I am using TCGAbiolinks to download all the RNA-seq data for a specific tumor type, using TCGA-THYM as an example. I can build the queries and download the data without errors. However, I wanted to check that I had downloaded all the files, just in case. Following this link you can see there are 11 Solid Tissue Normal TCGA-THYM samples. I checked them file by file, and all 11 contain RNA-seq data (FPKM-UQ.txt files). Nevertheless, TCGAbiolinks only downloads 2 files. I tried every combination of parameters, and even downloading everything, but the result is always the same: I cannot download the remaining 9 samples. Does anyone know what the problem is? Here is the simple code I used:

library(TCGAbiolinks)

# Query FPKM-UQ gene expression files for normal tissue samples in TCGA-THYM
query <- GDCquery(project = "TCGA-THYM",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - FPKM-UQ",
                  sample.type = "Solid Tissue Normal")

# Download the matching files through the GDC API, 10 files per request
GDCdownload(query, method = "api", files.per.chunk = 10)



I would encourage you to post an issue on their GitHub repository. The developers do not log in here too often. Keep in mind that FPKM-UQ data is not suitable for many analyses.

written 5 months ago by Kevin Blighe

Hi Kevin, thanks for the answer. I finally solved it. It is true that all 11 files contain RNA-seq FPKM-UQ data, but although I used a "Solid Tissue Normal" filter, most of those files were actually from primary tumors. Weird. At least the downloads are correct: only 2 of the files are really normal tissue. About your last comment: for which types of analysis is this kind of data not suitable?

written 5 months ago by guillepalou4
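
For anyone who hits the same mismatch, one way to double-check which files are really normal tissue is to read the sample-type code out of the TCGA barcodes. This is only a minimal sketch in base R. It relies on the TCGA barcode convention that characters 14-15 hold the sample-type code ("01" for Primary Tumor, "11" for Solid Tissue Normal). The function name sample_type_from_barcode and the example barcodes are made up for illustration. With TCGAbiolinks the real barcodes would presumably come from the cases column of getResults(query), but that is an assumption worth verifying against your installed version.

```r
# Sketch: classify TCGA barcodes by their sample-type code.
# Characters 14-15 of a TCGA barcode encode the sample type
# ("01" = Primary Tumor, "11" = Solid Tissue Normal).
# Hypothetical helper, not part of TCGAbiolinks.
sample_type_from_barcode <- function(barcodes) {
  codes <- substr(barcodes, 14, 15)
  labels <- c("01" = "Primary Tumor", "11" = "Solid Tissue Normal")
  out <- unname(labels[codes])
  # Any code not listed above is reported as "Other"
  out[is.na(out)] <- "Other"
  out
}

# Made-up barcodes for illustration only (not real TCGA-THYM cases).
example_barcodes <- c(
  "TCGA-XX-0001-01A-11R-0001-07",
  "TCGA-XX-0002-11A-11R-0001-07",
  "TCGA-XX-0003-01A-11R-0001-07"
)

# Tally how many barcodes fall into each sample type
table(sample_type_from_barcode(example_barcodes))
```

If the tally shows fewer Solid Tissue Normal entries than the portal filter suggested, the download is probably complete and the extra files belong to tumor samples, which matches what happened here.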