TCGAbiolinks does not download all the data requested
2.7 years ago
guillepalou4 ▴ 20

Hi everyone,

I am using TCGAbiolinks to download all RNA-seq data for a specific tumor type (TCGA-THYM) as an example. The queries run and the data download correctly, but I wanted to check that I had downloaded all the files (just in case). Following this link, you can see that there are 11 Solid Tissue Normal TCGA-THYM samples. I checked them file by file, and all 11 contain RNA-seq data (FPKM-UQ.txt files). Nevertheless, TCGAbiolinks only downloads 2 files. I tried every combination of parameters, and even downloading everything, but the result is always the same: I cannot download the remaining 9 samples. Does anyone know what the problem is? Here is the simple code I used:

library(TCGAbiolinks)

# Query FPKM-UQ gene expression files for normal-tissue THYM samples
query <- GDCquery(
  project = "TCGA-THYM",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  workflow.type = "HTSeq - FPKM-UQ",
  sample.type = "Solid Tissue Normal"
)
GDCdownload(query, method = "api", files.per.chunk = 10)


tcgabiolinks RNA-Seq tcga • 756 views
I would encourage you to post an issue on their GitHub repository. The developers do not log in here too often. Keep in mind that FPKM-UQ data is not suitable for many analyses.

Hi Kevin, thanks for the answer. I finally solved it. It is true that the 11 files contain RNA-seq FPKM-UQ data, but although I had applied a "Solid Tissue Normal" filter on the website, 9 of those files were actually from primary tumors. Weird. At least the 2 downloads are correct. About your last comment: for which types of analysis is this kind of data unsuitable?
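For anyone who hits the same mismatch, one way to double-check what a file list really contains is to read the sample type straight from the TCGA barcodes: characters 14-15 of a full barcode encode the sample type ("01" is Primary Solid Tumor, "11" is Solid Tissue Normal). Below is a minimal base-R sketch of that check. The function name `sample_type_from_barcode` and the example barcodes are made up for illustration and are not part of TCGAbiolinks.

```r
# Sketch: classify TCGA barcodes by the sample-type code at characters 14-15.
# Only the two codes relevant to this thread are labelled; anything else
# is reported as "Other".
sample_type_from_barcode <- function(barcodes) {
  codes <- substr(barcodes, 14, 15)
  labels <- c("01" = "Primary Solid Tumor", "11" = "Solid Tissue Normal")
  out <- unname(labels[codes])   # unmatched codes become NA
  out[is.na(out)] <- "Other"
  out
}

# Hypothetical barcodes, for illustration only
example_barcodes <- c(
  "TCGA-XX-0001-01A-11R-A000-07",
  "TCGA-XX-0002-11A-11R-A000-07"
)

# Count how many barcodes fall into each sample type
print(table(sample_type_from_barcode(example_barcodes)))
```

With a real query, the barcodes could probably be taken from the results table, e.g. `getResults(query)$cases`, assuming that column holds full barcodes in your TCGAbiolinks version. Comparing the resulting counts with the number of files the website lists should show whether a filter returned the sample type you expected.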

