I am using TCGAbiolinks and trying to download all RNA-seq data for a specific tumor type as an example (TCGA-THYM). I am able to run the queries and download the data correctly. However, I wanted to check that I had downloaded all the files (just in case). Following this link, you can see there are 11 Solid Tissue Normal TCGA-THYM samples. I checked them file by file, and all 11 contain RNA-seq data (FPKM-UQ.txt files). Nevertheless, TCGAbiolinks only downloads 2 files. I tried every combination of parameters, and even downloading everything, but the result is always the same: I am unable to download the remaining 9 samples. Does anyone know what the problem is? Here's the simple code I used:
library(TCGAbiolinks)

# Query the normal-tissue FPKM-UQ expression files for TCGA-THYM
query <- GDCquery(
  project = "TCGA-THYM",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  workflow.type = "HTSeq - FPKM-UQ",
  sample.type = "Solid Tissue Normal"
)

# Download the files matched by the query
GDCdownload(query, method = "api", files.per.chunk = 10)
I would encourage you to post an issue on their GitHub repository. The developers do not log in here too often. Keep in mind that FPKM-UQ data is not suitable for many analyses.
Hi Kevin, thanks for the answer. I finally solved it. It is true that the 11 files contain RNA-seq FPKM-UQ data, but although I used a "Solid Tissue Normal" filter, 9 of those files were from primary tumors. Weird. But at least the downloads are correct: only 2 files are really from normal tissue. About your last comment: for which types of analyses is this type of data not suitable?
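For anyone who hits the same mismatch, one way to double-check which files are really tumor or normal is to read the sample-type code directly from the TCGA barcode. Characters 14-15 of a barcode hold that code ("01" is Primary Tumor, "11" is Solid Tissue Normal). The following is only a minimal sketch in base R: the helper name `sample_type_from_barcode` and the example barcodes are made up for illustration, and the commented-out line assumes the table returned by `getResults(query)` has a `cases` column holding barcodes.

```r
# Hypothetical helper (not part of TCGAbiolinks): decode the sample-type
# code stored in characters 14-15 of a TCGA barcode.
sample_type_from_barcode <- function(barcodes) {
  codes <- substr(barcodes, 14, 15)
  # Only the two codes relevant to this thread are labeled here
  labels <- c("01" = "Primary Tumor", "11" = "Solid Tissue Normal")
  out <- unname(labels[codes])
  out[is.na(out)] <- "Other"  # any other or missing code
  out
}

# Made-up barcodes, for illustration only
example_barcodes <- c(
  "TCGA-XX-0001-01A-11R-0001-07",
  "TCGA-XX-0002-11A-11R-0001-07",
  "TCGA-XX-0003-01A-11R-0001-07"
)

# Count how many files fall into each sample type
print(table(sample_type_from_barcode(example_barcodes)))

# With a real query (assumption: the results have a `cases` column):
# table(sample_type_from_barcode(getResults(query)$cases))
```

If the count of "Solid Tissue Normal" barcodes matches the number of files TCGAbiolinks downloaded, the download is complete and the extra files seen elsewhere belong to tumor samples.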