Same query but different results (TCGAbiolinks vs TCGA Portal)
18 months ago
tyasird ▴ 10

Hi all,

I'm new to this field. I want to analyze HTSeq counts data to find differentially expressed genes.

I installed the TCGAbiolinks package and created a query, but its results don't match what the TCGA data portal returns.

Here is my R code and the result. I searched for kidney cancer and normal tissue.

As you can see, my R code returns only 128 files, but the TCGA data portal shows 215 files.

Why are the results different?

query <- GDCquery(
    data.category = "Transcriptome Profiling",
    data.type = "Gene Expression Quantification",
    experimental.strategy = "RNA-Seq",
    workflow.type = "HTSeq - Counts",
    sample.type = "Solid Tissue Normal",
    legacy = FALSE)

Here is my TCGA data portal query:

cases.case_id in ["set_id:AW3tFMDMgWoF7ReWISKV"] and cases.samples.sample_type in ["Solid Tissue Normal"] and files.analysis.workflow_type in ["HTSeq - Counts"] and files.data_category in ["Transcriptome Profiling"] and files.data_type in ["Gene Expression Quantification"] and files.experimental_strategy in ["RNA-Seq"]

18 months ago
dsull ★ 1.8k

I just ran your queries and am getting the same thing. It seems that TCGAbiolinks isn't retrieving the CPTAC-3 data properly: none of the rows from the TCGAbiolinks query are from CPTAC-3, and I get the error message "Error in expandBarcodeInfo(barcodes) : object 'ret' not found" when I try to query for CPTAC-3 by itself in TCGAbiolinks.
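One way to check which projects are missing is to count the files per project in both result sets and compare. Here is a minimal base-R sketch using toy data: the two data frames, the `file_id` and `project` column names, and the `compare_projects` helper are all hypothetical stand-ins, and the rows are made up for illustration. With real data, the first table would presumably come from `getResults(query)` and the second from a file table exported from the portal, so the column names may need adjusting.

```r
# Toy stand-ins for the two result tables. The rows and column names are
# hypothetical and only illustrate the comparison; they are not real GDC output.
biolinks_files <- data.frame(
  file_id = c("f1", "f2", "f3"),
  project = c("TCGA-KIRC", "TCGA-KIRP", "TCGA-KICH"),
  stringsAsFactors = FALSE
)
portal_files <- data.frame(
  file_id = c("f1", "f2", "f3", "f4", "f5"),
  project = c("TCGA-KIRC", "TCGA-KIRP", "TCGA-KICH", "CPTAC-3", "CPTAC-3"),
  stringsAsFactors = FALSE
)

# Count files per project in each table and put the counts side by side.
# Projects absent from one table get a count of 0 there.
compare_projects <- function(a, b) {
  projects <- sort(union(a$project, b$project))
  data.frame(
    project = projects,
    n_a = as.integer(table(factor(a$project, levels = projects))),
    n_b = as.integer(table(factor(b$project, levels = projects))),
    stringsAsFactors = FALSE
  )
}

counts <- compare_projects(biolinks_files, portal_files)

# Projects that appear in the second table but not at all in the first.
missing_projects <- counts$project[counts$n_a == 0 & counts$n_b > 0]

print(counts)
print(missing_projects)
```

Any project listed in `missing_projects` is one that the first source did not return at all, which is the pattern I see for CPTAC-3.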

For now, if you want to use non-TCGA projects, it's best to work directly from the GDC portal. TCGAbiolinks doesn't seem to support CPTAC-3 right now.


