Same query but different results (TCGAbiolinks vs TCGA Portal)
4.5 years ago
tyasird ▴ 10

Hi all,

I want to analyze HTSeq-Counts data to find differentially expressed genes.

I have installed the TCGAbiolinks package and created a query, but its results don't match the TCGA data portal.

Here are my R code and results. I searched the kidney cancer projects for Solid Tissue Normal samples.

As you can see, my R query returns only 128 files, while the TCGA data portal shows 215 files.

Why are the results different?

library(TCGAbiolinks)

# Normal-tissue HTSeq counts across the kidney-related projects
query <- GDCquery(
  project = c("TCGA-KIRC", "TCGA-KIRP", "TARGET-WT", "CPTAC-3", "TCGA-KICH"),
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  experimental.strategy = "RNA-Seq",
  workflow.type = "HTSeq - Counts",
  sample.type = "Solid Tissue Normal",
  legacy = FALSE)

Here is my TCGA data portal query:

cases.case_id in ["set_id:AW3tFMDMgWoF7ReWISKV"] and cases.samples.sample_type in ["Solid Tissue Normal"] and files.analysis.workflow_type in ["HTSeq - Counts"] and files.data_category in ["Transcriptome Profiling"] and files.data_type in ["Gene Expression Quantification"] and files.experimental_strategy in ["RNA-Seq"]

4.5 years ago
dsull ★ 5.8k

I just ran your queries and am getting the same thing. It seems that TCGAbiolinks isn't retrieving the CPTAC-3 data properly. None of the rows returned by the TCGAbiolinks query are from CPTAC-3, and when I query CPTAC-3 by itself in TCGAbiolinks I get the error "Error in expandBarcodeInfo(barcodes) : object 'ret' not found".
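
One way to reproduce this diagnosis is to run the question's query once per project. A project that fails, such as CPTAC-3 here, then shows up explicitly instead of silently contributing no rows to the combined query. This is a sketch that assumes the same TCGAbiolinks version and GDCquery() arguments as the question. count_files_per_project() and htseq_normal_query() are hypothetical helper names, not TCGAbiolinks functions. getResults() is the TCGAbiolinks accessor for the table of matched files.

```r
# Hypothetical helper (not part of TCGAbiolinks): runs run_query() once per
# project and records NA, plus the error message, for any project that fails.
count_files_per_project <- function(projects, run_query) {
  vapply(projects, function(p) {
    tryCatch(
      as.integer(run_query(p)),
      error = function(e) {
        message(p, ": ", conditionMessage(e))
        NA_integer_
      }
    )
  }, integer(1))
}

# Hypothetical query function reusing the filters from the question for a
# single project; needs the TCGAbiolinks package and network access to the GDC.
htseq_normal_query <- function(project) {
  query <- TCGAbiolinks::GDCquery(
    project = project,
    data.category = "Transcriptome Profiling",
    data.type = "Gene Expression Quantification",
    experimental.strategy = "RNA-Seq",
    workflow.type = "HTSeq - Counts",
    sample.type = "Solid Tissue Normal",
    legacy = FALSE
  )
  # getResults() returns the table of matched files, one row per file
  nrow(TCGAbiolinks::getResults(query))
}

projects <- c("TCGA-KIRC", "TCGA-KIRP", "TARGET-WT", "CPTAC-3", "TCGA-KICH")
counts <- count_files_per_project(projects, htseq_normal_query)
print(counts)
```

A project whose query errors (for example, with the expandBarcodeInfo error quoted above) is printed as NA. This makes it easy to compare the per-project counts against the portal's per-project totals.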

For now, if you want to use non-TCGA projects, it's best to work directly from the GDC portal. TCGAbiolinks doesn't seem to support CPTAC-3 right now.
