Question: Question on GDC Legacy RNAseq data
3.3 years ago by
kvougas0 wrote:

Hello everyone,

I have the following issue.

I downloaded TCGA RNAseq Legacy data (level 3, which I think means normalized) using the TCGAbiolinks Bioconductor package. Within different files of the same project I find discrepancies, and I would like someone to explain them.


The TCGAbiolinks query was:

    query <- GDCquery(project = project,
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq",
                      legacy = TRUE)

Project: TCGA-BRCA

  1. File/case: Sample:

       gene_id       normalized_count
       ?|100130426   0
       ?|100133144   13.6068
       ?|100134869   12.0568

  2. File/case: UNCID_421458.TCGA-BH-A0BW-01A-11R-A115-07.110527_UNC10-SN254_0224_AD0CPKABXX.2.trimmed.annotated.gene.quantification.txt Sample:

       gene          raw_counts   median_length_normalized   RPKM
       ?|100130426   0            0                          0
       ?|100133144   189          7.7218543046               1.1683197549
       ?|100134869   139          4.3601003764               0.6511684249

In the first case I get normalized counts only, while in the second case I get raw counts, median_length_normalized and RPKM. Say that I want to compare gene expression between 1 and 2. What should I do, since I think it wouldn't be wise to compare normalized counts against raw counts?

Sorry if this question is really basic but I am just starting to find my way around...

Thanks in advance


