Question on GDC Legacy RNAseq data
4.4 years ago
kvougas • 0

Hello everyone,

I have the following issue.

I downloaded TCGA RNA-seq Legacy data (level 3, which I think means normalized) using the TCGAbiolinks Bioconductor package, and across different files of the same project I find discrepancies that I would like someone to explain.

Specifically

The TCGAbiolinks query was:

    query <- GDCquery(project = project, data.category = "Gene expression", data.type = "Gene expression quantification", platform = "Illumina HiSeq", legacy = TRUE)

Project: TCGA-BRCA

  1. File/case: unc.edu.e6dbaf07-3551-4c73-a2f2-f1bea4fa8e72.1989506.rsem.genes.normalized_results
     Sample:

         gene_id        normalized_count
         ?|100130426    0
         ?|100133144    13.6068
         ?|100134869    12.0568

  2. File/case: UNCID_421458.TCGA-BH-A0BW-01A-11R-A115-07.110527_UNC10-SN254_0224_AD0CPKABXX.2.trimmed.annotated.gene.quantification.txt
     Sample:

         gene           raw_counts    median_length_normalized    RPKM
         ?|100130426    0             0                           0
         ?|100133144    189           7.7218543046                1.1683197549
         ?|100134869    139           4.3601003764                0.6511684249

In the first case I get only normalized counts, while in the second case I get raw counts, median_length_normalized and RPKM. Say that I want to compare gene expression between files 1 and 2. What should I do, since I don't think it would be wise to compare normalized counts against raw counts?
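In the meantime, one thing I can do is separate the downloaded files by format, so that I at least only compare like with like. Below is a minimal sketch in base R, assuming the file-name suffixes shown above reliably identify the format. The `classify_format` helper and its labels are my own invention for illustration, not part of TCGAbiolinks.

```r
# Hypothetical helper: label each file by its name suffix.
# Assumption: the suffix alone tells the two formats apart.
classify_format <- function(file_names) {
  ifelse(
    grepl("\\.rsem\\.genes\\.normalized_results$", file_names),
    "rsem_normalized",                      # gene_id / normalized_count
    ifelse(
      grepl("\\.gene\\.quantification\\.txt$", file_names),
      "counts_rpkm",                        # raw_counts / median_length_normalized / RPKM
      "other"
    )
  )
}

# The two file names from the examples above
files <- c(
  "unc.edu.e6dbaf07-3551-4c73-a2f2-f1bea4fa8e72.1989506.rsem.genes.normalized_results",
  "UNCID_421458.TCGA-BH-A0BW-01A-11R-A115-07.110527_UNC10-SN254_0224_AD0CPKABXX.2.trimmed.annotated.gene.quantification.txt"
)

formats <- classify_format(files)

# Group file names by format; each group can then be analysed separately
by_format <- split(files, formats)
print(names(by_format))
```

This only sorts the files into groups. It does not make the two formats comparable with each other, which is the part I am still unsure about.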

Sorry if this question is really basic but I am just starting to find my way around...

Thanks in advance

RNA-Seq TCGA GDC Normalization TCGAbiolinks • 1.5k views