Question: TCGA and ICGC raw count expression data do not match
zamalloa20 (United States) wrote, 3.5 years ago:

Hi,

I'm trying to obtain raw counts for RNA-seq expression data for breast cancer. I extracted the data from the TCGA portal for RNAseq V1 rather than V2, because the latter does not contain "true" raw counts (its values are non-integers), as pointed out elsewhere:

http://seqanswers.com/forums/showthread.php?t=42911

I was also pointed to the ICGC data portal in the hope of obtaining an already parsed table, which I downloaded for RNA-seq raw counts as well (exp_seq.BRCA-US.tsv). However, when I checked whether both sites (TCGA/ICGC) agree on the raw counts for the same individuals, I found that they do not. For example, in TCGA I find:

Gene    TCGA-AN-A0FL-01A    TCGA-AN-A0FT-01A
ACAP3   4832                2580
ACAT1   8202                1916

while in ICGC, the raw count values for the same samples were:

Gene    TCGA-AN-A0FL-01A    TCGA-AN-A0FT-01A
ACAP3   0                   1148
ACAT1   0                   896
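
For anyone who wants to repeat this check, here is a minimal sketch of the side-by-side comparison using only the four values per portal quoted above. It assumes both tables have already been reduced to the same whitespace-separated gene-by-sample layout; the function names parse_counts and compare_counts are hypothetical helpers, not part of any TCGA or ICGC tooling. A ratio that stayed constant across genes and samples would hint at a simple rescaling; the sketch only reports the ratios and draws no conclusion.

```python
# Values copied from the two excerpts in this post (assumed wide layout).
TCGA_TABLE = """\
Gene TCGA-AN-A0FL-01A TCGA-AN-A0FT-01A
ACAP3 4832 2580
ACAT1 8202 1916
"""

ICGC_TABLE = """\
Gene TCGA-AN-A0FL-01A TCGA-AN-A0FT-01A
ACAP3 0 1148
ACAT1 0 896
"""


def parse_counts(text):
    """Parse a whitespace-separated gene x sample table into {(gene, sample): count}."""
    rows = [line.split() for line in text.strip().splitlines()]
    samples = rows[0][1:]  # header row: first column is the gene label
    counts = {}
    for row in rows[1:]:
        gene = row[0]
        for sample, value in zip(samples, row[1:]):
            counts[(gene, sample)] = int(value)
    return counts


def compare_counts(a, b):
    """Return (gene, sample, a_count, b_count, ratio) for keys present in both tables.

    The ratio a/b is None when the second count is zero.
    """
    result = []
    for key in sorted(a.keys() & b.keys()):
        x, y = a[key], b[key]
        ratio = x / y if y else None
        result.append((key[0], key[1], x, y, ratio))
    return result


tcga = parse_counts(TCGA_TABLE)
icgc = parse_counts(ICGC_TABLE)
comparison = compare_counts(tcga, icgc)

for gene, sample, x, y, ratio in comparison:
    shown = "n/a" if ratio is None else f"{ratio:.2f}"
    print(f"{gene}\t{sample}\tTCGA={x}\tICGC={y}\tTCGA/ICGC={shown}")
```

On real downloads the parsing step would need to be adapted to each portal's actual file layout, which is not shown here.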
 

Both sites (TCGA: https://wiki.nci.nih.gov/display/TCGA/RNASeq; ICGC: https://docs.icgc.org/sequencing-based-gene-expression-expseq-primary-file-p) state that they report raw counts for RNA-seq expression data. Am I misinterpreting something here? Is there an additional normalization step that is not documented?

I would appreciate any help, thanks!

rna-seq gene expression icgc tcga • 1.7k views