Hi,
I'm trying to obtain raw counts for RNAseq expression data for breast cancer. I extracted the breast cancer data from the TCGA portal for RNAseq V1 rather than V2, because the latter does not possess "true" raw counts (its values are non-integers), as pointed out elsewhere:
http://seqanswers.com/forums/showthread.php?t=42911
I was also pointed to the ICGC data portal in the hope of obtaining an already parsed table, and I downloaded the RNAseq raw counts file from there as well (exp_seq.BRCA-US.tsv). However, when I double-checked whether the two sites (TCGA/ICGC) agree in terms of raw counts for the same individuals, I found that they do not. For example, in TCGA I find:
Gene | TCGA-AN-A0FL-01A | TCGA-AN-A0FT-01A |
ACAP3 | 4832 | 2580 |
ACAT1 | 8202 | 1916 |
while in ICGC, the raw count values for the same samples were:
Gene | TCGA-AN-A0FL-01A | TCGA-AN-A0FT-01A |
ACAP3 | 0 | 1148 |
ACAT1 | 0 | 896 |
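A comparison like this could be sketched roughly as follows, using only the Python standard library. This is a minimal sketch under assumptions: the TCGA data is taken to be a wide table (genes as rows, samples as columns), the ICGC file is taken to be a long table, and the column names `gene`, `submitted_sample_id`, `gene_id` and `raw_read_count` are guesses that should be checked against the actual file headers. The inline rows only repeat the four values listed above.

```python
import csv
import io

# Wide layout (assumed for the TCGA table): one row per gene, one column per sample.
TCGA_WIDE = """gene\tTCGA-AN-A0FL-01A\tTCGA-AN-A0FT-01A
ACAP3\t4832\t2580
ACAT1\t8202\t1916
"""

# Long layout (assumed for exp_seq.BRCA-US.tsv): one row per sample/gene pair.
# Column names here are assumptions, not verified against the real file.
ICGC_LONG = """submitted_sample_id\tgene_id\traw_read_count
TCGA-AN-A0FL-01A\tACAP3\t0
TCGA-AN-A0FL-01A\tACAT1\t0
TCGA-AN-A0FT-01A\tACAP3\t1148
TCGA-AN-A0FT-01A\tACAT1\t896
"""


def read_wide(text):
    """Return {(gene, sample): count} from a wide tab-separated table."""
    counts = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        gene = row.pop("gene")
        for sample, value in row.items():
            counts[(gene, sample)] = int(value)
    return counts


def read_long(text):
    """Return {(gene, sample): count} from a long tab-separated table."""
    counts = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        key = (row["gene_id"], row["submitted_sample_id"])
        counts[key] = int(row["raw_read_count"])
    return counts


def mismatches(a, b):
    """List (gene, sample, count_a, count_b) for shared keys whose counts differ."""
    shared = sorted(a.keys() & b.keys())
    return [(g, s, a[(g, s)], b[(g, s)]) for g, s in shared if a[(g, s)] != b[(g, s)]]


diffs = mismatches(read_wide(TCGA_WIDE), read_long(ICGC_LONG))
for gene, sample, tcga_count, icgc_count in diffs:
    print(gene, sample, tcga_count, icgc_count)
```

For the real files, the inline strings would be replaced by the file contents, and the sample identifiers may need to be trimmed to a common barcode length before they can be matched.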
Both sites (TCGA https://wiki.nci.nih.gov/display/TCGA/RNASeq / ICGC https://docs.icgc.org/sequencing-based-gene-expression-expseq-primary-file-p) state that they report raw counts for RNAseq expression data. Am I misinterpreting something here, or is there an extra normalization step that is not shown?
I would appreciate any help, thanks!