Differential expression analysis for log2(x+1) transformed RSEM normalized count (TCGA)
4.5 years ago
Vasu ▴ 640

Hello everyone,

I have retrieved RNA-seq data from TCGA. It is Level 3 data (file names: *.rsem.genes.normalized_results) and contains gene-level expression estimates, given as log2(x+1)-transformed RSEM normalized counts.

Data looks like this:

sample      TCGA-FV-A495-01  TCGA-G3-A3CH-11  TCGA-CC-A3MB-01  TCGA-BC-A3KF-01  TCGA-DD-A4NV-01
ARHGEF10L   11.1818          11.0186          11.243           11.1612          12.0167
HIF3A       5.2482           5.3847           4.0013           2.9374           4.7857
RNF17       4.1956           0                0                0                0
RNF10       11.5047          11.669           12.0791          12.5931          11.4616
RNF11       9.5995           11.398           9.8248           9.9459           10.8368
RNF13       9.6257           10.8249          10.5608          10.5179          10.1428
GTF2IP1     11.8053          11.5487          12.1228          12.5044          12.947
REM1        5.6835           3.5408           3.5582           1.7444           3.8613
MTVR2       0                1.4714           0                0                0
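
Any differential expression analysis needs sample groups, and the column names above already carry that information: the fourth field of a TCGA barcode starts with a two-digit sample-type code, where 01 to 09 denote tumor samples (01 is primary solid tumor) and 10 to 19 denote normal samples (11 is solid tissue normal). A minimal Python sketch of splitting the columns by that code follows; the helper name `sample_group` is hypothetical and not part of any TCGA tool.

```python
def sample_group(barcode: str) -> str:
    """Classify a TCGA barcode as 'tumor', 'normal', or 'control'."""
    # The fourth dash-separated field begins with the two-digit sample-type
    # code, e.g. "01" in TCGA-FV-A495-01. Longer barcodes may append a vial
    # letter (e.g. "01A"), so only the first two characters are used.
    code = int(barcode.split("-")[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    raise ValueError(f"Unexpected sample-type code in {barcode!r}")


# Column names copied from the table in the question.
samples = [
    "TCGA-FV-A495-01",
    "TCGA-G3-A3CH-11",
    "TCGA-CC-A3MB-01",
    "TCGA-BC-A3KF-01",
    "TCGA-DD-A4NV-01",
]

groups = {s: sample_group(s) for s in samples}
for s, g in groups.items():
    print(s, g)
```

In the five columns shown, only TCGA-G3-A3CH-11 would be labelled as normal; the other four are tumor samples.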


Which package should I use for differential expression analysis of this type of RSEM data? Do I need to transform the data before the analysis? I am not familiar with these RSEM counts. Can anyone help me with this?

r RNA-Seq differential analysis tcga • 7.1k views

If not, I would advise you to read this post: https://support.bioconductor.org/p/91054/, answered by the developers of DESeq2/edgeR/limma. It is quite informative about the kind of data you have and what you can do with it.


Thank you. I see that they are using library(curatedCRCData) to get the data. CRC is colorectal, but I want data for liver. I tried curatedliverData, but it says the package is not available.


As mentioned in the post, it seems you can run limma/voom or the other tools on your data, because the "normalization" used on TCGA data should not interfere too much with the analysis. It is not ideal, though, since these pipelines were not designed for data other than raw counts. There are several other threads about this on the Bioconductor support site, so if I were you I would read them first. But try to find raw counts if you can.
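
To make the scale issue concrete: the values in the question are y = log2(x + 1), so the RSEM normalized value is recovered as x = 2^y - 1. A minimal Python sketch, using two values from the table in the question; the function names are hypothetical. Note that this only gets you back to RSEM normalized values, which are still not raw counts.

```python
import math


def to_log2p1(x: float) -> float:
    """Forward transform: normalized count -> log2(x + 1)."""
    return math.log2(x + 1.0)


def from_log2p1(y: float) -> float:
    """Inverse transform: log2(x + 1) value -> normalized count."""
    return 2.0 ** y - 1.0


# Values taken from the table in the question: ARHGEF10L in the first
# sample, and a zero entry (e.g. RNF17 in the second sample).
log_values = {"ARHGEF10L": 11.1818, "RNF17": 0.0}

for gene, y in log_values.items():
    x = from_log2p1(y)
    print(f"{gene}: log2(x+1) = {y}, normalized count = {x:.1f}")
```

A value of 0 in the table therefore corresponds to a normalized count of 0, and the back-transformed values are generally non-integer.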


This is where I got the data: https://xenabrowser.net/datapages/?dataset=TCGA.LIHC.sampleMap/HiSeqV2&host=https://tcga.xenahubs.net. When I click on the raw data download link, I get a "page isn't working" error.

4.5 years ago

Hello. As far as I know, you can use limma or edgeR.

Can you explain what your data represent?