Differential gene expression analysis
6.2 years ago
bioinfo456 ▴ 150

I have an RNA-seq dataset of log2(x+1)-transformed RSEM normalized counts. Can somebody explain how I can obtain raw read counts from this, so that I can perform DEG analysis using the DESeq2 R package?
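
For reference, the log2(x+1) transform itself is straightforward to invert. Doing so should recover the RSEM normalized counts (up to floating-point error), but not the raw read counts, because undoing the log does not undo the normalization that was applied before it:

```latex
y = \log_2(x + 1) \quad\Longleftrightarrow\quad x = 2^{y} - 1
```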

rna-seq R • 3.2k views

Please see Devon's and Michael's input here: RSEM Downstream Analysis

Also see the input from Michael and Simon (DESeq2 developers) on Bioconductor, here: https://support.bioconductor.org/p/51577/


Thanks for the reply, Kevin. Devon suggests limma or edgeR. The DESeq2 developers recommend using rounded estimated gene-level counts from RSEM as input to DESeq2. By rounded, do they mean rounded to the closest integer value?


Yes. The general idea I take from those comments is that, if you really wish to use DESeq2, you should:

  1. Summarise your RSEM estimated counts for transcript isoforms into gene-level counts
  2. Round the gene-level counts to integers, i.e., no decimal places
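
The two steps above can be sketched in a few lines. This is a minimal illustration, not RSEM's or DESeq2's own code: it assumes the transcript-level expected counts are already in a plain dict and that a transcript-to-gene mapping is available (both inputs, and the IDs in the example, are hypothetical). Note that Python's `round()` rounds exact halves to the even neighbour (2.5 becomes 2), which makes no practical difference for count data.

```python
from collections import defaultdict


def gene_level_counts(transcript_counts, tx2gene):
    """Sum transcript-level expected counts per gene, then round to integers.

    transcript_counts: dict of transcript ID -> RSEM expected count (float)
    tx2gene: dict of transcript ID -> gene ID (hypothetical annotation table)
    """
    gene_totals = defaultdict(float)
    for tx_id, count in transcript_counts.items():
        # Step 1: add each isoform's estimate to its parent gene's total.
        gene_totals[tx2gene[tx_id]] += count
    # Step 2: round to the nearest integer so the values can be used as counts.
    return {gene: int(round(total)) for gene, total in gene_totals.items()}


# Hypothetical example: two isoforms of GENE_A, one isoform of GENE_B.
tx_counts = {"TX1": 10.4, "TX2": 3.3, "TX3": 7.6}
tx2gene = {"TX1": "GENE_A", "TX2": "GENE_A", "TX3": "GENE_B"}
print(gene_level_counts(tx_counts, tx2gene))  # {'GENE_A': 14, 'GENE_B': 8}
```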

Obviously the ideal situation is to get the raw counts (or produce them yourself). May I ask which data you are working on? TCGA?


Yes, TCGA gene expression RNAseq - IlluminaHiSeq data.

The dataset description is as follows: The gene expression profile was measured experimentally using the Illumina HiSeq 2000 RNA Sequencing platform by the University of North Carolina TCGA genome characterization center. Level 3 data was downloaded from the TCGA data coordination center. This dataset shows the gene-level transcription estimates, as in log2(x+1) transformed RSEM normalized count. Genes are mapped onto the human genome coordinates using UCSC Xena HUGO probeMap.

I don't have the resources to produce raw counts. Do you reckon I can round off these normalized counts and use DESeq2 on them? Thanks a ton for your insight.
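
If you do take the rounding route, converting the whole table is short. The sketch below assumes the Xena matrix has already been loaded into a plain dict of gene -> list of per-sample log2(x+1) values (a hypothetical layout chosen to keep the example dependency-free). It back-transforms each value with 2^y - 1 and rounds to the nearest integer. The output is still RSEM-normalized, not raw read counts.

```python
def log2p1_to_rounded_counts(matrix):
    """Undo a log2(x+1) transform and round each value to a non-negative integer.

    matrix: dict of gene -> list of log2(x+1) values, one per sample
    (hypothetical in-memory layout for a gene-by-sample table).
    The result remains a normalized count; rounding does not restore raw counts.
    """
    out = {}
    for gene, values in matrix.items():
        counts = []
        for y in values:
            x = 2.0 ** y - 1.0  # inverse of y = log2(x + 1)
            # Clamp any negative result (e.g. from float error) to 0.
            counts.append(max(0, int(round(x))))
        out[gene] = counts
    return out


# Hypothetical example: three samples for one gene.
print(log2p1_to_rounded_counts({"GENE_A": [0.0, 3.0, 4.0]}))  # {'GENE_A': [0, 7, 15]}
```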


You could try the recommendations of Michael Love, Simon Anders, and Devon Ryan, as they are experts in this area. From the discussion, though, I was not convinced that this is an ideal type of data to use with DESeq2.

If it is TCGA data that you want to analyse, then you should be able to get the raw HTSeq counts via the GDC Legacy Archive, but it depends on the cancer of interest. I recently re-analysed all 500+ raw HTSeq count files for endometrial cancer, for example, using DESeq2.
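
If raw HTSeq count files are available, they need to be combined into one gene-by-sample matrix before DESeq2. A minimal sketch, assuming each file follows the standard htseq-count layout (tab-separated feature ID and integer count, ending with summary rows such as `__no_feature` and `__ambiguous`) and that each file's lines have already been read in (for gzipped files, e.g. via `gzip.open(path, "rt")`). The sample names and gene IDs in the example are hypothetical.

```python
def parse_htseq_counts(lines):
    """Parse htseq-count output lines ("feature_id<TAB>count") into a dict.

    Rows whose ID starts with "__" (e.g. __no_feature, __ambiguous) are
    htseq-count summary counters, not genes, so they are skipped.
    """
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        feature_id, value = line.split("\t")
        if feature_id.startswith("__"):
            continue
        counts[feature_id] = int(value)
    return counts


def merge_samples(per_sample_lines):
    """Build a gene-by-sample count table from {sample_id: htseq lines}.

    Requires at least one sample. Returns (sample order, {gene: [counts]}).
    """
    parsed = {s: parse_htseq_counts(lines) for s, lines in per_sample_lines.items()}
    samples = list(parsed)
    # Keep only genes present in every sample so the matrix has no gaps.
    genes = sorted(set.intersection(*(set(c) for c in parsed.values())))
    return samples, {g: [parsed[s][g] for s in samples] for g in genes}


# Hypothetical example with two tiny samples.
s1 = ["ENSG1\t10\n", "ENSG2\t0\n", "__no_feature\t5\n"]
s2 = ["ENSG1\t3\n", "ENSG2\t7\n", "__ambiguous\t1\n"]
print(merge_samples({"S1": s1, "S2": s2}))
```

The resulting integer table could then be written out and passed to `DESeqDataSetFromMatrix()` in R as `countData`, with the sample order matching the rows of `colData`.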


I'm gonna go ahead with Michael Love's recommendation. Thanks a ton, Kevin :).
