Question: Differential expression gene analysis
Asked 17 months ago by Uday Rangaswamy (Indian Institute of Technology, Madras, India):

I have an RNA-seq data set of log2(x+1)-transformed RSEM normalized counts. Can somebody explain how I can obtain raw read counts from it, so that I can perform DEG analysis using the DESeq2 R package?


Please see Devon's and Michael's input here: RSEM Downstream Analysis

See also the input from Michael and Simon (DESeq2 developers) on Bioconductor, here: https://support.bioconductor.org/p/51577/

modified 17 months ago • written 17 months ago by Kevin Blighe

Thanks for the reply, Kevin. Devon suggests limma or edgeR. The DESeq2 developers recommend using rounded estimated gene-level counts from RSEM as input to DESeq2. By rounded, do they mean the closest integer value?

written 17 months ago by Uday Rangaswamy

Yes, the general idea that I get from the comments is that, if you really wish to use DESeq2, then you should:

  1. Summarise your RSEM estimated counts for transcript isoforms into gene-level counts
  2. Round the gene-level counts to the nearest integer, i.e., no decimal places
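To make the two steps above concrete, here is a minimal sketch in plain Python, for illustration only (DESeq2 itself is an R package). The function name `gene_level_counts`, the transcript-to-gene mapping `tx2gene`, and all count values are hypothetical and not taken from any real data set.

```python
from collections import defaultdict


def gene_level_counts(isoform_counts, tx2gene):
    """Sum isoform-level estimated counts per gene, then round to integers."""
    totals = defaultdict(float)
    for tx, count in isoform_counts.items():
        totals[tx2gene[tx]] += count
    # Round only after summing, so rounding error does not build up per isoform.
    # Note: Python's round() sends exact .5 ties to the nearest even integer.
    return {gene: int(round(total)) for gene, total in totals.items()}


# Made-up values, purely for illustration.
isoform_counts = {"tx1": 10.4, "tx2": 3.3, "tx3": 7.8}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_counts = gene_level_counts(isoform_counts, tx2gene)
print(gene_counts)
```

The resulting integer gene-by-sample values are the kind of input a count-based tool expects; whether rounded estimated counts are appropriate for your analysis is the question discussed in the linked threads.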

Obviously, the ideal situation is to get the raw counts (or produce them yourself). May I ask which data you are working on? TCGA?

written 17 months ago by Kevin Blighe

Yes, TCGA gene expression RNAseq - IlluminaHiSeq data.

The description of the data set is as follows: The gene expression profile was measured experimentally using the Illumina HiSeq 2000 RNA Sequencing platform by the University of North Carolina TCGA genome characterization center. Level 3 data was downloaded from the TCGA data coordination center. This dataset shows the gene-level transcription estimates, as log2(x+1)-transformed RSEM normalized counts. Genes are mapped onto the human genome coordinates using the UCSC Xena HUGO probeMap.

I don't have the resources to produce raw counts. Do you reckon I can round off these normalized counts and use DESeq2 on them? Thanks a ton for your insight.

modified 17 months ago • written 17 months ago by Uday Rangaswamy

You could try the recommendations of Michael Love, Simon Anders, and Devon Ryan, as they are experts in this area. From the discussion, though, I was not convinced that this is an ideal type of data to use with DESeq2.
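For reference, the log2(x+1) transform itself is easy to undo: if y = log2(x + 1), then x = 2^y - 1. The sketch below (plain Python, with made-up values and a hypothetical helper name `undo_log2_plus_one`) shows the arithmetic. Note that this only recovers the RSEM normalized counts; it does not recover raw read counts, because the normalization step is not reversed.

```python
import math


def undo_log2_plus_one(y):
    """Invert y = log2(x + 1), returning x = 2**y - 1."""
    return 2.0 ** y - 1.0


# Made-up log2(x+1) values, purely for illustration.
log_values = [0.0, 3.0, 10.0]

# Back on the normalized-count scale (still normalized, not raw).
normalized = [undo_log2_plus_one(y) for y in log_values]

# Rounded to integers, as a count-based tool would require.
rounded = [int(round(x)) for x in normalized]

# Sanity check: re-applying the transform gives back the input values.
round_trip = [math.log2(x + 1.0) for x in normalized]

print(normalized, rounded, round_trip)
```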

If it is TCGA data that you want to analyse, then you should be able to get the raw HTSeq counts via the GDC Legacy Archive, but it depends on the cancer of interest. I recently re-analysed all 500+ raw HTSeq count files for endometrial cancer, for example, using DESeq2.

written 17 months ago by Kevin Blighe

I'm gonna go ahead with Michael Love's recommendation. Thanks a ton, Kevin :).

written 17 months ago by Uday Rangaswamy