Question: Differential gene expression, starting from RSEM expected count values.
gravatar for english.server
7 months ago by
english.server210 wrote:


I could/should have asked this question at bioC forum, but the answers there are usually (just) over my head.

I wondered how to use a table of "Expected counts" from rsem to obtain DEGs using DeSeq2/EdgeR? The expected counts are from UCSC Xena- processed GTEX data. There seems to be more than one (proposed) workflows present for this kind of study. Before delving deeper into calculations, I wanted to know which approach is more often used:

  1. Rounding the expected count: Question about how to transform RSEM expected_count of TCGA TARGET GTEX to integers?
  2. As the tximport vignette at bioC explains . However, in the vignette, the starting txi object is built differently from my data (I still have not checked the structure of the resulting R object, so that I might create it manually; waiting for the approval of this method over the simpler method 1 above ) .

    txi <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)
    dds <- DESeqDataSetFromTximport(txi, sampleTable, ~condition)
rsem deseq2 expected-count • 497 views
ADD COMMENTlink modified 7 months ago by i.sudbery9.1k • written 7 months ago by english.server210

tximport was developed by the same guys that developed edgeR. The concern that lead to the creation of the package as I understand it is that people make omission of the assumptions these packages work on (in this case, that counts are raw counts and not extimates), so they decided to create a package to couple in the pipeline. As for what exactly it does for DeSeq2, you should look at the code for the function

ADD REPLYlink written 7 months ago by biofalconch470
gravatar for i.sudbery
7 months ago by
Sheffield, UK
i.sudbery9.1k wrote:

Definately use tximport if you can.

If I remember correctly, the txi object is a list with three slots: $abundance contains a matrix of TPMs, with one row per transcript and one column per sample, $count contains the same matrix for estimated counts, $length contains a vector with the length of each transcript. This shouldn't be too hard to create.

You can then use the tximport functions for collapsing transcript counts to gene counts, and creating a DESeq dataset object or edgeR object from that collapsed txi. The reason to go from transcript counts rather than gene counts, is that tximport uses the transcript counts to create a weighted effective gene length for use as an offset in the DE model. This protects against splicing changes making the counts from the same gene incomparable between samples, because the effective gene length is different.

I seem to remember that RSEM is able to do something similar, but I can't quite be sure.

ADD COMMENTlink written 7 months ago by i.sudbery9.1k

Thank you. especially useful answer as the other post/point of view I referred to is answered by you.


ADD REPLYlink written 6 months ago by english.server210
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 735 users visited in the last hour