Question: How to perform differential expression analysis with normalized data?
11 months ago by
University of Campinas
LuisNagano40 wrote:

Hello, could anyone help me out? Is there an R package that runs differential expression analysis or statistical tests, i.e. generates log2FC and adjusted p-values, from normalized RNA-seq and array expression values? The data I need to analyze are in FPKM, a table with ~50,000 genes, and I don't have access to the raw data.

Thank you very much!

rna-seq deg fpkm • 462 views
modified 11 months ago by ATpoint36k • written 11 months ago by LuisNagano40

Which tool was used for quantification: RSEM, StringTie, or Cufflinks? Just try the tximport package from the DESeq2 team.

written 11 months ago by boaty110

The authors don't cite the tool used for normalization. DESeq2 only works with raw gene counts, doesn't it? I have normalized data in FPKM. I want a package that can analyse any normalized expression data, like MAS5, RSEM, FPKM, TPM...

written 11 months ago by LuisNagano40

This has been discussed before extensively and repeatedly; please use the search function. Start from this one: and from there please google around. You'll find pretty much the same answer everywhere: the limma-based strategy suggested there is probably the best possible, but still bad, solution to what you aim to do, as FPKM is not suited for differential analysis. Further details on why that is can be found in numerous threads here, on BioC, and on the web.
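As a rough illustration of the kind of preprocessing a log-transform-based approach starts from, here is a minimal Python sketch that log2-transforms FPKM values and computes a per-gene log2 fold change. It is not limma and produces no moderated statistics or adjusted p-values; in R, limma's `lmFit` and `eBayes` would be applied to log-scale values like these. The pseudocount, gene names, and FPKM values are all assumptions made up for the example.

```python
import math

PSEUDOCOUNT = 1.0  # assumption: offset added before the log to avoid log2(0)


def log2_fpkm(values, pseudocount=PSEUDOCOUNT):
    """Return log2(FPKM + pseudocount) for each value."""
    return [math.log2(v + pseudocount) for v in values]


def log2_fold_change(group_a, group_b):
    """Mean log2 value of group B minus mean log2 value of group A, for one gene."""
    la = log2_fpkm(group_a)
    lb = log2_fpkm(group_b)
    return sum(lb) / len(lb) - sum(la) / len(la)


# Hypothetical FPKM values: two genes, three samples per group.
fpkm_table = {
    "geneX": {"control": [3.0, 3.0, 3.0], "treated": [15.0, 15.0, 15.0]},
    "geneY": {"control": [7.0, 7.0, 7.0], "treated": [7.0, 7.0, 7.0]},
}

log2fc = {
    gene: log2_fold_change(groups["control"], groups["treated"])
    for gene, groups in fpkm_table.items()
}

for gene, lfc in log2fc.items():
    print(f"{gene}: log2FC = {lfc:.2f}")
```

The fold change is only half of what was asked for: the test statistics and multiple-testing correction still have to come from a proper statistical framework, and the caveat about FPKM applies regardless.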

written 11 months ago by ATpoint36k

Look at tximport first. tximport will take normalised counts and length information to recompute raw counts. You can use tximport to get pseudo-raw counts and then use DESeq2 for differential gene expression. It works for almost all the modern quantification tools, like kallisto and StringTie. But you need to know which tool was used for gene quantification.

modified 11 months ago • written 11 months ago by boaty110

No, that is not true and not recommended. tximport aggregates transcript abundance estimates to the gene level and corrects for average transcript length; it does not do any magic to save you from inferior normalization techniques like FPKM. In most cases FPKM values are already at the gene level, so the transcript information is already lost and tximport would be meaningless. If possible, download the raw data from NCBI or ENA and obtain raw counts. Everything else is inferior. Relying on prenormalized counts when (as OP states) the methods section lacks details about the pipeline is not reproducible and therefore, IMHO, not recommended, quite apart from the issue that FPKM is a poor choice for normalization.
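To make the objection to FPKM a bit more concrete, here is a small Python sketch of one well-known problem: FPKM scales by the total read count, so a single strongly changing gene shifts the values of every other gene. The counts and gene lengths are invented for illustration, and the toy setup assumes the raw counts of the unchanged genes really are identical in both samples.

```python
# Illustrative only: made-up counts and gene lengths, not data from this thread.

def fpkm(counts, lengths_bp):
    """FPKM = reads / (gene length in kb * total mapped reads in millions)."""
    total_millions = sum(counts) / 1e6
    return [c / ((l / 1e3) * total_millions) for c, l in zip(counts, lengths_bp)]


lengths = [1000, 2000, 4000]      # gene lengths in bp (hypothetical)
sample_a = [1000, 2000, 4000]     # raw read counts, condition A
sample_b = [1000, 2000, 40000]    # only gene 3 changes (10x more reads) in condition B

fpkm_a = fpkm(sample_a, lengths)
fpkm_b = fpkm(sample_b, lengths)

# Ratio of FPKM values between conditions for each gene.
ratios = [b / a for a, b in zip(fpkm_a, fpkm_b)]

for i, (a, b, r) in enumerate(zip(fpkm_a, fpkm_b, ratios), start=1):
    print(f"gene{i}: FPKM A = {a:.1f}, FPKM B = {b:.1f}, ratio B/A = {r:.2f}")
```

In this toy case genes 1 and 2 have identical raw counts in both samples, yet their FPKM values drop to roughly a sixth, while the tenfold change in gene 3 shows up as well under twofold. Count-based methods estimate normalization factors that are designed to be robust to this kind of composition effect, which is one reason raw counts are preferred.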

modified 11 months ago • written 11 months ago by ATpoint36k