Question

How to perform differential expression analysis with normalized data?

4.6 years ago

LuisNagano ▴ 90

Hello, could anyone help me out? Is there an R package that runs differential expression analysis, i.e. a statistical test that produces log2FC and adjusted p-values, on normalized RNA-seq and microarray expression values? The data I need to analyze is in FPKM: a table with ~50,000 genes. I don't have access to the raw data.

Thank you very much!

Tags: DEG, FPKM, RNA-seq

Question written 4.6 years ago by LuisNagano ▴ 90; updated 4.6 years ago by ATpoint 82k.

Which tool was used for quantification: RSEM, StringTie, Cufflinks? Just try the tximport package from the DESeq2 team.

Reply by boaty ▴ 220, 4.6 years ago

The authors don't cite the tool used for normalization. DESeq2 only works with raw gene counts, doesn't it? I have normalized data in FPKM. I want a package that can analyze any kind of normalized expression data, like MAS5, RSEM, FPKM, TPM...

Reply by LuisNagano ▴ 90, 4.6 years ago

This has been discussed extensively and repeatedly; please use the search function. Start with this thread: https://support.bioconductor.org/p/102551/ and search around from there. You'll find pretty much the same answer everywhere: the limma-based strategy suggested there is probably the best possible, but still poor, solution to what you want to do, because FPKM is not suited for differential analysis. Further details on why can be found in numerous threads here, on Bioconductor support, and on the web.

Reply by ATpoint 82k, 4.6 years ago

Look at tximport first. tximport takes normalized counts and length information and recomputes raw counts... You can use tximport to get pseudo raw counts and then use DESeq2 for DGE. It works for almost all modern quantification tools like kallisto and StringTie, but you need to know which tool was used for gene quantification.

Reply by boaty ▴ 220, 4.6 years ago

No, that is not true and not recommended. tximport aggregates transcript abundance estimates to the gene level and corrects for average transcript length; it does not do any magic to rescue you from inferior normalization techniques like FPKM. The transcript-level information is already lost in an FPKM table, since in most cases these are already gene-level values, so tximport would be meaningless here. If possible, download the raw data from NCBI or ENA and obtain raw counts. Everything else is inferior. Relying on prenormalized counts when (as the OP states) the methods section lacks details about the pipeline is not reproducible and therefore, IMHO, not recommended, beyond the issue that FPKM is a poor choice for normalization.
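A minimal sketch of why an FPKM table cannot simply be turned back into raw counts, using the textbook definition FPKM = count × 10^9 / (gene length in bp × N), where N is the total number of mapped reads. Two hypothetical samples with the same composition but a 10× difference in sequencing depth give identical FPKM values, so the table alone cannot tell you whether a gene had 10 or 100 reads. That is exactly the information count-based tools like DESeq2 rely on when modelling variance. For simplicity N defaults to the sum of the listed counts (an assumption; in practice it is the total mapped reads), and the function names are hypothetical.

```python
def fpkm(counts, lengths_bp, total_mapped_reads=None):
    # FPKM_i = count_i * 1e9 / (length_i * N)
    # Assumption: N defaults to the sum of the given counts
    n = sum(counts) if total_mapped_reads is None else total_mapped_reads
    return [c * 1e9 / (length * n) for c, length in zip(counts, lengths_bp)]


def counts_from_fpkm(fpkm_values, lengths_bp, total_mapped_reads):
    # Inverting the formula requires N, which an FPKM table normally does not carry
    return [f * length * total_mapped_reads / 1e9
            for f, length in zip(fpkm_values, lengths_bp)]


if __name__ == "__main__":
    lengths = [1000, 2000, 500]   # hypothetical gene lengths in bp
    shallow = [10, 20, 70]        # hypothetical sample with 100 reads
    deep = [100, 200, 700]        # same composition, sequenced 10x deeper
    f_shallow = fpkm(shallow, lengths)
    f_deep = fpkm(deep, lengths)
    print(f_shallow == f_deep)    # True: depth is invisible in FPKM
    # Both are equally valid "back-calculations" from the same FPKM values:
    print(counts_from_fpkm(f_shallow, lengths, 100))
    print(counts_from_fpkm(f_shallow, lengths, 1000))
```

Without N (and the exact lengths and pipeline used), any recovered "counts" are a guess, which is one more reason to start from the raw data.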

Reply by ATpoint 82k, 4.6 years ago