Question: How to do FPKM differential analysis?
5 weeks ago, zhaoliang0302 wrote:

Hello, I want to do differential expression analysis on RNA-seq FPKM data (I cannot get the counts). DESeq and edgeR need raw counts as input, so what should I do? Is the limma package suitable for this?

limma rna-seq deseq fpkm • 192 views
modified 4 weeks ago by kristoffer.vittingseerup • written 5 weeks ago by zhaoliang0302

Hi, I do not recommend using FPKM for DE analysis. It is better to use the normalized counts (vst/vsd) generated by DESeq2 (for more information, have a look here). However, there is a function in DESeq2 called fpkm() that allows you to do that.

written 5 weeks ago by Lila M

"Is better to use the normalized counts (vst/vsd) generated by DESeq2."

Better for what? DE?

"However, there is a function in DESeq2 called fpkm() that allow you to do that"

This is only for calculating FPKM, not for DE.

modified 4 weeks ago • written 4 weeks ago by ATpoint

In most published articles and studies it is hardly recommended to use normalized counts rather than FPKM for DE analysis. Do you not agree with that?

written 4 weeks ago by Lila M

Absolutely not. Please list the respective articles. FPKM regularly fails in benchmarking studies of differential analysis. Use the established tools to perform differential analysis. The only true advantage of per-million scaling is that you can add new samples later without the normalized counts of the existing samples changing, which is not the case with, e.g., TMM from edgeR. This is why databases usually use TPM or RPKM/FPKM to present counts.

Still, per-million scaling is not completely useless. If library composition does not change dramatically between samples, a simple FPKM might be sufficient. If I am not wrong, Cuffdiff used FPKM as its standard normalization method, and I have seen datasets where the agreement between FPKM values and TMM was pretty good. Still, since more advanced methods exist, why not use them? I always start from raw counts, as this ensures that I can vouch for the low-level pipeline. If you are provided with FPKM, you have no idea how the values were created.

modified 4 weeks ago • written 4 weeks ago by ATpoint

Hi, if I understand correctly, you mean I can merge FPKM data from two batches without considering the batch effect? I have RNA-seq FPKM data from two batches (counts not available) and I want to merge them into a single dataset. I know DESeq can deal with this for count data by treating batch as a variable. So what about FPKM? Thanks

written 4 weeks ago by zhaoliang0302

Hi, you can get the read count (the number of mapped reads) with the samtools idxstats command; the 3rd column of the output is the number of mapped reads for each reference sequence.

The command is:

$ samtools idxstats input.bam > output.txt

(Note: the input BAM is the alignment file generated by an alignment tool (e.g. HISAT2) or an assembly tool (e.g. SPAdes); it must be coordinate-sorted and indexed.)
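To make the output format concrete, here is a minimal Python sketch that reads idxstats-style text and sums the 3rd column. The example text is made up for illustration; only the four tab-separated columns (reference name, sequence length, mapped reads, unmapped reads) are taken from the samtools idxstats format. Note that these are totals per reference sequence, not counts per gene.

```python
import io

def mapped_reads_per_reference(handle):
    """Parse `samtools idxstats` output: name, length, mapped, unmapped (tab-separated)."""
    counts = {}
    for line in handle:
        if not line.strip():
            continue  # skip empty lines
        name, _length, mapped, _unmapped = line.rstrip("\n").split("\t")
        counts[name] = int(mapped)
    return counts

# Hypothetical idxstats output; the final "*" row holds reads with no reference.
example = "chr1\t248956422\t1500\t3\nchr2\t242193529\t500\t1\n*\t0\t0\t42\n"

per_ref = mapped_reads_per_reference(io.StringIO(example))
total_mapped = sum(per_ref.values())
print(per_ref)
print("total mapped reads:", total_mapped)
```

With a real file, the same function could be called on `open("output.txt")` instead of the in-memory example.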

Hope this will help you!!

written 5 weeks ago by krishnajandhyalaa

I think that you may have misunderstood the question. The user does not want total and aligned read counts per sample. They need raw read counts per gene per sample.

written 4 weeks ago by Kevin Blighe

4 weeks ago, Kevin Blighe wrote:

I agree with Lila - you should not perform differential expression analysis on FPKM counts. If you don't believe me, then take it from the developer of limma, where some suggestions are also made: https://support.bioconductor.org/p/56275/#56299

Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

"The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis."

Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units

"The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one."

Remarkably, I still see publications coming out where people are comparing groups of samples based on FPKM counts, even though this makes no sense. As an example, FPKM of 10 in one sample may be the equivalent of 50 in another, due to the way that FPKM counts are produced, i.e., with no cross-library / sample normalisation.
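A toy calculation can make this concrete. The sketch below uses made-up abundances and gene lengths (all names and numbers are hypothetical) and the usual definition FPKM = fragments × 10^9 / (gene length in bp × total mapped fragments). geneA has the same true abundance in both samples, but because geneC dominates the second library, geneA's FPKM differs five-fold.

```python
def simulate_counts(abundance, depth):
    """Distribute `depth` fragments across genes in proportion to abundance."""
    total = sum(abundance.values())
    return {gene: round(depth * a / total) for gene, a in abundance.items()}

def fpkm(counts, lengths_bp):
    """FPKM = fragments * 1e9 / (gene length in bp * total mapped fragments)."""
    mapped = sum(counts.values())
    return {gene: c * 1e9 / (lengths_bp[gene] * mapped) for gene, c in counts.items()}

# Hypothetical genes, all 1 kb long, so length plays no role in this example.
lengths_bp = {"geneA": 1000, "geneB": 1000, "geneC": 1000}

# True (unobservable) abundances: only geneC changes between the samples.
abundance_1 = {"geneA": 100, "geneB": 100, "geneC": 800}
abundance_2 = {"geneA": 100, "geneB": 100, "geneC": 4800}

# Both libraries are sequenced to the same depth.
fpkm_1 = fpkm(simulate_counts(abundance_1, 1_000_000), lengths_bp)
fpkm_2 = fpkm(simulate_counts(abundance_2, 1_000_000), lengths_bp)

fold = fpkm_1["geneA"] / fpkm_2["geneA"]
print("geneA FPKM, sample 1:", fpkm_1["geneA"])
print("geneA FPKM, sample 2:", fpkm_2["geneA"])
print("apparent fold difference for an unchanged gene:", fold)
```

The apparent difference comes purely from library composition, which is what between-sample normalisation methods are designed to correct.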

modified 4 weeks ago • written 4 weeks ago by Kevin Blighe

If you cannot get counts, limma-trend is a viable approach (see the voom article by the limma authors). Please also note that the Bioconductor post you reference only states that you should not use limma-voom on FPKM values. In fact, in that exact post Gordon highlights (as option 3) that if you only have FPKM you can use limma-trend via a log2 transformation with a pseudocount.

If you have different library sizes, you are right that the values are not comparable (which is also what the voom article shows), but then you can simply do an inter-library normalization of the FPKM values first (exactly as you would for count data).
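The post does not say which inter-library normalization to use. As one possible illustration, the sketch below applies a median-of-ratios scaling (in the spirit of DESeq size factors) to FPKM values. This choice of method, the function names and the numbers are all assumptions made for the example; in practice you would normally rely on an established R/Bioconductor implementation.

```python
import math
from statistics import median

def size_factors(samples):
    """Median-of-ratios scaling factors. `samples` maps sample name -> {gene: value}."""
    first = next(iter(samples.values()))
    # Use only genes that are positive in every sample (log is undefined at 0).
    genes = [g for g in first if all(s[g] > 0 for s in samples.values())]
    log_geo_mean = {
        g: sum(math.log(s[g]) for s in samples.values()) / len(samples) for g in genes
    }
    return {
        name: math.exp(median([math.log(s[g]) - log_geo_mean[g] for g in genes]))
        for name, s in samples.items()
    }

def normalize(samples):
    """Divide every value by its sample's scaling factor."""
    factors = size_factors(samples)
    return {
        name: {g: v / factors[name] for g, v in s.items()} for name, s in samples.items()
    }

# Hypothetical FPKM values: g1-g4 differ by a constant factor of 5 between
# samples (a scaling artefact), while g5 is genuinely different.
fpkm_values = {
    "sample1": {"g1": 50.0, "g2": 100.0, "g3": 200.0, "g4": 400.0, "g5": 10.0},
    "sample2": {"g1": 10.0, "g2": 20.0, "g3": 40.0, "g4": 80.0, "g5": 500.0},
}

normalized = normalize(fpkm_values)
for name, values in normalized.items():
    print(name, {g: round(v, 2) for g, v in values.items()})
```

After scaling, g1 to g4 agree between the two samples and only g5 still differs. The approach assumes that most genes are not differentially expressed.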

modified 4 weeks ago • written 4 weeks ago by kristoffer.vittingseerup

Yes, but, that is the third option in a list that is ordered "in decreasing order of desirability"

written 4 weeks ago by Kevin Blighe

4 weeks ago, kristoffer.vittingseerup wrote:

Yes, you can use limma, although it is not as good as if you had counts. So double-check that you cannot get the counts, and try writing to the people behind the data to see if they will provide them (most people are quite friendly in my experience). If you cannot get the counts, you can log2-transform the FPKM values (use a pseudocount of 1) and use the limma-trend approach described on page 71 of the limma vignette. For a comparison of limma-trend vs limma-voom, take a look at this article.
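As a rough sketch of the transformation step only, the Python snippet below computes log2(FPKM + 1) and a plain per-gene difference of group means. The gene names, values and group labels are hypothetical. It does not reproduce limma-trend itself: the moderated statistics come from limma in R (lmFit followed by eBayes with trend=TRUE), and the snippet only shows what the input to that step looks like.

```python
import math

def log2_transform(values, pseudocount=1.0):
    """log2(FPKM + pseudocount); the pseudocount keeps zeros finite."""
    return [math.log2(v + pseudocount) for v in values]

def mean(values):
    return sum(values) / len(values)

# Hypothetical FPKM matrix: one row per gene, one column per sample.
groups = ["ctrl", "ctrl", "treat", "treat"]
fpkm_matrix = {
    "geneA": [0.0, 1.0, 3.0, 7.0],
    "geneB": [15.0, 15.0, 15.0, 15.0],
}

log_expr = {gene: log2_transform(row) for gene, row in fpkm_matrix.items()}

# Naive log2 fold change (treat minus ctrl); limma would add moderated
# t-statistics and p-values on top of values like these.
log_fc = {}
for gene, row in log_expr.items():
    ctrl = [v for v, g in zip(row, groups) if g == "ctrl"]
    treat = [v for v, g in zip(row, groups) if g == "treat"]
    log_fc[gene] = mean(treat) - mean(ctrl)

print(log_expr)
print(log_fc)
```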

Hope this helps

written 4 weeks ago by kristoffer.vittingseerup

Thanks a lot. It really helps☺

written 4 weeks ago by zhaoliang0302