Question

Cufflinks and fpkm

1

6.1 years ago

qudrat ▴ 100

Hello guys, can somebody suggest how to choose a cut-off FPKM value for a list of transcripts generated by Cufflinks?

RNA-Seq gene • 2.7k views

ADD COMMENT • link updated 6.1 years ago by Kevin Blighe 87k • written 6.1 years ago by qudrat ▴ 100

1

Here we go again:

You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the "advisable" tool for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment using Salmon.

Please stop using Tophat https://t.co/Es4ohxOEyx Cole and I developed the method in *2008*. It was greatly improved in TopHat2 then HISAT & HISAT2. There is no reason to use it anymore. I have been saying this for years yet it has more citations this year than last #methodsmatter
— Lior Pachter (@lpachter) December 2, 2017

ADD REPLY • link 6.1 years ago by WouterDeCoster 47k

score 4 · Answer 1 · 2018-03-17

4

6.1 years ago

Kevin Blighe 87k

There's no correct answer. Also, you should not be using Cufflinks; it has been superseded by StringTie. Moreover, FPKM is not an ideal expression unit for comparisons across samples. If you are analysing TCGA data, you do not have to use the FPKM values either, in most cases, as the raw HTSeq count files are available for download.

With that, my swift answer is to eliminate genes whose mean FPKM value is below 10. However, because no cross-sample normalisation is performed when deriving FPKM values, a value of 10 means different things in different samples.
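As a rough illustration of the mean-FPKM filter described above, here is a minimal Python sketch. The gene names, FPKM values and the function name `filter_by_mean_fpkm` are all made up for the example; in practice the table would come from your own Cufflinks or StringTie output, and the cut-off of 10 remains an arbitrary choice.

```python
from statistics import mean

# Hypothetical FPKM table: gene -> FPKM values across three samples.
# These numbers are invented purely for illustration.
fpkm_table = {
    "geneA": [25.0, 30.5, 18.2],
    "geneB": [0.4, 1.1, 0.0],
    "geneC": [9.0, 12.0, 8.5],
}


def filter_by_mean_fpkm(table, cutoff=10.0):
    """Keep genes whose mean FPKM across samples is at least `cutoff`."""
    return {
        gene: values
        for gene, values in table.items()
        if mean(values) >= cutoff
    }


kept = filter_by_mean_fpkm(fpkm_table)
print(sorted(kept))
```

Note that geneC is dropped even though one of its samples is above 10, because the filter looks only at the mean. Whether that is what you want depends on your experiment.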

ADD COMMENT • link 5.5 years ago by Kevin Blighe 87k

0

An update (6th October 2018):

You should abandon RPKM / FPKM. They are not ideal where cross-sample differential expression analysis is your aim; indeed, they render samples incomparable via differential expression analysis:

Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis.

Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units

The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one.
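To make the "relative measurement" point concrete, here is a small Python sketch using the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments). The two samples, the gene names and the fragment counts are hypothetical toy values, and `compute_fpkm` is just a helper name chosen for this example.

```python
def compute_fpkm(fragments, length_bp, total_fragments):
    """FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)."""
    return fragments * 1e9 / (length_bp * total_fragments)


# Hypothetical toy samples: geneX has the same fragment count in both,
# but in sample 2 a highly expressed geneY inflates the library total.
sample1 = {"geneX": 500, "geneY": 500}
sample2 = {"geneX": 500, "geneY": 4500}
length_bp = 2000  # assumed transcript length of geneX

fpkm_s1 = compute_fpkm(sample1["geneX"], length_bp, sum(sample1.values()))
fpkm_s2 = compute_fpkm(sample2["geneX"], length_bp, sum(sample2.values()))

print(fpkm_s1, fpkm_s2)
```

geneX has an identical fragment count in both toy samples, yet its FPKM differs five-fold, simply because the rest of the library changed. That is the reason a fixed cut-off does not mean the same thing in every sample.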

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k