Question: FPKM values from normalized read counts
2.4 years ago, Bioin wrote:

Hi Biostars,

I would like to compute FPKM values from normalized read counts. I am using this formula: FPKM (Fragments Per Kilobase per Million) = [# of fragments] / ([length of transcript] / 1000) / ([total reads] / 10^6)

My problem is that, in the GTF file, most gene IDs have more than one transcript, each with a different length. I am not sure which length to use in my calculation.
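To make the ambiguity concrete, here is a minimal Python sketch of the formula above. The function name `fpkm`, the transcript names, and all numbers are made up for illustration; they are not from any real dataset or library.

```python
def fpkm(fragments, transcript_length_bp, total_fragments):
    """FPKM = fragments / (length / 1000) / (total / 10^6), as in the question."""
    if transcript_length_bp <= 0 or total_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments / (transcript_length_bp / 1000) / (total_fragments / 1e6)


# Hypothetical gene with two annotated transcripts of different lengths (bp).
transcript_lengths = {"tx_short": 1000, "tx_long": 4000}
gene_fragments = 500          # fragments assigned to the gene (assumed)
library_size = 20_000_000     # total fragments in the sample (assumed)

# The same count gives a different FPKM for each choice of length.
fpkm_by_length = {
    name: fpkm(gene_fragments, length, library_size)
    for name, length in transcript_lengths.items()
}
print(fpkm_by_length)  # {'tx_short': 25.0, 'tx_long': 6.25}
```

With these assumed numbers the result changes fourfold depending on which transcript length is used, which is why the choice matters.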

Any comments will be helpful for my analysis.

Thanks in advance.

rna-seq • 1.6k views

Not what you are asking for, but are you sure FPKM is appropriate for your downstream application? It's often not the best normalization method.

written 2.4 years ago by WouterDeCoster

An update (6th October 2018):

You should abandon RPKM / FPKM. They are not suitable when cross-sample differential expression analysis is your aim; indeed, they render samples incomparable for that purpose:

Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis.

Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units

The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one.

modified 24 months ago • written 2.1 years ago by Kevin Blighe
2.4 years ago, Devon Ryan (Freiburg, Germany) wrote:

There are many ways to approach this, from randomly picking one transcript, to using the median length, to using an effective gene length. The most correct method is the last one, which uses a weighted average of the transcript lengths (weighted by their relative prevalence). It's most convenient to use something like salmon or kallisto to get that sort of metric.
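As a rough illustration of the weighted average described above, here is a minimal Python sketch. The helper `effective_gene_length` and the abundance values are hypothetical; in practice the per-transcript abundances would come from a quantifier such as salmon or kallisto, and this does not reproduce how either tool works internally.

```python
def effective_gene_length(lengths, abundances):
    """Average transcript length, weighted by relative transcript abundance."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("at least one transcript must have non-zero abundance")
    return sum(lengths[tx] * abundances[tx] for tx in abundances) / total


# Hypothetical gene: transcript lengths in bp and assumed relative abundances.
lengths = {"tx_short": 1000, "tx_long": 4000}
abundances = {"tx_short": 0.75, "tx_long": 0.25}

gene_length = effective_gene_length(lengths, abundances)
print(gene_length)  # 1750.0
```

Because the short transcript dominates in this made-up example, the effective length (1750 bp) sits closer to 1000 bp than the plain mean or median of the two lengths (2500 bp) would.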


Thank you Devon Ryan.

written 2.4 years ago by Bioin