Question: Very Large FPKM Values For Some Transcripts: Artifacts?
7.4 years ago, Pfs (United States) wrote:

I used Cufflinks to analyze an experiment with ~135 million reads. The FPKM values range from 0 to 100,000. I did not use the -N option, which can sometimes produce inflated FPKM values, so I investigated some of the very large FPKM values.

They are generally associated with non-protein-coding genes and have a length of approximately 100. The number of alignments covering these regions is approximately 3000.

Using the RPKM formula, I cannot make sense of the large FPKM values.
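
As a rough check, the standard RPKM formula can be applied to the numbers above. This is only a minimal sketch: it assumes the length is in base pairs, that each alignment counts as one read, and that all ~135 million reads were mapped. The function name `rpkm` is illustrative and not part of any tool.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Numbers from the question; units and mapping rate are assumptions.
value = rpkm(read_count=3000, length_bp=100, total_mapped_reads=135_000_000)
print(round(value, 1))  # about 222.2
```

Under these assumptions the naive value is in the hundreds, far below the reported maximum of 100,000, which is why the large FPKM values look suspicious.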

Any explanation? Are these artifacts? If so, how can they be detected and filtered out?

Tags: fpkm, rna

Have you tried writing to the authors (i.e., Cole)? They are typically responsive. Of course, if they respond, we'd love to see the answer.

written 7.4 years ago by seidel

If you're generally trying to quantify the abundance of a number of short transcripts, you might also try passing cufflinks the --no-effective-length-correction flag.
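
One way to see why effective-length correction can matter for short transcripts is the sketch below. It is a simplified illustration, not Cufflinks' actual code: it assumes a Gaussian fragment-length distribution with mean 200 and standard deviation 80 (assumed values), and it takes the effective length to be the number of positions where a fragment of each length could fit, weighted by that distribution. All function names are hypothetical.

```python
import math


def frag_len_density(i, mean, sd):
    """Gaussian density at fragment length i (assumed distribution)."""
    z = (i - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def effective_length(length, mean=200.0, sd=80.0):
    """Weighted count of start positions for fragments that fit in the transcript."""
    return sum(
        frag_len_density(i, mean, sd) * (length - i + 1)
        for i in range(1, length + 1)
    )


def fpkm(fragments, length, total_fragments):
    """Fragments per kilobase per million, for a given (possibly effective) length."""
    return fragments * 1e9 / (length * total_fragments)


# Numbers from the question; treating alignments as fragments is an assumption.
raw = fpkm(3000, 100, 135_000_000)
corrected = fpkm(3000, effective_length(100), 135_000_000)
print(raw, corrected)
```

In this toy model a transcript of length 100 has an effective length far below 100, so the corrected value is much larger than the uncorrected one, while for long transcripts the two lengths stay close.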

written 5.5 years ago by Rob

7.3 years ago, Obi Griffith (Washington University, St Louis, USA) wrote:

I'm not sure whether your FPKM values are being calculated correctly, but from looking at a lot of RNA-seq data, I would say it is normal to have some very highly expressed, short, non-coding transcripts (e.g., rRNA genes).

7.2 years ago, Mkd wrote:

It also depends on the tissue. Some tissues, such as lens and developing red blood cells, have extremely high levels of a few mRNAs.

7.2 years ago, Mikael Huss wrote:

This has been observed by many users. Read, for example, this SeqAnswers thread, where Cole Trapnell also makes an appearance.

7.2 years ago, Stevelor wrote:

It depends on the sample. As already said, it might come from rRNA. In one of our recent sequencing runs, we observed that the globin genes were highly expressed because it was a whole blood sample: about 90% of all reads belonged to these few globin genes. So think about your sample and which kinds of genes could be very highly expressed!
