I am using Cufflinks to analyze an experiment with ~135 million reads. The FPKM values range from 0 to 100000. I did not use the -N option, which can sometimes produce inflated FPKM values, so I investigated some of the very large FPKM values.
They are generally associated with non-protein-coding genes and have a length of approximately 100. The number of alignments covering each of these regions is approximately 3000.
Using the RPKM formula, I cannot make sense of the large FPKM values.
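To make the mismatch concrete, here is a rough sketch of the calculation, using the standard RPKM definition (reads per kilobase per million mapped reads). It assumes that the length of ~100 is in bases and that all ~135 million reads are mapped; both are simplifications, and the function name `rpkm` is only illustrative.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)

# Approximate figures from above. The units (bases) and the use of the
# total read count as the mapped-read count are assumptions.
expected = rpkm(read_count=3000, length_bp=100, total_mapped_reads=135e6)
print(f"expected RPKM ~ {expected:.0f}")
```

Under these assumptions the result is roughly 222, far below the upper end of the FPKM range I am seeing.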
Any explanation? Are these artifacts? If so, how can they be detected and filtered out?