Question: How Do You Justify Your RNA-Seq Expression Threshold (FPKM/RPKM)?
3
repinementer wrote:

Hi, after following four years of RNA-Seq literature, my understanding is that most papers define an arbitrary expression threshold, i.e. >1 FPKM/RPKM, to call a transcript expressed. But how can one really justify this?

modified 15 months ago by Damian Kao • written 15 months ago by repinementer
4
swbarnes2 wrote:

Our lab uses spike-ins of known RNA sequences, all at known concentrations. If the spike-in RPKM values make sense given those concentrations, you have some evidence that the RPKM values for your own transcripts at the same level are accurate.

We use the Ambion ERCC spike-in controls.
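
One way to check whether the spike-in RPKMs "make sense" is to regress log RPKM on log input concentration and look at the slope and correlation. The following is only a minimal sketch: the function name `fit_loglog`, the concentration units, and all numbers are made up for illustration and are not real ERCC values.

```python
import math

def fit_loglog(concentrations, rpkms):
    """Least-squares fit of log10(RPKM) against log10(concentration).

    Spike-ins with zero RPKM (undetected) are skipped because their
    logarithm is undefined. Returns (slope, intercept, pearson_r).
    """
    pairs = [(math.log10(c), math.log10(r))
             for c, r in zip(concentrations, rpkms) if c > 0 and r > 0]
    if len(pairs) < 3:
        raise ValueError("need at least three detected spike-ins")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, pearson_r

# Hypothetical example values, NOT real ERCC data.
example_conc = [0.1, 1.0, 10.0, 100.0, 1000.0]  # arbitrary concentration units
example_rpkm = [0.0, 0.5, 5.0, 50.0, 500.0]     # lowest spike-in undetected

slope, intercept, pearson_r = fit_loglog(example_conc, example_rpkm)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={pearson_r:.3f}")
```

A slope near 1 and a high correlation would suggest RPKM tracks input concentration over that range. The concentration at which spike-ins stop being detected, or start deviating from the line, is one candidate for a defensible lower cutoff.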

written 15 months ago by swbarnes2
1

I think using spike-in controls is key going forward with RNA-Seq experiments. Personally, I was quite irritated by the absurdly low cut-offs ENCODE has been using for calling "novel" RNAs, levels that frankly reflect noise picked up by the depth of sequencing.

reply written 15 months ago by Dan Gaston

I couldn't agree more. However, using >1 RPKM to discover long non-coding RNAs should be fine, as they are expected to be lowly expressed.

reply written 15 months ago by repinementer
1

That depends on what you expect >1 RPKM to work out to in terms of the number of transcripts per cell.
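
As a rough back-of-the-envelope sketch of that conversion: if reads sample transcript nucleotides uniformly, a gene's share of reads is about copies × length / total transcript nucleotides in the cell, so RPKM ≈ 10^9 × copies / total nucleotides. The per-cell transcript count and mean transcript length below are illustrative assumptions only. They vary a lot between cell types and should be replaced with values for your own system.

```python
def rpkm_to_copies_per_cell(rpkm, transcripts_per_cell, mean_transcript_length_nt):
    """Approximate transcript copies per cell implied by an RPKM value.

    Assumes reads are sampled uniformly from all transcript nucleotides,
    so that RPKM ~= 1e9 * copies / total_transcript_nucleotides.
    """
    total_nt = transcripts_per_cell * mean_transcript_length_nt
    return rpkm * total_nt / 1e9

# ASSUMED, illustrative cell parameters (not measured values).
assumed_transcripts_per_cell = 300_000
assumed_mean_length_nt = 1_500

copies = rpkm_to_copies_per_cell(1.0, assumed_transcripts_per_cell,
                                 assumed_mean_length_nt)
print(f"1 RPKM ~ {copies:.2f} copies per cell under these assumptions")
```

Under different assumptions about total mRNA content, the same RPKM can correspond to noticeably more or fewer copies per cell, which is why a fixed cutoff of 1 does not mean the same thing in every sample.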

reply written 15 months ago by Dan Gaston
3
SamuelL wrote:

If I were you, I would make a density plot of the FPKM values you are getting. Hopefully you will see a distinct distribution and a reliable range for your cutoff.

written 15 months ago by SamuelL
1

Still, isn't the way you choose the cutoff after plotting them kind of arbitrary?

reply written 15 months ago by repinementer
1

Arbitrary perhaps, but at least justifiable.

reply written 15 months ago by SamuelL

Based on the density plot, what's your suggestion on where to assign a threshold?

reply written 7 weeks ago by daniel.bellieny

It depends on the distribution. If you get a nice bimodal distribution, anything between the two modes.
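
If you want something reproducible rather than eyeballing the plot, one option is to take the density minimum between the two modes. Here is a minimal sketch using a hand-rolled Gaussian kernel density on log2(FPKM + pseudocount). The bandwidth, pseudocount, function names, and synthetic data are all my own assumptions, and picking "the two tallest local maxima" can misfire on noisy or only weakly bimodal data.

```python
import math

def log2_density(fpkms, bandwidth=0.5, grid_step=0.1, pseudocount=0.01):
    """Gaussian kernel density of log2(FPKM + pseudocount) on a regular grid."""
    logs = [math.log2(v + pseudocount) for v in fpkms]
    lo = min(logs) - 3 * bandwidth
    hi = max(logs) + 3 * bandwidth
    n_points = int((hi - lo) / grid_step) + 1
    grid = [lo + i * grid_step for i in range(n_points)]
    norm = 1.0 / (len(logs) * bandwidth * math.sqrt(2 * math.pi))
    density = [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in logs)
        for x in grid
    ]
    return grid, density

def trough_between_modes(grid, density):
    """Grid value at the density minimum between the two tallest peaks.

    Returns None when fewer than two local maxima are found.
    """
    peaks = [i for i in range(1, len(grid) - 1)
             if density[i - 1] < density[i] >= density[i + 1]]
    if len(peaks) < 2:
        return None
    left, right = sorted(sorted(peaks, key=lambda i: density[i])[-2:])
    trough = min(range(left, right + 1), key=lambda i: density[i])
    return grid[trough]

# Synthetic, made-up FPKMs: a low "noise" cluster and a high "expressed" cluster.
low = [2 ** (-4 + 0.1 * k) for k in range(-10, 11)]
high = [2 ** (4 + 0.1 * k) for k in range(-15, 16)]
grid, density = log2_density(low + high)
cut_log2 = trough_between_modes(grid, density)
if cut_log2 is not None:
    print(f"suggested cutoff: about {2 ** cut_log2 - 0.01:.2f} FPKM")
```

The cutoff moves with the bandwidth, so it is worth reporting the bandwidth alongside the threshold and checking that the result is stable across a few reasonable values.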

reply written 6 weeks ago by SamuelL
3
Mikael Huss wrote:

Although spike-ins, as mentioned, are best, if you don't have them you could look at this paper: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000598

It outlines a procedure for setting a cutoff that strikes a good compromise between low rates of false positives and false negatives. The approach compares the observed distribution of FPKMs for transcripts in the sample with FPKMs calculated for a "negative set" of regions that lie close to annotated genes but haven't been observed to be expressed in any published experiments.
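
To make the idea of a false positive/false negative compromise concrete, here is a simplified sketch. It is not necessarily the paper's exact procedure. It assumes (my assumption) that the negative set mimics the FPKM distribution of truly silent genes, and that you supply a guess `unexpressed_fraction` for the share of annotated genes that are silent. All names and numbers are hypothetical.

```python
def fraction_at_or_above(values, threshold):
    """Share of values that would be called 'expressed' at this threshold."""
    return sum(v >= threshold for v in values) / len(values)

def balance_threshold(gene_fpkms, negative_fpkms, unexpressed_fraction, candidates):
    """Pick the candidate threshold where estimated FDR and FNR are closest.

    unexpressed_fraction is an ASSUMED share of annotated genes that are truly
    silent; the negative set is assumed to follow the same FPKM distribution.
    Returns (threshold, estimated_fdr, estimated_fnr).
    """
    if not 0.0 <= unexpressed_fraction < 1.0:
        raise ValueError("unexpressed_fraction must be in [0, 1)")
    best = None
    for t in candidates:
        called = fraction_at_or_above(gene_fpkms, t)
        if called == 0:
            continue  # nothing called at this threshold, FDR undefined
        # Expected share of all genes that are silent but still pass the cutoff.
        false_calls = unexpressed_fraction * fraction_at_or_above(negative_fpkms, t)
        false_calls = min(false_calls, called)
        fdr = false_calls / called
        # Share of truly expressed genes that fall below the cutoff.
        fnr = 1.0 - (called - false_calls) / (1.0 - unexpressed_fraction)
        fnr = min(max(fnr, 0.0), 1.0)
        gap = abs(fdr - fnr)
        if best is None or gap < best[0]:
            best = (gap, t, fdr, fnr)
    if best is None:
        raise ValueError("no candidate threshold calls any gene")
    return best[1], best[2], best[3]

# Made-up toy data, for illustration only.
toy_genes = [0, 0.05, 0.1, 0.4, 0.8, 2, 5, 10, 20, 50]
toy_negatives = [0, 0, 0.05, 0.05, 0.1, 0.1, 0.2, 0.3, 0.6, 1.5]
threshold, fdr, fnr = balance_threshold(
    toy_genes, toy_negatives, unexpressed_fraction=0.2,
    candidates=[0.03, 0.1, 0.3, 1.0, 3.0])
print(f"threshold={threshold} FDR~{fdr:.2f} FNR~{fnr:.2f}")
```

The result depends on how well the negative set represents background and on the assumed silent fraction, so it is best treated as a way to document the trade-off rather than as a definitive cutoff.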

written 15 months ago by Mikael Huss
2

Just before posting this question I came across this paper, but I was confused by the way they define false positives/negatives.

reply written 15 months ago by repinementer
1
Damian Kao wrote:

Using an RPKM of 1 is as arbitrary as using a p-value of 0.05. Some papers use intronic/intergenic expression as the baseline threshold, but even that can get complicated and messy.
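
For what it's worth, the simplest version of the intronic/intergenic baseline is to take a high quantile of the background RPKMs as the cutoff. A minimal sketch follows. The 95% level, the function name, and the example values are assumptions for illustration, and how you define and quantify the background regions is where most of the mess comes in.

```python
import math

def background_quantile_threshold(background_rpkms, quantile=0.95):
    """Nearest-rank quantile of background (e.g. intergenic) RPKM values."""
    if not background_rpkms:
        raise ValueError("background set is empty")
    if not 0.0 < quantile <= 1.0:
        raise ValueError("quantile must be in (0, 1]")
    ordered = sorted(background_rpkms)
    rank = math.ceil(quantile * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

# Made-up background values, for illustration only.
toy_background = [0.0, 0.0, 0.0, 0.02, 0.05, 0.05, 0.1, 0.15, 0.3, 0.9]
cutoff = background_quantile_threshold(toy_background)
print(f"call a transcript expressed above about {cutoff} RPKM")
```

The chosen quantile is still a judgment call, but it at least states the accepted background rate explicitly.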

written 15 months ago by Damian Kao