Appropriate RPKM cutoff
3 months ago
Lilla


I'm using multiple previously published RNA-Seq studies as validation and to search for "signatures" similar to those in our data. For these other studies I have their final read counts and their significance-filtered data, which include RPKM, FPKM, or other normalized read values as reported in their publications.

I used a cut-off of RPKM > 1 or FPKM > 1 to say whether or not a gene is expressed in the respective study. A reviewer has now responded that this is not informative and that the cut-off is too low, even though at least two of the studies used this same cut-off themselves to call a gene expressed. What would be a more appropriate cut-off for claiming that a gene is expressed? Should I use a higher RPKM cut-off, or instead require something like >= 10 reads per gene (???).

My supervisors are of no help in this and I don't have help in my lab regarding this. :(

3 months ago
dsull

There is no good "cutoff" for what is "expressed" (and even the word "expressed" is kind of vague -- if a read mapping perfectly+unambiguously to a gene is "present" in your reads, do you use its "presence" to indicate meaningful biological expression?).

But you can cite copy number results from the literature, e.g. "a transcript of 1 RPKM corresponds to approximately one transcript per cell" in one particular case:

^Old paper (actually it's the paper that "invented" RNA-seq) -- and mostly outdated with respect to sequencing advances -- but it might be what you need to cite to convince reviewers that, although there's no good cutoff for "expression", people have been treating 1 RPKM as roughly 1 transcript copy per cell in certain cases.
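
It may also help to show the reviewer what RPKM > 1 means in raw reads, since the question mixes an RPKM cutoff with a ">= 10 reads per gene" cutoff. Below is a rough sketch using the standard RPKM definition (reads per kilobase of gene length per million mapped reads). The library size and gene lengths are made-up illustration values, not numbers from any of the studies, and the function names are my own.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """RPKM = reads * 1e9 / (gene length in bp * total mapped reads)."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def reads_at_rpkm(rpkm_value, gene_length_bp, total_mapped_reads):
    """Invert the formula: how many reads give this RPKM for this gene?"""
    return rpkm_value * gene_length_bp * total_mapped_reads / 1e9


# Hypothetical library size, chosen only for illustration.
total_mapped_reads = 20_000_000

# Hypothetical gene lengths in base pairs.
for gene_length_bp in (500, 2_000, 10_000):
    n_reads = reads_at_rpkm(1.0, gene_length_bp, total_mapped_reads)
    print(f"{gene_length_bp} bp gene: RPKM = 1 is about {n_reads:.0f} reads")
```

The point is that a fixed RPKM cutoff and a fixed read-count cutoff are not interchangeable: the number of reads behind RPKM = 1 scales with both gene length and sequencing depth, so whichever one you use, it is worth stating the other for context.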


