Question: Minimum Or Optimal Rpkm Value To Find If A Transcript Is Significant
Prakki Rama (2.2k, Singapore) wrote 7.1 years ago:

Hello all,

Could I please know:

  1. Does a high RPKM value always mean that the transcript is significant? How reliable is it? If it is reliable, what would be an optimal RPKM cut-off for deciding whether a transcript is significant or not?

  2. Are there any other parameters I can use to reduce the number of contigs from the de novo assembly and concentrate only on significant transcripts?

Thanks in advance.

rpkm • 7.8k views
modified 5.5 years ago by ThePresident (60) • written 7.1 years ago by Prakki Rama (2.2k)
swbarnes2 (5.0k, United States) wrote 5.5 years ago:

What my lab does is add ERCC spike-ins to the samples. They are polyadenylated sequences of known concentration. So you can look at them and if, say, spike-ins with an RPKM of 2-10 are still behaving linearly (their measured RPKM tracks the known input concentration), then it's probably safe to say that real transcripts with RPKMs that low are behaving linearly too.

In my lab, with the experiments we run and the purposes of those experiments, we've been setting a loose cut-off at 0.5 RPKM, or 1 to be more stringent. But I wouldn't count on that value necessarily being applicable to your lab or your experiments.
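
As a rough illustration of the spike-in check described above, here is a minimal sketch that fits log2(RPKM) against log2(known concentration) for the low-abundance spike-ins. The function name `loglog_fit`, the RPKM-of-10 cut for "low abundance", and all numbers are assumptions made up for the example; they are not real ERCC measurements and not necessarily how any particular lab does it. A slope near 1 and a high R² in the low range would suggest the measurements are still tracking input concentration there.

```python
import math


def loglog_fit(concentrations, rpkms):
    """Least-squares fit of log2(RPKM) against log2(concentration).

    Returns (slope, intercept, r_squared). Pairs containing a zero or
    negative value are skipped because their logarithm is undefined.
    """
    pairs = [(math.log2(c), math.log2(r))
             for c, r in zip(concentrations, rpkms) if c > 0 and r > 0]
    if len(pairs) < 3:
        raise ValueError("need at least three detected spike-ins")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("values must vary to fit a line")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical values for illustration only (NOT real ERCC data).
known_conc = [0.5, 1, 2, 4, 8, 16, 32, 64]
observed_rpkm = [0.9, 2.1, 3.8, 8.3, 15.6, 33.0, 61.0, 130.0]

# Look only at the low end (assumed here: observed RPKM of 10 or less).
low = [(c, r) for c, r in zip(known_conc, observed_rpkm) if r <= 10]
slope, intercept, r2 = loglog_fit([c for c, _ in low], [r for _, r in low])
print(f"low-range slope={slope:.2f}, R^2={r2:.3f}")
```

If the fit degrades below some RPKM, that point is a candidate cut-off for that experiment only.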

Richard Smith-Unna (130, UK) wrote 5.5 years ago:

It really depends on what you mean by significant. Reading between the lines, it seems as though you want to try to separate 'real' contigs from assembly artefacts. If that's the case, you should think carefully before discarding transcripts with a low RPKM.

There is no minimum - a contig representing a real transcript can have very low numbers of reads mapping to it, and have an extremely low RPKM. Equally, a high RPKM doesn't guarantee that the contig represents a real transcript. We often see chimeric contigs - where fragments from two or more different transcripts have been assembled into one contig. These chimeras often have high RPKM values, even though they are artefacts.
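
To make the point concrete, here is a small sketch of the standard RPKM calculation (reads per kilobase of transcript per million mapped reads). The function name `rpkm` and the read counts, contig length, and library size are made-up values for illustration. The value depends only on read count, length, and library size, so it says nothing about whether the contig was assembled correctly.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical numbers: a 2,000 bp contig in a library of 20 million mapped reads.
library_size = 20_000_000

# A genuine but weakly expressed transcript with only 10 mapped reads.
print(rpkm(10, 2000, library_size))    # 0.25

# A contig attracting 5,000 reads gets a high RPKM whether or not it is a chimera.
print(rpkm(5000, 2000, library_size))  # 125.0
```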

So, the answer to your question 1 is no: a high RPKM does not mean you can be confident in the transcript, so it isn't reliable. Thus there is no appropriate RPKM for making such a decision.

As for question 2, it really depends on what you want to do with your assembled transcripts. Are you performing differential expression? Motif discovery? Are you interested in a particular set of genes?

modified 5.5 years ago • written 5.5 years ago by Richard Smith-Unna (130)

@Richard: Understood, thank you. For question 2, yes, I just wanted to focus on a set of sequences that are reliable for downstream analysis such as differential expression analysis. My assembly seems to be highly fragmented, resulting in hundreds of thousands of contigs.

Reply written 5.4 years ago by Prakki Rama (2.2k)
ThePresident (60, Canada) wrote 5.5 years ago:

Would it be safe to plot the distribution of all RPKM values (which should be a normal distribution), and then say that genes within +/-1 sigma of the mean are "average/moderately" expressed, those above that are highly expressed, and those below are lowly expressed? Overall, you'd have 68.2% of genes with average expression, 15.9% with low expression, and 15.9% with high expression. It's not really experimental evidence (although you derive those classes from your data), but it is basically a logical assumption. I doubt that adding poly-A spike-ins to your RNA-seq library will give a better conclusion, since those will never behave like mRNAs with all their respective complexity.
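
A minimal sketch of the +/-1 sigma binning could look like the following. The function names, the log2 transform, the pseudocount of 1 (added so that zero RPKM values can be logged), and the example values are all assumptions for illustration. The observed fractions only come out near 68/16/16 if the transformed values really are close to normal, which is worth checking on your own data rather than assuming.

```python
import math
import statistics


def classify_by_sigma(rpkms, pseudocount=1.0):
    """Label each value 'low', 'average' or 'high' using mean +/- 1 SD
    of log2(RPKM + pseudocount)."""
    logs = [math.log2(v + pseudocount) for v in rpkms]
    mean = statistics.fmean(logs)
    sd = statistics.pstdev(logs)
    labels = []
    for x in logs:
        if x > mean + sd:
            labels.append("high")
        elif x < mean - sd:
            labels.append("low")
        else:
            labels.append("average")
    return labels


def class_fractions(labels):
    """Fraction of values falling in each class."""
    n = len(labels)
    return {name: labels.count(name) / n for name in ("low", "average", "high")}


# Hypothetical RPKM values for illustration only.
example_rpkms = [0.0, 0.4, 1.2, 2.5, 3.1, 5.0, 8.7, 12.0, 40.0, 350.0]
labels = classify_by_sigma(example_rpkms)
print(class_fractions(labels))
```

Comparing the printed fractions with the expected 68.2% / 15.9% / 15.9% gives a quick sense of how far the data are from the normal case.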


"should give a normal distribution" <- that's a big assumption. Do you typically see that in your data? I would not bet on it.

Reply written 5.5 years ago by Mikael Huss (4.6k)

Honestly, yes. I don't know if others can confirm this, but I see it in my data. Of course, you have to log-transform the RPKM values; otherwise the dispersion is enormous due to the extreme values. I've also seen it in at least one recent paper, but I just can't find the reference right now.

Reply written 5.5 years ago by ThePresident (60)

OK, interesting. It doesn't hold in the tissue data I am currently looking at (log FPKM values) but maybe it holds for other kinds of samples.

Reply written 5.5 years ago by Mikael Huss (4.6k)