Question

Transcripts highly constitutively expressed

0

4.5 years ago

nidhi.vijayan13 ▴ 30

How do I identify transcripts that are NOT differentially expressed? I have RNA-seq data that I have assembled using Trinity. I want to find the transcripts that are highly expressed in all replicates and in both of two different tissue types.

RNA-Seq rna-seq • 716 views

ADD COMMENT • link updated 4.5 years ago by ATpoint 81k • written 4.5 years ago by nidhi.vijayan13 ▴ 30

0

Thanks for all the suggestions! I will try filtering on the p-value and log fold change, and replicating with published results. If anyone is familiar with a good paper that does something similar and can share it, I would really appreciate it! THANKS much

ADD REPLY • link 4.4 years ago by nidhi.vijayan13 ▴ 30

score 2 · Answer 1 · 2019-11-04

2

4.5 years ago

ATpoint 81k

Do a standard differential expression analysis and select genes with high average expression (in edgeR that would be the logCPM column of the output) and strong evidence against differential expression (p-value > 0.95 or even 0.99). If differential expression analysis is new to you, please read the edgeR or DESeq2 manual carefully and then come back if there are specific questions. Both are pretty extensive (the edgeR one maybe even more so than the DESeq2 one) and offer a lot of example code to get started.
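A minimal sketch of that filter in Python, assuming the edgeR results table has already been exported and parsed into rows carrying its `logCPM` and `PValue` columns. The transcript IDs, the values, and the `logCPM >= 5` cutoff are made up for illustration; choose cutoffs that suit your own data.

```python
# Toy rows mimicking columns of an edgeR results table (all values are made up).
rows = [
    {"id": "TRINITY_DN1", "logFC": 0.02, "logCPM": 9.1, "PValue": 0.97},
    {"id": "TRINITY_DN2", "logFC": 2.50, "logCPM": 8.4, "PValue": 0.0004},
    {"id": "TRINITY_DN3", "logFC": -0.01, "logCPM": 1.2, "PValue": 0.99},
    {"id": "TRINITY_DN4", "logFC": 0.05, "logCPM": 7.3, "PValue": 0.96},
]


def select_stable_high(rows, min_logcpm=5.0, min_pvalue=0.95):
    """Keep rows with high average expression and a large p-value,
    sorted from highest to lowest logCPM."""
    keep = [
        r for r in rows
        if r["logCPM"] >= min_logcpm and r["PValue"] > min_pvalue
    ]
    return sorted(keep, key=lambda r: r["logCPM"], reverse=True)


stable = [r["id"] for r in select_stable_high(rows)]
print(stable)
```

In this toy table the lowly expressed transcript is dropped despite its large p-value, and the clearly differential one is dropped despite its high expression.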

ADD COMMENT • link 4.5 years ago by ATpoint 81k

0

What about picking genes with low p-values and very small fold changes?

ADD REPLY • link 4.5 years ago by swbarnes2 14k

0

I guess to get confident results you would need a large number of replicates. Otherwise small FCs and low p-values are somewhat mutually exclusive, correct me if I am wrong? The null hypothesis (by default) is that the true log fold change is zero. I am basically thinking aloud, as I have never looked for non-DE genes specifically or at how non-significant p-values behave depending on dispersion and fold changes. It is probably true that large p-values could also reflect large variation between replicates and therefore do not confidently identify non-DEGs. Still, I am reasonably certain that very large p-values suggest strong evidence against rejecting the null hypothesis, and so in turn suggest that the true log FC is close to zero. As said, correct me if I am wrong, I am not a statistician.
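To make the worry about noisy replicates concrete, here is a rough sketch of a different check from the p-value filter: ask whether the confidence interval for the log2 fold change fits inside a small band around zero. This is an equivalence-style check under a normal approximation, not something edgeR or DESeq2 runs by default. It assumes you have an estimate and a standard error per gene (DESeq2 reports these as `log2FoldChange` and `lfcSE`); the band of 0.5, the 90% level, and the two example genes are arbitrary assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def wald_p(lfc, se):
    """Two-sided p-value for H0: true log2FC == 0 (normal approximation)."""
    z = abs(lfc / se)
    return 2.0 * (1.0 - STD_NORMAL.cdf(z))


def ci_within_band(lfc, se, band=0.5, level=0.90):
    """True if the `level` confidence interval for log2FC lies inside (-band, +band)."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2.0)
    return (lfc - z * se) > -band and (lfc + z * se) < band


# Hypothetical genes: same estimated log2FC, very different precision.
genes = {"precise": (0.05, 0.10), "noisy": (0.05, 0.80)}
for name, (lfc, se) in genes.items():
    print(name, round(wald_p(lfc, se), 3), ci_within_band(lfc, se))
```

In this toy case the noisy gene gets the larger p-value, yet only the precise gene has an interval that stays inside the band, which is the situation described above where a large p-value comes from large variation rather than from a fold change known to be small.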

ADD REPLY • link 4.5 years ago by ATpoint 81k

1

I agree with this. P values basically tell you about evidence against the null.

There's also power analysis: if you know what your statistical power is (which is non-trivial to estimate), you can have an idea of your chances of committing a false negative (type 2 error) -- thinking a gene is non-DE when it actually is DE.

If a gene is truly non-DE, you're not going to see p < 0.05 (beyond the roughly 5% of cases expected by chance) even with a huge number of replicates, because there is no real effect for the extra data to reveal.

Anyhow, just my 2 cents. I'd go with ATpoint's recommendation (higher average expression and a high p-value threshold means more power and therefore less chance of getting false negatives).
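As a back-of-the-envelope illustration of the power argument, the sketch below computes the power of a two-sided two-sample z-test on log2 expression with a known per-group standard deviation. That is a deliberate simplification: real RNA-seq tools fit negative binomial models, so treat the numbers as intuition only. The effect size, standard deviation, and replicate counts are assumptions.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_power(true_lfc, sd, n_per_group, alpha=0.05):
    """Power of a two-sided two-sample z-test with known, equal per-group SD."""
    se = sd * sqrt(2.0 / n_per_group)        # SE of the difference in means
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(true_lfc) / se               # true effect in SE units
    # Probability that the test statistic lands in either rejection region.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


for n in (3, 6, 12):
    power = z_test_power(true_lfc=1.0, sd=0.5, n_per_group=n)
    # Columns: replicates per group, power, type 2 error rate (1 - power).
    print(n, round(power, 3), round(1.0 - power, 3))
```

With a true effect of zero the function returns `alpha`, which is the chance of a false positive rather than power in the usual sense.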

ADD REPLY • link 4.5 years ago by dsull ★ 5.8k

1

For additional confidence, you might also download published RNA-seq data that are reasonably comparable and then try to define a set of genes that are reproducibly (across datasets) highly expressed and always show strong evidence against differential expression. It probably depends on your exact scientific question, but similar findings in independent datasets are among my favourite ways to increase confidence in my results, beyond statistical strategies for filtering results within one dataset.
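One simple way to sketch this in Python is to apply the same filter to each dataset and keep only the genes that pass in all of them. The dataset names, gene IDs, values, and thresholds below are hypothetical, and the sketch assumes the gene identifiers have already been made comparable across datasets.

```python
def stable_high_ids(table, min_logcpm=5.0, min_pvalue=0.95):
    """IDs in one dataset with high average expression and a large p-value.
    `table` maps gene ID -> (logCPM, PValue)."""
    return {
        gene for gene, (logcpm, pvalue) in table.items()
        if logcpm >= min_logcpm and pvalue > min_pvalue
    }


def reproducible_across(tables, **thresholds):
    """Genes that pass the filter in every dataset (empty set if none given)."""
    per_dataset = [stable_high_ids(t, **thresholds) for t in tables]
    return set.intersection(*per_dataset) if per_dataset else set()


# Made-up results: gene ID -> (logCPM, PValue).
own = {"geneA": (9.0, 0.98), "geneB": (8.0, 0.97), "geneC": (7.5, 0.20)}
published_1 = {"geneA": (8.7, 0.96), "geneB": (8.2, 0.40), "geneC": (7.0, 0.99)}
published_2 = {"geneA": (9.3, 0.99), "geneB": (7.9, 0.96)}

shared = reproducible_across([own, published_1, published_2])
print(sorted(shared))
```

A gene missing from any one dataset is excluded by the intersection, which is the conservative choice; relaxing that to "passes in most datasets" would be a separate design decision.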

ADD REPLY • link 4.5 years ago by ATpoint 81k