Question: Are 2 replicates per sample sufficient for RNA-seq data analysis?
Asked 10 months ago by ag1805x (India):

I am learning to analyse RNA-seq data. I downloaded a dataset from ENA and used the new Tuxedo suite to identify DE genes. After the analysis, all the q-values are above 0.5, and they are all identical. The p-values look reasonable: 973 genes have p < 0.05. The highest fold change observed is 15919921.96. Is this normal?

Tags: rna-seq, ngs, R
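Two points in the question can be checked with a short sketch. First, identical q-values are a known side effect of the Benjamini-Hochberg (BH) false discovery rate adjustment. BH scales each p-value by m/rank and then takes a running minimum from the largest p-value downwards, so whole blocks of genes can end up sharing one value. Second, if the p-values were pure noise, about 5% of tested genes would still fall below 0.05. With roughly 20,000 genes tested, that would be around 1,000 genes, so 973 hits alone are not evidence of differential expression. The post does not give the number of genes, so 20,000 is an assumption. The sketch also assumes the pipeline reports BH-adjusted p-values as q-values, which the thread does not confirm. `benjamini_hochberg` is a hypothetical helper, and all p-values are simulated rather than taken from the poster's data.

```python
import random

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down to the smallest, keeping a running
    # minimum so adjusted values never decrease as p-values increase
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy example: the three smallest p-values end up sharing one adjusted value
toy_p = [0.04, 0.045, 0.048, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
toy_q = benjamini_hochberg(toy_p)
print([round(q, 3) for q in toy_q])

# Null-only simulation: 20,000 genes with no true differences (assumed count)
rng = random.Random(1)
null_p = [rng.random() for _ in range(20_000)]
null_q = benjamini_hochberg(null_p)
print(sum(p < 0.05 for p in null_p), "genes with p < 0.05 by chance alone")
print(len(set(null_q)), "distinct adjusted values among", len(null_q), "genes")
```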

The highest fold change observed is 15919921.96. Is this normal?

This is highly suspicious.

Reply written 10 months ago by WouterDeCoster
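A fold change in the tens of millions usually means the denominator is essentially zero: the gene is expressed in one condition and barely expressed in the other. This minimal sketch uses made-up expression values, not values from the post. It shows how a near-zero control value inflates the ratio. It also shows how adding a pseudocount to both values pulls the ratio back to a plausible range. The pseudocount of 1 is an arbitrary choice. `fold_change` is a hypothetical helper, and real DE tools handle low-expression genes in their own ways.

```python
import math

def fold_change(treated, control, pseudocount=0.0):
    """Ratio of mean expression values, with an optional pseudocount added to both."""
    return (treated + pseudocount) / (control + pseudocount)

treated, control = 800.0, 0.00005  # hypothetical values, not from the post
raw = fold_change(treated, control)
with_pc = fold_change(treated, control, pseudocount=1.0)
print(f"raw fold change: {raw:.3g} (log2 {math.log2(raw):.1f})")
print(f"with pseudocount 1: {with_pc:.3g} (log2 {math.log2(with_pc):.1f})")
```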

You should not worry too much about these extreme fold changes in NGS data. They are typically outliers caused by some kind of bias, be it PCR/amplification errors, alignment errors (e.g. in low-complexity regions) or similar. If the fold changes exceed, let's say, the 99th percentile, you might even consider either discarding or winsorizing them.

Reply written 10 months ago by ATpoint
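The winsorizing suggestion can be sketched with Python's standard library. The sketch assumes the values are log2 fold changes and clips both tails at the 1st and 99th percentiles. The two-sided cutoffs are an assumption, since the reply only mentions the 99th. The data are simulated, with two planted extremes, rather than real results. `winsorize` is a hypothetical helper name.

```python
import random
import statistics

def winsorize(values, lower_pct=1, upper_pct=99):
    """Clip values to the given percentiles (two-sided winsorization)."""
    # 99 cut points: cuts[k - 1] is the k-th percentile
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]

rng = random.Random(0)
# Simulated log2 fold changes: 198 ordinary genes plus two extreme outliers
log2fc = [rng.gauss(0.0, 1.0) for _ in range(198)] + [23.9, -20.0]
clipped = winsorize(log2fc)
print(f"before: {min(log2fc):.1f} to {max(log2fc):.1f}")
print(f"after:  {min(clipped):.1f} to {max(clipped):.1f}")
```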

Wow, does that fold change break the world record?

Reply written 10 months ago by Kevin Blighe

The million-dollar question! Two are definitely not enough, but sometimes we have to make do with them. Three are usually considered a good number, but statistically speaking you'd need 20-30 ^^

Reply written 10 months ago by Macspider
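This rough power calculation shows why more replicates help. It uses the normal approximation for a two-sided, two-sample test with equal group sizes: power ≈ Phi(d * sqrt(n/2) - z_(1-alpha/2)). Here d is the difference in means in standard-deviation units, and n is the number of replicates per group. The effect size d = 1 and alpha = 0.05 are assumptions chosen for illustration. The sketch has three simplifications:

- It is not the negative binomial model that RNA-seq DE tools use.
- It ignores the multiple-testing correction across thousands of genes, which lowers power further.
- The normal approximation is optimistic for very small n.

Real power with 2 replicates is therefore likely lower still. `approx_power` is a hypothetical helper.

```python
from statistics import NormalDist

def approx_power(n_per_group, effect_size, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test with equal group sizes."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Ignores the (negligible) chance of rejecting in the wrong direction
    return std_normal.cdf(effect_size * (n_per_group / 2) ** 0.5 - z_crit)

for n in (2, 3, 6, 12, 30):
    print(f"{n:2d} replicates per group: power ~ {approx_power(n, effect_size=1.0):.2f}")
```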

I don't think so; it sounds quite strange. Did you look at the p-value distribution? Finally, to answer your question: at least 3 replicates are usually recommended.

Reply written 10 months ago by kristoffer.vittingseerup

Yes, I agree that at least 3 replicates are better, but the dataset I am working on has only 2.

Please have a look at the p-value distribution histogram here. It seems good enough.

Reply written 10 months ago by ag1805x

There is a paper which suggests that 12 replicates per condition is actually the 'bare minimum' for sufficient statistical power, though that paper is probably still the only one to date that has used that many replicates!

I'd echo everyone else and say you need a minimum of 3. Think of it like this: without a third data point, it's impossible to know whether one or the other of the two points is anomalous. This is akin to fitting lines or polynomials to data on a graph: with 2 points you can always draw a straight line through both exactly, so the fit tells you nothing about which point is off. Generally, the highest order of polynomial you can fit is 1 less than the number of points you have, and ideally you fit a lower order than that (within reason!).

Reply written 10 months ago by jrj.healey
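The replicate argument can be made concrete with a robust outlier score: each value's distance from the median, in units of the median absolute deviation (MAD). This technique is swapped in for illustration and is not named in the thread. With two replicates, both values sit exactly the same distance from the median, so neither can be singled out. A third replicate anchors the median, and the odd value stands out. The expression values and the `robust_scores` helper are hypothetical.

```python
import statistics

def robust_scores(values):
    """Absolute distance of each value from the median, in units of the MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0] * len(values)  # no spread to judge against
    return [abs(v - med) / mad for v in values]

two_reps = [10.0, 50.0]          # hypothetical expression values
three_reps = [10.0, 12.0, 50.0]  # the same two values plus a third replicate
print(robust_scores(two_reps))    # [1.0, 1.0]: no way to tell which is off
print(robust_scores(three_reps))  # [1.0, 0.0, 19.0]: the 50.0 stands out
```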