Small Sample Size (2 vs 3) in RNA-Seq DGE Shows Statistically Significant Result
3.8 years ago

I have recently obtained RNA-Seq data of tumor samples from a pilot study in my lab and just finished applying Differential Gene Expression analysis on the data.


The Background: Due to some restrictions related to the fiscal year on which the funding for this experiment was budgeted, only 15 samples could be run. The person we consulted regarding the experiment advised us to use three technical replicates, something we later (after the experiment was finished) found to be unnecessary (?). As a result, our data consist of:

2 biological replicates of Condition A, each with 3 technical replicates

AND

3 biological replicates of Condition B, each with 3 technical replicates

Definition

Biological replicates: Samples from different individuals, with tumor profiles and clinical confounders matched as closely as possible, each exhibiting the factor of interest, either Condition A or Condition B

Technical replicates: RNA from the same sample run on the same day (same batch)


The Result of Analysis

After summing the technical replicates' counts, we used DESeq2 and found 6 differentially expressed genes (adjusted P-value < 0.05) between the two conditions.
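For readers who want to see what "summing the technical replicates' counts" amounts to, here is a minimal sketch in plain Python. The run IDs, sample IDs, gene name, and counts are made up for illustration, and the function name is hypothetical; in R, DESeq2's `collapseReplicates` serves the same purpose.

```python
from collections import defaultdict

def collapse_technical_replicates(counts, run_to_sample):
    """Sum raw counts over technical replicates of the same biological sample.

    counts:        {gene: {run_id: raw_count}}
    run_to_sample: {run_id: biological_sample_id}
    """
    collapsed = {}
    for gene, per_run in counts.items():
        totals = defaultdict(int)
        for run_id, count in per_run.items():
            # Every run of the same biological sample adds to one total.
            totals[run_to_sample[run_id]] += count
        collapsed[gene] = dict(totals)
    return collapsed

# Hypothetical toy data: one sample per condition, three runs each.
run_to_sample = {
    "A1_t1": "A1", "A1_t2": "A1", "A1_t3": "A1",
    "B1_t1": "B1", "B1_t2": "B1", "B1_t3": "B1",
}
counts = {
    "geneX": {"A1_t1": 10, "A1_t2": 12, "A1_t3": 11,
              "B1_t1": 30, "B1_t2": 28, "B1_t3": 33},
}

collapsed = collapse_technical_replicates(counts, run_to_sample)
print(collapsed)
```

After collapsing, the 15 runs become 5 columns (2 vs 3), which is the sample size the test actually sees.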

The Question

  1. From what I understand, sample size determines power, which is the probability of rejecting the null hypothesis when it is in fact false (that is, 1 minus the probability of a type II error, or false negative). Am I correct in assuming that sample size does not have any effect on alpha (the probability of a false positive)? I have read in this forum and in some journal articles that the sample size in an RNA-Seq experiment should be at least 3 vs 3.
  2. There has been talk of (A) just using this data instead of (B) designing a larger experiment (which is obviously a more expensive option). Is option (A) still a scientifically (and statistically) valid option, considering the sample size?

Thank you for considering my questions. This is my first post in this forum! I have just started working in the field, and just browsing past questions on this forum has helped answer my questions on many occasions. Looking forward to contributing in the years to come.

Best regards,

Michael

RNA-Seq sample size differential expression • 3.0k views
3.8 years ago

Hi Michael,

Just to add to what Kevin said: your study is technically "okay". That is, you've not done anything wrong. Your replicates are on the low side, given that these are samples from different human patients (which introduces a lot of variability) rather than, say, a clonal cell line. However, this is reflected in the small number of DE genes you have found.

In response to your question 1: Low powered studies DO suffer from an increased chance of false positives. This is because the power to detect true positives goes down faster than the probability of a false positive, so a larger share of the hits you do get are false. Imagine a situation where the power to detect true positives was 0. Any hit you got then would necessarily be a false positive!
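To put rough numbers on that argument, here is a small sketch. It assumes a single per-test significance level `alpha` with no multiple-testing correction, and an assumed fraction `prior_true` of genes that are truly differentially expressed; the function name and all values are illustrative, not taken from the study above.

```python
def false_hit_fraction(power, alpha, prior_true):
    """Expected fraction of significant hits that are false positives.

    power:      probability of detecting a truly changed gene
    alpha:      probability of calling an unchanged gene significant
    prior_true: assumed fraction of genes that truly change
    """
    true_hits = power * prior_true
    false_hits = alpha * (1.0 - prior_true)
    total = true_hits + false_hits
    if total == 0:
        raise ValueError("no hits expected, so the fraction is undefined")
    return false_hits / total

# Assumed values: 10% of genes truly change, alpha fixed at 0.05.
for power in (0.8, 0.4, 0.1, 0.0):
    fraction = false_hit_fraction(power, alpha=0.05, prior_true=0.1)
    print(f"power={power:.1f} -> false fraction of hits={fraction:.3f}")
```

With alpha held fixed, the false share of hits rises as power falls, and reaches 1 when power is 0, which is the extreme case described above.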

The FDR you get from a test is an estimate of the average FDR. That is, at a given threshold (5%), if you repeated the experiment an infinite number of times, the fraction of false positives among the hits in each experiment, averaged across all the trials, would be 5%. It doesn't guarantee that the fraction of false positives in any single experiment is exactly 5%.
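A quick simulation can make the "average over repeated experiments" point concrete. This is only a sketch under stated assumptions: independent z-tests rather than DESeq2's model, a hand-written Benjamini-Hochberg step-up procedure, and made-up values for the number of genes, the number of truly changed genes, and the effect size.

```python
import math
import random

def benjamini_hochberg(pvalues, q=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its threshold.
        if pvalues[idx] <= q * rank / m:
            cutoff_rank = rank
    return set(order[:cutoff_rank])

def simulate_fdp(rng, n_genes=1000, n_true=100, effect=3.0, q=0.05):
    """Simulate one experiment; return the false fraction among its hits."""
    pvalues = []
    for gene in range(n_genes):
        # Assumption: the first n_true genes carry a real shift.
        mean = effect if gene < n_true else 0.0
        z = rng.gauss(mean, 1.0)
        # Two-sided p-value of a standard normal test statistic.
        pvalues.append(math.erfc(abs(z) / math.sqrt(2.0)))
    hits = benjamini_hochberg(pvalues, q)
    if not hits:
        return 0.0  # no discoveries: the proportion is defined as 0
    false_hits = sum(1 for gene in hits if gene >= n_true)
    return false_hits / len(hits)

rng = random.Random(1)
fdps = [simulate_fdp(rng) for _ in range(200)]
print("mean false discovery proportion:", sum(fdps) / len(fdps))
print("min and max across experiments:", min(fdps), max(fdps))
```

The mean across the simulated experiments is what the 5% threshold controls; the minimum and maximum show how far any single experiment can sit from that average.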

Low powered studies DO suffer from an increased chance of false positives.

This is something new to me. I skimmed through some epidemiology papers after reading your answer, and now I get the feeling that they don't talk enough about this in Statistics classes. Thank you, Ian, for the amazing insight!

Cheers!

3.8 years ago

Hey Michael,

From my perspective, if this is just pilot data, then the current set-up is okay, but could be better. Because biology doesn't follow rules, having more samples lets us 'capture' the greater variability that can exist in both a normal and a disease population.

As you have probably seen, some users come here to ask about 1 versus 1 comparisons, and they have no technical or biological replicates. This is statistically possible to do, but the 'generalisability' of the results of such a comparison [to a broader population] is limited.

Your work would obviously not get published in any major journal. However, if it is merely for 'hypothesis generation', then that seems fine. The idea is that a larger study will come, correct?

Kevin

Yes, we are hopefully planning a larger study. It's as you said: we are trying to formulate a more specific research question based on the results of the pilot.

Thanks, Kevin, for taking the time to answer my question.

Cheers!

You're very welcome
