Question: Relevance of RNAseq meta-analysis
4 weeks ago by
guillaume.rbt (620) wrote:

Hi all,

I'm starting a project in which I will gather several public human RNA-seq datasets to perform a differential expression meta-analysis. The objective is to analyse multiple studies together to detect a signal that would not be found in any individual study because of its low number of samples.

I will end up with a lot of samples (>500), and since I'm not a statistician I'm wondering what issues I might face with this high number.

Should I expect to gain power with a large meta-dataset? Or will mixing several studies introduce too many confounding effects?

Is there a threshold on the number of samples I should gather? Maybe adding more and more samples will just bring noise and make the analysis more difficult?

In your opinion, is a tool like DESeq2 suited to analysing this kind of large dataset? Or should I use another type of approach to detect differentially expressed genes?

Thank you in advance for any input on this.

Modified 4 weeks ago by Asaf (6.1k) • written 4 weeks ago by guillaume.rbt (620)

Just keep in mind that whenever you mix different datasets together, you will have batch and other confounding effects, which are sometimes impossible to resolve. So at least try to retrieve datasets that are very similar to each other in terms of sequencing technology, instrument, read length, etc., and of course biological condition.
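
One case where a study effect cannot be resolved is easy to check in advance: if a study contains samples from only one biological condition, its study effect cannot be separated from the condition effect within that study. Below is a minimal Python sketch of that check. The function name `single_condition_studies` and the toy labels are illustrative assumptions, not part of any package.

```python
from collections import defaultdict


def single_condition_studies(condition, study):
    """Return the studies in which only one condition is present.

    condition: condition label per sample (e.g. "control" / "case")
    study: study label per sample
    """
    seen = defaultdict(set)
    for c, s in zip(condition, study):
        seen[s].add(c)
    # A study with a single condition offers no within-study contrast.
    return sorted(s for s, conds in seen.items() if len(conds) < 2)


# Toy labels (hypothetical): study "C" only has cases.
toy_condition = ["control", "case", "control", "case", "case", "case"]
toy_study = ["A", "A", "B", "B", "C", "C"]
flagged = single_condition_studies(toy_condition, toy_study)
print(flagged)
```

If every study is flagged, condition and study are completely confounded and no model can tell them apart.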

Written 4 weeks ago by grant.hovhannisyan (1.6k)

Ok thanks, I will keep in mind those constraints.

Written 4 weeks ago by guillaume.rbt (620)
4 weeks ago by
Asaf (6.1k) wrote:

A few thoughts:

  1. Normalization is going to be tricky. I would normalize each study by itself.
  2. I think limma would be more straightforward here: once you have all the normalized studies, you can easily include study as a confounder in the model.
  3. I don't think you can have too many samples; just be careful to check the effect size along with the p-value.
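
As a rough illustration of points 2 and 3, here is a minimal Python sketch of a per-gene linear model with study as a covariate. It is not limma: it is plain ordinary least squares with no empirical Bayes moderation, and the p-value uses a normal approximation that is only reasonable for large sample sizes. The function name and toy values are made up for illustration, and the expression values are assumed to be already normalized and on a log scale.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist, mean


def study_adjusted_effect(expr, condition, study):
    """Fit expr ~ condition + study for one gene by ordinary least squares.

    expr: log-scale expression per sample
    condition: 0/1 indicator per sample (control / case)
    study: study label per sample
    Returns (effect, standard error, approximate two-sided p-value).
    """
    by_study = defaultdict(list)
    for i, s in enumerate(study):
        by_study[s].append(i)

    # Centering within each study removes study-level shifts, which is
    # equivalent to including one indicator variable per study.
    x_c = [0.0] * len(expr)
    y_c = [0.0] * len(expr)
    for idx in by_study.values():
        mx = mean(condition[i] for i in idx)
        my = mean(expr[i] for i in idx)
        for i in idx:
            x_c[i] = condition[i] - mx
            y_c[i] = expr[i] - my

    sxx = sum(x * x for x in x_c)
    if sxx == 0:
        raise ValueError("condition is fully confounded with study")
    dof = len(expr) - len(by_study) - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("not enough samples for this model")

    effect = sum(x * y for x, y in zip(x_c, y_c)) / sxx
    rss = sum((y - effect * x) ** 2 for x, y in zip(x_c, y_c))
    se = sqrt(rss / dof / sxx)
    # Normal approximation to the t distribution (large samples only).
    p = 2 * (1 - NormalDist().cdf(abs(effect / se)))
    return effect, se, p


# Toy data (hypothetical): study "B" is shifted by +5 relative to "A".
expr = [1.0, 1.2, 0.8, 2.0, 2.1, 1.9, 6.0, 6.2, 5.8, 7.0, 7.1, 6.9]
condition = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
study = ["A"] * 6 + ["B"] * 6
effect, se, p = study_adjusted_effect(expr, condition, study)
print(f"log-scale effect = {effect:.3f}, se = {se:.3f}, p = {p:.3g}")
```

Reporting the effect next to the p-value matters because, with hundreds of samples, even very small shifts can reach statistical significance.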

Good luck

Written 4 weeks ago by Asaf (6.1k)

Thank you for your help. I will compare DESeq2 and limma to see how the results differ.

Written 4 weeks ago by guillaume.rbt (620)