I'm starting a project in which I will gather several public human RNA-seq datasets to perform a differential expression meta-analysis. The objective is to analyse multiple studies together to detect a signal that would not be found in any individual study because of its low number of samples.
I will end up with a lot of samples (>500), and since I'm not a statistician I'm wondering what issues I might face with this high number.
Should I expect to gain power with a large meta-dataset? Or will mixing several studies introduce too many confounding effects (for example, batch or study effects)?
Is there a threshold on the number of samples I should gather? Or will adding more and more samples just bring noise and make the analysis more difficult?
In your opinion, is a tool like DESeq2 suited to analysing this kind of large dataset? Or should I use another type of approach to detect differentially expressed genes?
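
To make "another type of approach" concrete, here is a minimal sketch of one alternative I could imagine: analysing each study separately and then combining the per-gene p-values with Fisher's method. This is only an illustration and not something I have tested on real data. The function name `fisher_combined_pvalue` and the example p-values are made up, and the method assumes the studies are independent and ignores the direction of the effect.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent per-study p-values for one gene (Fisher's method).

    Hypothetical helper for illustration only. Assumes the p-values come
    from independent studies testing the same null hypothesis.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    # Fisher's statistic is X = -2 * sum(ln p); we keep X / 2 for convenience.
    half_stat = -sum(math.log(p) for p in pvalues)

    # Under the null, X follows a chi-square distribution with 2k degrees of
    # freedom. For even degrees of freedom the survival function has a closed
    # form: exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Made-up example: one gene, three small studies, none significant alone.
per_study = [0.08, 0.08, 0.08]
combined = fisher_combined_pvalue(per_study)
print(f"combined p-value: {combined:.4f}")
```

In this toy example no single study reaches p < 0.05, but the combined p-value does, which is the kind of gain I am hoping for. I don't know whether this is preferable to pooling all the samples into one model with the study as a covariate.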
Thank you in advance for any input on this.