11 weeks ago
shri.au
Hello everyone,
I am currently working on a differential expression analysis using publicly available datasets. I have processed the raw reads and obtained count files. As it happens, some of my datasets are paired-end and a few are single-end.
Is it OK for me to combine all the data and go ahead with a DESeq2 analysis?
Most of the literature I have seen runs the differential expression analysis separately for each study and then does a meta-analysis by combining the p-values. I wanted to know whether there is any objection to combining all the datasets at the count-file stage.
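For reference, one common way to combine per-study p-values is Fisher's method. The sketch below is only an illustration of that idea, not a statement of what any particular paper did: the function name `fisher_combine` is made up for this example, it assumes the studies are independent, and it ignores the direction of the effect, so a gene that is up in one study and down in the other can still come out significant.

```python
import math

def fisher_combine(pvalues):
    """Combine independent p-values for one gene with Fisher's method.

    Returns (chi-square statistic, combined p-value).
    Hypothetical helper for illustration; assumes independent studies.
    """
    if not pvalues or any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    # Fisher's statistic: -2 * sum(ln p), chi-square with 2k degrees of freedom
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    # For even degrees of freedom the chi-square upper tail has a closed form:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total

# Example: the same gene tested in two studies
stat, combined_p = fisher_combine([0.05, 0.05])
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check.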
The first thing that comes to mind is how you would normalize them. https://youtu.be/TTUrtCY2k-w
Also, since these are different studies, there will be batch effects. One must have a good understanding of the methodologies used in the different studies before attempting to combine them.
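On the normalization point, DESeq2 estimates one size factor per sample with the median-of-ratios approach, so differences in sequencing depth between libraries are handled per sample. Here is a rough pure-Python sketch of that calculation, just to show the idea. The function name `size_factors` and the toy counts are made up, and this is not the DESeq2 code itself. Note that size factors only address depth and composition; they do not remove study-level batch effects.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of per-sample counts
    (genes in rows, samples in columns). Hypothetical helper.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        # Genes with a zero in any sample have no usable geometric mean
        if any(c <= 0 for c in gene):
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    # Size factor = median ratio of a sample's counts to the per-gene geometric mean
    return [math.exp(median(r)) for r in log_ratios]

# Toy data: the second sample is sequenced exactly twice as deep as the first
toy = [[10, 20], [100, 200], [5, 10], [0, 3]]
sf = size_factors(toy)
```

In the toy data the second size factor comes out twice the first, matching the twofold difference in depth.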
Can you post the exact design for each dataset? It depends on whether you can meaningfully regress out the batch effect between the two studies and whether the experiments are essentially the same in terms of methodology. If so, you might combine them; otherwise you are limited to a meta-analysis.
The first dataset has 166 affected and 160 healthy samples, sequenced on an Illumina HiSeq 2500 as paired-end reads. The second dataset also uses the Illumina HiSeq 2500, but with single-end reads, and has 37 affected and 14 healthy samples. The experimental procedure for harvesting the tissue is identical, so can we use the design in DESeq2 to adjust for batch effects?
For the paired-end data, fragments were counted rather than individual reads.
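Before putting a batch term in the design (in DESeq2 that would be something like `~ batch + condition`), it is worth checking that batch is not confounded with condition. The sketch below cross-tabulates the sample numbers given above. The labels `study1_paired` and `study2_single` and the helper `batch_is_separable` are invented for this example. Having every condition present in every batch is sufficient for the batch term to be estimable, but note that here study and library layout coincide, so the model cannot tell those two sources of variation apart.

```python
from collections import Counter

# Sample numbers as stated in the post; batch labels are illustrative
samples = (
    [("study1_paired", "affected")] * 166
    + [("study1_paired", "healthy")] * 160
    + [("study2_single", "affected")] * 37
    + [("study2_single", "healthy")] * 14
)

# Cross-tabulation of batch by condition
table = Counter(samples)
batches = sorted({b for b, _ in samples})
conditions = sorted({c for _, c in samples})

def batch_is_separable(table, batches, conditions):
    """True if every batch contains every condition, so an additive
    batch term is not confounded with condition."""
    return all(table[(b, c)] > 0 for b in batches for c in conditions)

for b in batches:
    print(b, {c: table[(b, c)] for c in conditions})
print("batch separable from condition:", batch_is_separable(table, batches, conditions))
```

If one study had contained only affected samples, the check would fail and the batch effect could not be separated from the disease effect.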