I would like to combine samples from different studies to do differential gene expression analysis. For example, paper 1 has published 10 nasal samples from healthy children and 10 nasal samples from children with COVID infection, and paper 2, from a different research group, has published 10 nasal samples from children with influenza infection. The two groups may or may not have used the same sequencing platform, kits, and sequencing depth.
Can I download the samples from both papers and run the following DEG/pathway analyses on the combined dataset:
1) Healthy children from paper 1 (factor 1) vs. COVID from paper 1 (factor 2)
2) Healthy children from paper 1 (factor 1) vs. influenza from paper 2 (factor 2)
3) COVID from paper 1 (factor 1) vs. influenza from paper 2 (factor 2)
4) Healthy children (factor 1) vs. COVID (factor 2) vs. influenza (factor 3)?
Since I will process both sets with the same bioinformatic tools and commands in a single analysis, and all reads will be aligned to the same transcript assembly, will this normalize the data correctly and allow for valid comparisons?
Many thanks, Swati
Can you post a proposed design matrix with samples and group info? The way you have laid out this information is hard to follow at present.
In general, you can try to correct for batch effects, but if the batches are too different, the batch differences might drown out any real signal.
Paper 2 has no healthy children (or samples)? The way you've described it, influenza is completely confounded with batch. Any differences you find could be due to influenza, or batch, or both. Without a common control between experiments, I don't know of any way to avoid this. Hopefully you have some easy and efficient way to experimentally verify candidates in follow-up experiments (because all your candidates will suffer from this problem).
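To make the confounding concrete, here is a minimal sketch in plain Python. It assumes the layout described in the question (10 healthy and 10 COVID samples from paper 1, 10 influenza samples from paper 2). The sample sheet, the column coding, and the helper names `design_row` and `matrix_rank` are made up for illustration and are not part of any DE package. It builds a design matrix with a condition effect plus a batch (paper) effect and checks its rank.

```python
from fractions import Fraction

# Hypothetical sample sheet matching the question: 10 samples per group.
samples = (
    [("healthy", "paper1")] * 10
    + [("covid", "paper1")] * 10
    + [("influenza", "paper2")] * 10
)

def design_row(condition, batch):
    """Columns: intercept, COVID indicator, influenza indicator, paper-2 batch indicator."""
    return [
        1,
        int(condition == "covid"),
        int(condition == "influenza"),
        int(batch == "paper2"),
    ]

def matrix_rank(rows):
    """Rank by exact Gaussian elimination on fractions (no float tolerance needed)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # column is a linear combination of earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

design = [design_row(condition, batch) for condition, batch in samples]
n_coef = len(design[0])
rank = matrix_rank(design)
influenza_col = [row[2] for row in design]
batch_col = [row[3] for row in design]

print("coefficients:", n_coef, "rank:", rank)
print("influenza column identical to batch column:", influenza_col == batch_col)
```

The matrix has four coefficients but rank three, because the influenza column and the batch column are identical. No model fitted to this layout can estimate an influenza effect separately from a paper-2 effect, which is why comparisons 2, 3, and 4 cannot be rescued by batch correction alone. Comparison 1 stays within paper 1 and is not affected.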