6.1 years ago
Pin.Bioinf
Hello, I am doing a differential expression analysis. First, my group sequenced 8 samples from responders and non-responders to a treatment and asked me to analyze them and find the DE genes (I found 25). After that, they sequenced 8 more samples, and now we get 4 DE genes, none of which match the DE genes from the first batch alone.
They ask if this is normal. My understanding is that as more samples are added, the true DE genes are reinforced and the genes that are not really differentially expressed get diluted out.
Is all this normal?
Which software are you using to find DEGs? It might help to compare the output of different tools. How are these samples placed in a PCA plot? I would also look at the read counts for the individual replicates and see how they differ across replicates for each of the 4 DEGs. I use cummeRbund for this kind of analysis (http://compbio.mit.edu/cummeRbund/manual_2_0.html).
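For anyone wanting to check how samples fall in a PCA before trusting a DE list, here is a minimal sketch in Python with numpy. Everything in it is an assumption for illustration: the data are simulated log-expression values (not the poster's data), the 16 x 200 matrix size and the batch shift are made up, and real counts would first need normalization and a log or variance-stabilizing transform.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical layout: 16 samples, 8 per sequencing run, 200 genes.
n_samples, n_genes = 16, 200
batch = np.repeat([0, 1], 8)  # 0 = first run, 1 = second run

# Simulated log-expression: noise around a baseline, plus a shift of 2.0
# in the first 100 genes for every sample from the second run.
log_expr = rng.normal(8.0, 0.5, size=(n_samples, n_genes))
log_expr[batch == 1, :100] += 2.0

# PCA via SVD on the gene-centered matrix (samples in rows).
centered = log_expr - log_expr.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
scores = u * s                      # sample coordinates on each PC
explained = s**2 / np.sum(s**2)     # fraction of variance per PC

pc1 = scores[:, 0]
for b, x in zip(batch, pc1):
    print(f"batch={b}  PC1={x:8.2f}")
```

If PC1 splits the samples by sequencing run rather than by responder status, as it does in this simulated case, that points to a batch effect that the model has to account for.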
Did you add a batch effect in the (second) model?
You mean adding the batch parameter to the formula? Is that enough to solve the batch effect?
It seems you have two different sequencing runs, so there is a batch effect. Yes, in the formula, with DESeq2. Did you explore your data? With MDS plots, boxplots, etc.?
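To illustrate why the batch term in the formula matters, here is a rough sketch in Python with numpy. It is not DESeq2 (which fits a negative binomial GLM on counts); it is an ordinary least-squares fit on simulated log-expression for a single gene, and the design, effect sizes, and the `fit_ols` helper are all hypothetical. In DESeq2 the analogous change would be going from a design like `~ condition` to `~ batch + condition`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced design: 2 runs x 8 samples, 4 responders per run.
batch = np.repeat([0, 1], 8)                   # 0 = first run, 1 = second run
condition = np.tile(np.repeat([0, 1], 4), 2)   # 0 = non-responder, 1 = responder

# One simulated gene: true condition effect 1.0, large batch shift 3.0.
y = 5.0 + 1.0 * condition + 3.0 * batch + rng.normal(0, 0.3, size=16)

def fit_ols(X, y):
    """Least-squares fit; returns coefficients and their t-statistics."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se

ones = np.ones(16)
X_without = np.column_stack([ones, condition])         # like ~ condition
X_with = np.column_stack([ones, batch, condition])     # like ~ batch + condition

beta_without, t_without = fit_ols(X_without, y)
beta_with, t_with = fit_ols(X_with, y)

print("condition t-statistic without batch term:", t_without[-1])
print("condition t-statistic with batch term:   ", t_with[-1])
```

In this balanced toy design the estimated condition effect is the same in both fits, but without the batch term the run-to-run shift is counted as noise, so the test statistic is much weaker. Adding the term only helps when a real condition effect exists; it cannot create DE genes that are not there.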
Yes, I did. By adding the batch information to the formula I only get 1 DE gene now!
How do the responder and non-responder samples cluster in the MDS plot? Please show the plot in your question.
here
here
How to add images to a Biostars post
What is your own interpretation of the PCA? Do you expect DE genes between GOOD and BAD samples?
It does not seem like there is a difference, as the good samples and the bad samples do not form groups or even appear close to each other, so I think the reason I don't get any DE genes is that none are significant enough. But as I am new to this, and the biologist says there is no way there are no DE genes, I don't know what to do. It may also be relevant that the number of mapped reads was really low, like 0.7 M, 1, 6 M reads in many samples.
It is hard to tell what went wrong, as I don't know the details (talk with the sequencing facility about the quality of your data). It could be technical or truly biological (I can't judge from the info you give). But from the PCA I also conclude that there is not much difference between GOOD and BAD, so getting only a handful of (false positive) DE genes seems logical.
Thank you for all your help