Hello there, I hope you are all well and enjoying the weekend. In my limited experience, I have always had to deal with samples coming from different batches (i.e. from different hospitals, or from experiments done on different days). A postdoc in my lab showed me how to deal with batch effects using the SVA package. I think it is a brilliant tool, but if I am not mistaken it is not the best way to handle batch effects in my case: as far as I understand, SVA is meant for unknown sources of variation, whereas I usually know the source of confounding (in other words, I know that the samples were generated on different days or come from different hospitals).
My first question is: just by looking at the PCA plot from this experiment, how can you determine (i.e. be absolutely sure) whether or not your samples are affected by a batch effect? If they cluster as expected, how can you be sure whether they still need a correction? And if they don't cluster as you would expect, how can you tell whether this is due to a batch effect or to real biology? What if you over-correct the samples and remove relevant biological signal, which in turn makes it impossible to find the genes that are truly changing? How would you know?
My second question is more practical and, basically, comes from my inexperience. I have always used the SVA package by slavishly following the postdoc's recommendation (and she had her own doubts about it), so I would like to deal with this problem properly once and for all, ideally by including the known batch in a model matrix. I have no idea where to start, and if you could help me with some advice, that would be grand! I really need your help, guys, because I am completely on my own now and don't know who to ask. I have found this link but it seems to be a bit too advanced for me.
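To make the second question concrete, this is roughly what I understand by "designing a model matrix" with a known batch. It is only a minimal sketch in plain Python with made-up sample labels (the `conditions` and `batches` lists and both helper functions are hypothetical, not taken from any package). It builds the additive design that R would express as `~ condition + batch`, using the first level of each factor as the reference, and flags batches that contain only one condition, since those cannot help separate the condition effect from the batch effect.

```python
def design_matrix(conditions, batches):
    """Build an additive design (intercept + condition + batch).

    Uses treatment coding: the first level (alphabetically) of each
    factor is the reference and gets no column of its own.
    """
    cond_levels = sorted(set(conditions))
    batch_levels = sorted(set(batches))
    names = (
        ["intercept"]
        + [f"condition_{c}" for c in cond_levels[1:]]
        + [f"batch_{b}" for b in batch_levels[1:]]
    )
    rows = []
    for cond, batch in zip(conditions, batches):
        row = [1]  # intercept
        row += [int(cond == level) for level in cond_levels[1:]]
        row += [int(batch == level) for level in batch_levels[1:]]
        rows.append(row)
    return names, rows


def single_condition_batches(conditions, batches):
    """Return batches in which only one condition was observed."""
    seen = {}
    for cond, batch in zip(conditions, batches):
        seen.setdefault(batch, set()).add(cond)
    return sorted(b for b, conds in seen.items() if len(conds) < 2)


# Hypothetical example: six samples from three hospitals.
conditions = ["control", "treated", "control", "treated", "control", "treated"]
batches = ["hospital_A", "hospital_A", "hospital_B", "hospital_B",
           "hospital_C", "hospital_C"]

names, rows = design_matrix(conditions, batches)
print(names)
for row in rows:
    print(row)
print("Batches with a single condition:",
      single_condition_batches(conditions, batches))
```

If I understand correctly, a matrix like this is what would be passed to the differential expression model, so that the batch is accounted for in the fit instead of being removed from the data beforehand. Please correct me if that is the wrong idea.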