I am trying to run a PCA on my transcriptomics data, but the experimental design was not ideal: the experiment was done in 2 runs. The 1st run has 22 records from 11 donors (before and after treatment) and the 2nd run has 16 records from 8 donors (again before and after treatment). But the problem is:
3 of the donors in the 2nd run also have incomplete data in the 1st run (including them, we have 25 records for the 1st run). For those donors, the 1st run has only before-treatment data (the after-treatment experiment did not go well), but the 2nd run has complete data (before and after treatment). In other words, for these 3 individuals we have 6 records in run 2 (complete set) and 3 records in run 1 (incomplete set).
Now my question is: since I am trying to analyze the data from both runs together (I will correct for the batch effect), is it valid to include the incomplete run 1 records from those 3 donors in addition to their complete run 2 data?
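
In case it helps to see the layout, here is a minimal sketch of the sample sheet as I understand it. The donor IDs are made up, and I am assuming the 11 complete run 1 donors do not overlap with the other 5 donors in run 2; only the counts come from the description above.

```python
from collections import Counter

# Hypothetical donor IDs; only the group sizes come from the description.
run1_complete = [f"D{i:02d}" for i in range(1, 12)]  # 11 donors, run 1, before + after
shared = [f"D{i:02d}" for i in range(12, 15)]        # 3 donors present in both runs
run2_only = [f"D{i:02d}" for i in range(15, 20)]     # 5 donors, run 2, before + after

# Each record is (donor, run, condition).
samples = []
for donor in run1_complete:
    samples += [(donor, 1, "before"), (donor, 1, "after")]
for donor in shared:
    samples.append((donor, 1, "before"))  # run 1: after-treatment sample failed
    samples += [(donor, 2, "before"), (donor, 2, "after")]  # run 2: complete pair
for donor in run2_only:
    samples += [(donor, 2, "before"), (donor, 2, "after")]

# Tally records per run and the number of distinct donors overall.
per_run = Counter(run for _, run, _ in samples)
n_donors = len({donor for donor, _, _ in samples})

# The records in question: run 1 samples from donors who also appear in run 2.
incomplete = [s for s in samples if s[0] in shared and s[1] == 1]

print(dict(per_run), len(samples), n_donors, len(incomplete))
```

So the question is whether the 3 records in `incomplete` should go into the combined analysis or be dropped.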