I have an RNA-seq dataset of 28 samples (14 conditions with 2 replicates each), generated from flow-cytometry-sorted cells from murine immune tissue. Initial analysis revealed that one replicate (sample 6 in the picture below) showed vastly different expression of a few genes compared with the other samples, including the other replicate of the same condition.
Further analysis of these distinctly expressed genes (Alas2, Ppbp, Pf4, Gypa, Hbb-bs, Gda, Hba-a2, Hba-a1, Hbb-bt, Apol11b) showed a bias toward hemoglobin and platelet-associated genes, indicating that the pattern is caused by contamination with red blood cells (RBCs) and possibly platelets. Because RBCs lack a nucleus, I assume they contain only a sparse repertoire of mRNA molecules, and I hope it may therefore be possible to correct the contaminated sample.
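One way to put a number on the contamination before deciding anything is to compute, for each sample, the fraction of raw reads assigned to the flagged genes. This is a minimal sketch that assumes the count matrix is held as a plain Python dict mapping gene symbol to a list of per-sample raw counts; the function name `contamination_fraction`, the sample names and the toy counts are hypothetical, for illustration only.

```python
from typing import Dict, List, Sequence

# Genes flagged in the question as RBC/platelet-associated.
CONTAMINANT_GENES = ["Alas2", "Ppbp", "Pf4", "Gypa", "Hbb-bs", "Gda",
                     "Hba-a2", "Hba-a1", "Hbb-bt", "Apol11b"]


def contamination_fraction(counts: Dict[str, List[int]],
                           samples: Sequence[str],
                           genes: Sequence[str] = CONTAMINANT_GENES) -> Dict[str, float]:
    """Return, per sample, the fraction of raw reads assigned to `genes`."""
    flagged_set = set(genes)
    totals = [0] * len(samples)   # all assigned reads per sample
    flagged = [0] * len(samples)  # reads on flagged genes per sample
    for gene, row in counts.items():
        for j, c in enumerate(row):
            totals[j] += c
            if gene in flagged_set:
                flagged[j] += c
    return {s: (flagged[j] / totals[j] if totals[j] else 0.0)
            for j, s in enumerate(samples)}


# Made-up toy counts for two replicates; not real data.
samples = ["rep1", "rep2"]
toy = {"Cd3e": [900, 950], "Hbb-bs": [10, 400], "Actb": [90, 50]}
frac = contamination_fraction(toy, samples)
for s, f in frac.items():
    print(f"{s}: {f:.1%} of reads from flagged genes")
```

A replicate whose fraction stands well apart from its partner is the one whose library composition is most affected.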
Would correcting this be a bad idea? I could use the data without correction, but I suspect my normalized counts may be slightly off because of the "bias" in this sample.
If correction is acceptable, what approach would you recommend? I could simply remove these genes (and their assigned reads) from all samples before normalization, since I am not particularly interested in hemoglobin gene expression. Would this be a viable approach?
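Removing the genes before normalization could look like the following minimal sketch. It assumes the same kind of hypothetical dict-of-lists count matrix, drops the listed genes from every sample, and then computes median-of-ratios size factors in the style of DESeq2's default `estimateSizeFactors`. This is a small reimplementation for illustration, not DESeq2 itself; the helper names and toy counts are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List, Sequence

# Genes flagged in the question as RBC/platelet-associated.
CONTAMINANT_GENES = ["Alas2", "Ppbp", "Pf4", "Gypa", "Hbb-bs", "Gda",
                     "Hba-a2", "Hba-a1", "Hbb-bt", "Apol11b"]


def drop_genes(counts: Dict[str, List[int]],
               genes: Sequence[str] = CONTAMINANT_GENES) -> Dict[str, List[int]]:
    """Return a copy of the count matrix without the listed genes (in every sample)."""
    drop = set(genes)
    return {g: row for g, row in counts.items() if g not in drop}


def median_of_ratios_size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """DESeq-style size factors computed on raw counts.

    Only genes with a nonzero count in every sample are used, since a zero
    makes the geometric mean zero. For each sample the size factor is
    exp(median(log(count) - mean log count of that gene across samples)).
    """
    rows = [row for row in counts.values() if all(c > 0 for c in row)]
    if not rows:
        raise ValueError("no gene has nonzero counts in every sample")
    n_samples = len(rows[0])
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Made-up toy counts for two samples; not real data.
toy = {"A": [10, 20], "B": [100, 200], "Hbb-bs": [5, 500]}
filtered = drop_genes(toy)
size_factors = median_of_ratios_size_factors(filtered)
normalized = {g: [c / sf for c, sf in zip(row, size_factors)]
              for g, row in filtered.items()}
print(size_factors)
```

In this toy example the size factors come out as about 0.707 and 1.414 whether or not `Hbb-bs` is removed, because the median over genes ignores a single outlying gene; total-count scaling such as CPM would be pulled more strongly by a sample carrying many contaminant reads.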