Question

Hemoglobin genes highly expressed in one replicate sample (RBC contamination) - what can I do?



7.6 years ago

terkild • 0

Hi,

I have an RNA-seq dataset with 28 samples (14 conditions with 2 replicates each). The samples were generated from flow-cytometry-sorted cells from murine immune tissue. Initial analysis revealed that one replicate had vastly different expression of a few genes compared with all other samples, including the other replicate of the same condition (sample 6 in the picture below).

[Figure: heatmap showing genes distinctly expressed in sample 6 compared with sample 5]

Further analysis of the distinctly expressed genes (Alas2, Ppbp, Pf4, Gypa, Hbb-bs, Gda, Hba-a2, Hba-a1, Hbb-bt, Apol11b) showed a bias toward hemoglobin- and platelet-associated genes, indicating that the pattern is caused by contamination with red blood cells (RBCs) and possibly platelets. Because RBCs lack a nucleus, I assume they contain only a limited repertoire of mRNA molecules, so I hope it may be possible to correct the contaminated sample.
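One quick way to put a number on the contamination is to compute, for each sample, the fraction of counted reads that fall on the marker genes listed above. This is a minimal sketch assuming raw counts are held in a plain Python dict mapping gene name to a list of per-sample counts; the function name `contaminant_fraction` and the toy counts are hypothetical and do not come from the actual dataset.

```python
from typing import Dict, List, Sequence

# Marker genes named in the question (RBC / platelet associated).
CONTAMINANT_GENES = ["Alas2", "Ppbp", "Pf4", "Gypa", "Hbb-bs", "Gda",
                     "Hba-a2", "Hba-a1", "Hbb-bt", "Apol11b"]


def contaminant_fraction(counts: Dict[str, List[int]],
                         genes: Sequence[str] = CONTAMINANT_GENES) -> List[float]:
    """Return, per sample, the fraction of all counted reads assigned to `genes`.

    `counts` maps gene name -> raw read counts, one entry per sample;
    every list is assumed to have the same length.
    """
    n_samples = len(next(iter(counts.values())))
    totals = [0] * n_samples
    marker = [0] * n_samples
    gene_set = set(genes)
    for gene, row in counts.items():
        for i, c in enumerate(row):
            totals[i] += c          # library size of sample i
            if gene in gene_set:
                marker[i] += c      # reads on contaminant marker genes
    return [m / t if t else 0.0 for m, t in zip(marker, totals)]


# Hypothetical toy counts: sample 0 is "clean", sample 1 is "contaminated".
toy = {"Hbb-bs": [10, 900], "Hba-a1": [5, 600],
       "Cd3e": [500, 480], "Actb": [485, 520]}

if __name__ == "__main__":
    print(contaminant_fraction(toy))  # [0.015, 0.6]
```

Comparing this fraction across all 28 samples would show whether sample 6 is a lone outlier or whether other samples carry a smaller amount of the same signal.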

Would correcting this be a bad idea? I could use the data uncorrected, but I suspect my normalized counts may be slightly off because of the "bias" in this sample.
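The concern about normalized counts can be made concrete. With plain total-count scaling (counts per million), a few extremely abundant contaminant transcripts inflate one sample's library size and push every other gene in that sample down. Median-based size factors, in the style of DESeq2's median-of-ratios method, are much less sensitive to a handful of outlier genes. The sketch below is a simplified re-implementation for illustration only, not DESeq2's actual code. The toy counts are invented and deliberately exaggerated.

```python
import math
from statistics import median
from typing import Dict, List


def cpm(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Counts per million: divide each sample by its total read count."""
    n = len(next(iter(counts.values())))
    totals = [sum(row[i] for row in counts.values()) for i in range(n)]
    return {g: [c / totals[i] * 1e6 for i, c in enumerate(row)]
            for g, row in counts.items()}


def median_of_ratios_size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """Simplified median-of-ratios size factors (DESeq2-style).

    Each sample's factor is the median, over genes, of count / geometric mean
    of that gene across samples. Genes with a zero in any sample are skipped,
    because their geometric mean is zero.
    """
    n = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n)]
    for row in counts.values():
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n
        for i, c in enumerate(row):
            log_ratios[i].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical toy data: G1..G4 are identical in both samples;
# sample 1 additionally carries a huge hemoglobin signal.
toy = {"G1": [100, 100], "G2": [200, 200], "G3": [300, 300],
       "G4": [400, 400], "Hbb-bs": [10, 9000]}

if __name__ == "__main__":
    vals = cpm(toy)
    # G1 looks roughly 10x lower in sample 1 purely because of contamination.
    print(vals["G1"])
    # The median ignores the single outlier gene, so both factors stay at 1.
    print(median_of_ratios_size_factors(toy))  # [1.0, 1.0]
```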

If correction is reasonable, what approach would you recommend? I could simply remove these genes (and the reads assigned to them) from all samples before normalization, since I am not particularly interested in hemoglobin gene expression. Would this be a viable approach?
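The gene-removal idea amounts to dropping the same rows from the count matrix for every sample before any normalization factors are computed. This is a minimal sketch, again assuming a dict-based count matrix. The helper names `drop_genes` and `library_sizes` and the toy counts are hypothetical, and the gene list simply reuses the genes named in the question.

```python
from typing import Dict, Iterable, List

# Genes flagged in the question as RBC / platelet associated.
GENES_TO_DROP = ["Alas2", "Ppbp", "Pf4", "Gypa", "Hbb-bs", "Gda",
                 "Hba-a2", "Hba-a1", "Hbb-bt", "Apol11b"]


def drop_genes(counts: Dict[str, List[int]],
               genes: Iterable[str]) -> Dict[str, List[int]]:
    """Return a copy of the count matrix with `genes` removed from every sample."""
    drop = set(genes)
    return {g: list(row) for g, row in counts.items() if g not in drop}


def library_sizes(counts: Dict[str, List[int]]) -> List[int]:
    """Total reads per sample (column sums of the count matrix)."""
    return [sum(col) for col in zip(*counts.values())]


# Hypothetical toy counts for two samples.
toy = {"Hbb-bs": [10, 900], "Hba-a1": [5, 600],
       "Cd3e": [500, 480], "Actb": [485, 520]}

if __name__ == "__main__":
    print(library_sizes(toy))                             # [1000, 2500]
    print(library_sizes(drop_genes(toy, GENES_TO_DROP)))  # [985, 1000]
```

Because the genes are removed from all samples, every sample is normalized on the same gene set. Any contaminant-derived transcripts that are not on the list would still contribute to the remaining counts.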

RNA-Seq
