Question: Hemoglobin genes highly expressed in one replicate sample (RBC contamination) - what can I do?
3.1 years ago by terkild


I have an RNA-seq dataset with 28 samples (14 conditions with 2 replicates each). The samples were generated from flow-cytometry-sorted cells from murine immune tissue. Initial analysis revealed that one replicate had vastly different expression of a few genes compared with the other samples, including the other replicate of the same condition (sample 6 in the picture below).

[Figure: heatmap showing genes distinctly expressed by sample 6 as compared to sample 5]

Further analysis of the distinctly expressed genes (Alas2, Ppbp, Pf4, Gypa, Hbb-bs, Gda, Hba-a2, Hba-a1, Hbb-bt, Apol11b) showed a bias toward hemoglobin- and platelet-associated genes, indicating that this pattern is caused by contamination with red blood cells (RBCs) and possibly platelets. Since RBCs lack a nucleus, I assume they contain a sparse repertoire of mRNA molecules, so I hope it may be possible to correct the contaminated sample.
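To put a number on how badly a sample is affected, one option is to compute, per sample, the fraction of all assigned reads that land on the suspect genes. The sketch below is only an illustration: the count table is a plain dict with made-up toy values (not my real data), the marker gene Cd3e is arbitrary, and the helper name `contaminant_fraction` is hypothetical.

```python
# Genes flagged as likely RBC/platelet-derived in the question.
SUSPECT_GENES = {"Alas2", "Ppbp", "Pf4", "Gypa", "Hbb-bs", "Gda",
                 "Hba-a2", "Hba-a1", "Hbb-bt", "Apol11b"}

# Hypothetical raw counts: gene -> reads per sample (toy values only).
samples = ["sample5", "sample6"]
counts = {
    "Hbb-bs": [10, 4000],
    "Hba-a1": [5, 3000],
    "Pf4":    [5, 1000],
    "Actb":   [500, 500],
    "Cd3e":   [480, 1500],
}

def contaminant_fraction(counts, samples, suspects):
    """Return {sample: fraction of assigned reads on suspect genes}."""
    fractions = {}
    for i, name in enumerate(samples):
        total = sum(row[i] for row in counts.values())
        flagged = sum(row[i] for gene, row in counts.items()
                      if gene in suspects)
        fractions[name] = flagged / total if total else 0.0
    return fractions

fractions = contaminant_fraction(counts, samples, SUSPECT_GENES)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of reads on suspect genes")
```

A large gap between the two replicates of a condition would support the contamination explanation and gives a rough idea of how much of the library size is taken up by those genes.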

Would correcting this be a bad idea? I can use the data without correction, but I feel that my normalized counts may be slightly off due to the "bias" in this sample.

If it is OK to correct, what approach would you recommend? I could simply remove these genes (and the reads assigned to them) from all samples before normalization, as I am not particularly interested in hemoglobin gene expression. Would this be a viable approach?
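A minimal sketch of the removal approach I have in mind, under stated assumptions: the counts are toy values, the helpers `drop_genes` and `cpm` are hypothetical names, and plain counts-per-million stands in for whatever normalization the real pipeline uses (it is not the DESeq2 or edgeR method). The point is only to show that the suspect genes are dropped first, so library sizes are computed from the remaining genes.

```python
# Hypothetical raw counts: gene -> [sample5, sample6]. Both samples have
# the same sequencing depth, but most of sample6 is hemoglobin reads.
counts = {
    "Hbb-bs": [0, 6000],
    "Hba-a1": [0, 2000],
    "Actb":   [6000, 1200],
    "Cd3e":   [4000, 800],
}
SUSPECT_GENES = {"Hbb-bs", "Hba-a1"}  # subset of the list in the question

def drop_genes(counts, genes):
    """Return a copy of the count table without the given genes."""
    return {g: row for g, row in counts.items() if g not in genes}

def cpm(counts):
    """Counts per million, using each sample's total as library size."""
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[i] for row in counts.values())
              for i in range(n_samples)]
    return {g: [1e6 * row[i] / totals[i] for i in range(n_samples)]
            for g, row in counts.items()}

before = cpm(counts)                            # suspect genes included
after = cpm(drop_genes(counts, SUSPECT_GENES))  # filtered, then normalized

print("Actb CPM before filtering:", before["Actb"])
print("Actb CPM after filtering: ", after["Actb"])
```

In this toy case Actb looks five-fold lower in sample 6 before filtering and identical across the two samples afterwards, which is the kind of distortion I am worried about. Whether real data behave this cleanly is exactly what I am unsure of.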
