Hi all, I have bulk RNA-seq data from mouse. The RNA came from T cell subsets sorted from lungs, spleens, and lymph nodes. We want to perform pairwise comparisons of the lung T cells against the spleen and lymph node T cells. The issue is that the lung samples appear to contain a lot of contaminating lung-specific RNA. For example, Sftpc (surfactant protein C) has a very large number of reads in the lung samples and almost none in samples from the other tissues. There are many more very highly expressed contaminating genes like this, and I'd imagine there are contaminating genes with fewer reads as well. We do have a list of genes found in the lung data set but not in the spleen or LN data sets, and it accounts for ~20% of our reads. We can remove those genes from the lung set, but there are many other non-tissue-specific genes, such as actin, that are expressed in both lung and spleen/LN cells and whose reads could come from the same RNA contamination. Does anyone have any idea how to remove or reduce the contamination from ambient lung RNA?
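To make the filtering step described above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration only: the toy counts, the gene names other than Sftpc, the read thresholds, and the helper names `tissue_specific_genes`, `fraction_of_reads`, and `drop_genes` are all hypothetical and do not come from the actual data or pipeline.

```python
# Hypothetical toy counts (gene -> reads), one dict per sample. Not real data.
lung_sample = {"Sftpc": 2000, "Scgb1a1": 500, "Cd3e": 4000, "Actb": 3500}
spleen_sample = {"Sftpc": 0, "Scgb1a1": 1, "Cd3e": 4200, "Actb": 3000}


def tissue_specific_genes(target, others, min_reads=100, max_other_reads=5):
    """Genes with >= min_reads in target and <= max_other_reads in every other sample.

    The thresholds are arbitrary placeholders, not recommended values.
    """
    return {
        gene
        for gene, reads in target.items()
        if reads >= min_reads
        and all(other.get(gene, 0) <= max_other_reads for other in others)
    }


def fraction_of_reads(sample, genes):
    """Share of a sample's total reads assigned to the given genes."""
    total = sum(sample.values())
    return sum(sample.get(g, 0) for g in genes) / total if total else 0.0


def drop_genes(sample, genes):
    """Copy of the sample without the listed genes."""
    return {g: r for g, r in sample.items() if g not in genes}


lung_only = tissue_specific_genes(lung_sample, [spleen_sample])
print(sorted(lung_only))
print(round(fraction_of_reads(lung_sample, lung_only), 2))
filtered = drop_genes(lung_sample, lung_only)
```

Note that this only removes genes absent from the other tissues. Reads from shared genes such as Actb stay in `filtered` untouched, which is exactly the limitation raised in the question.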


What makes you think these are contaminating reads from non-T cells rather than the result of location-specific expression? If your sort looks robust and reliable, this could be real biology poking through.

Are there any publicly available datasets containing T cells sorted from lung that you could use as a comparator?


