RNA-seq ribosomal genes vs proteins
16 months ago
igor 12k

It seems that the definition of "ribosomal genes" in the context of RNA-seq or scRNA-seq QC varies a lot. I've seen some people define them by simply grepping for ^Rp[sl] (for mouse), which technically selects ribosomal protein genes. The GENCODE GTF has an rRNA biotype (as well as rRNA_pseudogene and Mt_rRNA), which marks ribosomal RNA transcripts (including genes such as Rn5s and n-R5s*).

For the purpose of QC, is one definition (or a combination of the two) more correct? For example, the default 10x Genomics workflow doesn't include rRNA biotypes in the reference, so perhaps that's why people resort to ribosomal proteins in that case.
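
The two definitions in the question can be sketched side by side. This is only a minimal illustration, assuming a GENCODE-style GTF where the ninth column carries `gene_type` and `gene_name` attributes (Ensembl GTFs use `gene_biotype` instead); the function name `classify_gtf_gene` and the label strings are hypothetical.

```python
import re

# Mouse ribosomal protein symbols (the ^Rp[sl] pattern from the question).
RIBO_PROTEIN_RE = re.compile(r"^Rp[sl]")
# GENCODE biotypes mentioned in the question.
RRNA_BIOTYPES = {"rRNA", "rRNA_pseudogene", "Mt_rRNA"}
# Quoted GTF attributes look like: gene_type "rRNA";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def classify_gtf_gene(line):
    """Return (gene_name, labels) for a GTF 'gene' line, or None for other lines.

    Hypothetical helper: labels is a set that may contain
    'ribosomal_protein' (name-based) and/or 'rRNA' (biotype-based).
    """
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] != "gene":
        return None
    attrs = dict(ATTR_RE.findall(fields[8]))
    name = attrs.get("gene_name", "")
    labels = set()
    if RIBO_PROTEIN_RE.match(name):
        labels.add("ribosomal_protein")
    # Assumption: GENCODE attribute key; swap in "gene_biotype" for Ensembl GTFs.
    if attrs.get("gene_type") in RRNA_BIOTYPES:
        labels.add("rRNA")
    return name, labels
```

Note that a bare ^Rp[sl] match is loose: it also picks up symbols such as Rps6ka1, which encodes a kinase rather than a ribosomal protein, so the name-based set may need manual curation.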

RNA-Seq scRNA-seq rRNA • 1.5k views
16 months ago
ATpoint 62k

I would say that you want type==rRNA, as you are interested in the fraction of unwanted structural ribosomal RNAs in your library, not in the ribonucleoprotein transcripts themselves, which are actually informative about metabolic and (not sure about this) cell cycle state.

It definitely makes a difference. I just checked in my own data (mouse hematopoietic cells, 10X v3); the plots show the % of reads aligning to the selected genes (raw, non-QCed cells directly after quantification with Alevin):
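
A per-cell percentage like the one described above could be computed roughly as follows. This is a sketch, not the code behind the original plots: it assumes a dense cells x genes count matrix, and the function name `percent_in_gene_set` and the toy numbers are made up for illustration.

```python
import re

import numpy as np


def percent_in_gene_set(counts, gene_names, in_set):
    """Percent of each cell's total counts that falls in a gene set.

    counts: cells x genes array of raw counts (assumed dense here).
    gene_names: one name per column of counts.
    in_set: callable returning True for genes that belong to the set.
    """
    counts = np.asarray(counts, dtype=float)
    mask = np.array([bool(in_set(g)) for g in gene_names], dtype=bool)
    totals = counts.sum(axis=1)
    set_counts = counts[:, mask].sum(axis=1)
    # Cells with zero total counts get 0% instead of a division error.
    return 100.0 * np.divide(
        set_counts, totals, out=np.zeros_like(totals), where=totals > 0
    )


# Toy data, not real measurements: two cells, three genes.
genes = ["Rpl13", "Rn5s", "Actb"]
toy_counts = [[10, 30, 60], [0, 0, 0]]
pct_ribo_protein = percent_in_gene_set(
    toy_counts, genes, lambda g: re.match(r"^Rp[sl]", g) is not None
)
pct_rrna = percent_in_gene_set(toy_counts, genes, lambda g: g == "Rn5s")
```

Running the same function with the two gene sets gives the two metrics being compared in this thread; for a sparse matrix the column selection and row sums would need the sparse equivalents.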

0

This is somewhat of a tangent, but about a third of total transcripts being ribonucleoprotein seems a bit high, no?

1

These are metabolically active and rapidly dividing cells, so I would say it is at least not unexpected. The data is what it is, I did not make it up. I just checked some related bulk RNA-seq data, and there it is 5-10% of the total counts per sample. Not sure why scRNA-seq seems to favour these reads to that extent.

0

Good article. Didn't realize they addressed this point so well and that the fractions are consistently so high.

16 months ago
predeus ★ 1.8k

I think they QC different things - and they don't necessarily correlate. The ribosomal RNA fraction shows how much rRNA is captured from this particular sample - since rRNA makes up the vast majority of a cell's total RNA anyway (commonly cited as 80% or more), you won't really see any changes due to cellular state, I think. Ribosomal proteins, on the other hand, change with cell type and cell state.

For typical 3' scRNA-seq, which is polyA-based, there's little value in rRNA for QC (the fraction of multimapping reads can be a good proxy). However, we've just processed some 5' 10x data, and these actually have quite a lot of rRNA - sometimes as much as 40%.

0

For Tabula Muris, they regressed out the ribosomal protein fraction (source) (instead of the Seurat default mitochondrial gene fraction). That was a big consortium, so there was likely an extended discussion about every step. Thus, it may also be used as a quality metric.

1

I've just been putting together a typical vignette for the PBMC 10k dataset from 10x, and decided to run a quick check. Here's the ribosomal protein distribution. Some cell types clearly have more ribosomal proteins than others. I think it's quite bad to regress out the ribosomal fraction, at least as a general rule.
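
A quick check of this kind can be done without any plotting by summarising the per-cell ribosomal protein percentage by cell type. The sketch below assumes you already have one percentage and one cell-type label per cell; the function name `median_percent_by_group` and the example values are hypothetical, not taken from the PBMC data.

```python
from collections import defaultdict
from statistics import median


def median_percent_by_group(percents, labels):
    """Median per-cell percentage within each group (e.g. cell type)."""
    groups = defaultdict(list)
    for pct, label in zip(percents, labels):
        groups[label].append(pct)
    return {label: median(values) for label, values in groups.items()}


# Made-up values for illustration only.
toy_percents = [40, 50, 10, 20, 30]
toy_labels = ["type_A", "type_A", "type_B", "type_B", "type_B"]
summary = median_percent_by_group(toy_percents, toy_labels)
```

If the medians differ strongly between cell types, regressing out the ribosomal fraction would also remove part of the cell-type signal, which is the concern raised above.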

(I thought I could just copy and paste images here? Quite annoying..)