RNA-seq ribosomal genes vs proteins
3.2 years ago
igor 13k

It seems that the definition of "ribosomal genes" in the context of RNA-seq or scRNA-seq QC can vary widely. I've seen some people define them by simply grepping for ^Rp[sl] (for mouse), which technically selects ribosomal protein genes. The GENCODE GTF has an rRNA biotype (as well as rRNA_pseudogene and Mt_rRNA), which marks ribosomal RNA transcripts (including genes such as Rn5s and n-R5s*).

For the purpose of QC, is one definition (or a combination of the two) more correct? For example, the default 10x Genomics workflow doesn't include rRNA biotypes in the reference, so perhaps that's why people resort to ribosomal proteins in that case.

RNA-Seq scRNA-seq rRNA • 4.2k views
3.2 years ago
ATpoint 82k

I would say that you want type==rRNA, as you are interested in the fraction of unwanted structural ribosomal RNAs in your library, not in the ribonucleoprotein transcripts themselves, which are actually informative about metabolic and (not sure about this) cell cycle state.

It definitely makes a difference. I just checked in my own data (mouse hematopoietic cells, 10x v3); the plots show the % of reads aligning to the selected genes (raw, non-QCed cells directly after quantification with Alevin):

[Image: plots of the % of reads aligning to the selected genes; not preserved]
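For anyone wanting to reproduce this kind of comparison, a minimal sketch of the per-cell percentage follows. It assumes a small dense cells-by-genes count table held in plain Python lists; the function name, gene sets, and counts are made up for illustration and are not the data from the plots.

```python
def percent_in_gene_set(counts, gene_names, gene_set):
    """Per-cell percentage of counts that fall in gene_set.

    counts: list of per-cell lists, one count per gene, ordered like gene_names.
    """
    idx = [i for i, gene in enumerate(gene_names) if gene in gene_set]
    percents = []
    for cell in counts:
        total = sum(cell)
        hits = sum(cell[i] for i in idx)
        # Cells with no counts at all get 0.0 rather than a division error.
        percents.append(100.0 * hits / total if total else 0.0)
    return percents


# Toy data: three cells, four genes.
gene_names = ["Rpl13", "Rps3", "Rn5s", "Actb"]
counts = [
    [30, 20, 10, 40],
    [5, 5, 0, 90],
    [0, 0, 0, 0],
]

rp_pct = percent_in_gene_set(counts, gene_names, {"Rpl13", "Rps3"})
rrna_pct = percent_in_gene_set(counts, gene_names, {"Rn5s"})
print(rp_pct, rrna_pct)
```

Computing the two percentages side by side per cell makes it easy to see whether the rRNA and ribosomal protein definitions behave differently in a given dataset.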


This is somewhat of a tangent, but about a third of total transcripts being ribonucleoprotein seems a bit high, no?


These are metabolically active and rapidly dividing cells, so I would say it is at least not unexpected. The data is what it is; I did not make it up. I just checked some related bulk RNA-seq data, and there it is 5-10% of the total counts per sample. Not sure why scRNA-seq seems to favour these reads to that extent.

Edit: See also https://kb.10xgenomics.com/hc/en-us/articles/218169723-What-fraction-of-reads-map-to-ribosomal-proteins- Seems to be quite common to get these large percentages.


Good article. Didn't realize they addressed this point so well and that the fractions are consistently so high.

3.2 years ago
predeus ★ 1.9k

I think they QC different things, and they don't necessarily correlate. The ribosomal RNA fraction shows how much rRNA was captured from this particular sample. Since rRNA makes up the vast majority of a cell's total RNA anyway (commonly cited as 80% or more), you won't really see any changes due to cellular state, I think. Ribosomal proteins, on the other hand, change with cell type and cell state.

For typical 3' scRNA-seq, which is polyA-based, there's little value in rRNA for QC (the fraction of multimapping reads can be a good proxy). However, we've just processed some 5' 10x data, and these actually have quite a lot of rRNA, sometimes as much as 40%.


For Tabula Muris, they regressed out the ribosomal protein fraction (source) (instead of the Seurat default mitochondrial gene fraction). That was a big consortium, so there was likely an extended discussion about every step. Thus, it may also be used as a quality metric.
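For readers unfamiliar with what "regressing out" a covariate means, here is a minimal sketch of the general idea for a single gene: fit a straight line of expression against the per-cell ribosomal fraction and keep the residuals. This is a generic least-squares illustration with made-up numbers and a hypothetical function name, not the Tabula Muris or Seurat implementation.

```python
def regress_out(values, covariate):
    """Residuals of a simple least-squares fit of values on covariate."""
    n = len(values)
    mean_x = sum(covariate) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    if sxx == 0:
        # Constant covariate: nothing to regress on, so only center the values.
        return [y - mean_y for y in values]
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]


# Toy example: expression of one gene across five cells, and each cell's
# ribosomal protein fraction in percent.
ribo_fraction = [10.0, 20.0, 30.0, 40.0, 50.0]
expression = [1.2, 1.9, 3.1, 3.8, 5.0]

residuals = regress_out(expression, ribo_fraction)
print(residuals)
```

The residuals are uncorrelated with the covariate by construction, which is why regressing out a fraction that genuinely differs between cell types can also remove real biological signal.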


I've just been putting together a typical vignette for the PBMC 10k dataset from 10x, and decided to run a quick check. Here's the ribosomal protein distribution. Some cell types clearly have more ribosomal proteins than others. I think it's quite bad to regress out the ribosomal fraction, at least as a general rule.

(I thought I could just copy and paste images here? Quite annoying..)

[Image: ribosomal protein distribution in the PBMC 10k dataset; not preserved]
