Getting mitochondrial genes for scRNAseq
3.8 years ago
m-harbus • 0

Every paper I come across seems to have a cut-off for mitochondrial gene %. My question is how do they get this percentage? Is there some sort of list with all of the MT genes?

RNA-Seq rna-seq scRNAseq single-cell • 6.8k views
3.8 years ago

Hi,

The list of mitochondrial genes is obtained after aligning and annotating the reads of the scRNA-seq data. After this step you get a table, usually with annotated genes on the rows and cells on the columns. Each entry holds the number of reads aligned to that gene in that cell.

Depending on whether you're working with mouse or human, all mitochondrial gene symbols start with the prefix mt- or MT-, respectively. So, programmatically, it is quite easy to determine the number or percentage of mitochondrial genes among the genes expressed per cell (genes with > 0 reads).

So if you have a table like the following:

           cell_1  ....
gene_1     0 
gene_2     2390
gene_3     9009 
gene_4     2839
gene_5     293
mt-gene_6  8239
mt-gene_7  0
mt-gene_8  23
mt-gene_9  0
mt-gene_10 0
...

Above you have 10 genes (50% mitochondrial and 50% nuclear), though only 6 are expressed (2 mitochondrial and 4 nuclear). This means that for cell_1, 2 of the 6 expressed genes are mitochondrial: ~33%.
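The arithmetic above can be checked with a short Python sketch that uses only the counts from the table. The second number it prints, the share of counts (rather than of expressed genes) that come from mitochondrial genes, is an addition here and not part of the example above; it is the quantity that Seurat's PercentageFeatureSet reports. The helper name is_mito is made up for illustration.

```python
# Counts for cell_1, copied from the table above.
cell_1 = {
    "gene_1": 0, "gene_2": 2390, "gene_3": 9009, "gene_4": 2839, "gene_5": 293,
    "mt-gene_6": 8239, "mt-gene_7": 0, "mt-gene_8": 23, "mt-gene_9": 0, "mt-gene_10": 0,
}

def is_mito(gene):
    # Hypothetical helper: mouse-style "mt-" prefix, as used in the table.
    return gene.startswith("mt-")

# Share of expressed genes (> 0 reads) that are mitochondrial.
expressed = [g for g, n in cell_1.items() if n > 0]
mito_expressed = [g for g in expressed if is_mito(g)]
pct_mito_genes = 100 * len(mito_expressed) / len(expressed)

# Share of all counts in the cell that come from mitochondrial genes.
total_counts = sum(cell_1.values())
mito_counts = sum(n for g, n in cell_1.items() if is_mito(g))
pct_mito_counts = 100 * mito_counts / total_counts

print(f"{pct_mito_genes:.1f}% of expressed genes are mitochondrial")    # 33.3%
print(f"{pct_mito_counts:.1f}% of counts come from mitochondrial genes")  # 36.2%
```

The two numbers need not agree, so it is worth checking which one a given paper's cutoff refers to.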

Regarding the threshold of mitochondrial percentage used to "discard" a cell, it depends on the study. Usually I see 5-10%, i.e., if a cell has more than 5-10% mitochondrial genes, you discard it because it may be a dead cell, or a cell whose membrane has leaked its cytoplasmic mRNA but not its mitochondrial mRNA. However, some cell populations simply have high metabolic rates and therefore a high number of mitochondria and mitochondrial transcripts. In that case you don't want to discard cells with a high mitochondrial percentage, because they represent a true cell population that you want to study. That said, you need to be very careful with thresholds and shouldn't trust them blindly. Always plot the data and see how it looks; the right cutoff depends on your study.

I hope this answers your question.

António


Thank you very much Antonio. So basically, any gene that has the prefix MT- is assumed to be mitochondrial. That's all I was wondering.


That depends on the annotation file you are using, but if that is what your file has, then yes.


Yeah, I've noticed a difference. I'm using Alevin (python), and the output from my FASTQ files has ENSEMBL IDs as the columns, which I've converted to gene symbols with the mygene library. However, the resulting symbols use the notation MT only, without the dash.
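One thing to watch for when the dash is missing: a bare "MT" prefix also matches nuclear genes whose symbols happen to start with those letters. A minimal sketch of the pitfall, using a small hand-picked list of human symbols (the list is illustrative, not taken from any particular annotation file):

```python
import re

# Illustrative symbols: two mitochondrial genes, two nuclear genes that
# start with "MT" (MTOR, MTHFR), and one unrelated nuclear gene.
symbols = ["MT-ND1", "MT-CO1", "MTOR", "MTHFR", "ACTB"]

with_dash = [s for s in symbols if re.match(r"^MT-", s)]
without_dash = [s for s in symbols if re.match(r"^MT", s)]

print(with_dash)     # ['MT-ND1', 'MT-CO1']
print(without_dash)  # ['MT-ND1', 'MT-CO1', 'MTOR', 'MTHFR']
```

So it is worth looking at the symbols a pattern actually matches before using it for QC.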


Yes.

For instance, one of the R packages used to analyze scRNA-seq data (Seurat) computes the mitochondrial percentage with:

pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")

As you can see from the pattern parameter, the user needs to specify the pattern used to search for mitochondrial genes. Of course, as @genomax said, it depends on the annotation file used.
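For anyone working outside R, the same idea can be sketched in plain Python: match gene symbols against a user-supplied pattern and report, per cell, the percentage of counts that fall on the matching genes. The function name percent_mito and the toy matrix are made up for illustration; this is not Seurat's implementation.

```python
import re

def percent_mito(genes, counts, pattern=r"^MT-"):
    """Percent of each cell's counts that come from genes matching `pattern`.

    genes:  list of gene symbols, one per row
    counts: list of rows (one per gene), each a list of per-cell counts
    """
    mito_rows = [i for i, g in enumerate(genes) if re.search(pattern, g)]
    n_cells = len(counts[0])
    result = []
    for c in range(n_cells):
        total = sum(row[c] for row in counts)
        mito = sum(counts[i][c] for i in mito_rows)
        # Guard against empty cells to avoid division by zero.
        result.append(100 * mito / total if total else 0.0)
    return result

# Toy genes-by-cells matrix (2 cells); values are invented.
genes = ["ACTB", "GAPDH", "MT-ND1", "MT-CO1"]
counts = [
    [50, 10],
    [30, 10],
    [15, 60],
    [5, 20],
]
print(percent_mito(genes, counts))  # [20.0, 80.0]
```

For mouse data the call would use pattern=r"^mt-" instead, subject to what the annotation file actually contains.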

António


That is what the Seurat tutorial suggests: https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html

3.1 years ago

For droplet-based approaches, I would not use any preset number as a threshold for %mt genes. I would use the percentage of reads mapped to mitochondrial genes together with the number of transcripts and genes detected in that cell or cluster to guide the threshold. If the percentage of mitochondrial reads is high and few genes and transcripts are detected, the cell was probably lysed before entering the droplet: its cytoplasmic mRNA leaked out, and because mitochondria are too large to pass through the pores, they are mainly what got captured in the droplet.
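A minimal Python sketch of that combined rule: flag a cell only when a high mitochondrial percentage coincides with few detected genes or transcripts. The function name, the metric keys, and especially the default cutoffs are placeholders for illustration; in practice they would be chosen from plots of your own data, not taken from here.

```python
def flag_low_quality(cells, max_pct_mito=10.0, min_genes=200, min_counts=500):
    """Return names of cells with high mito % AND low complexity.

    All three cutoffs are hypothetical defaults, not recommendations.
    """
    flagged = []
    for name, m in cells.items():
        high_mito = m["pct_mito"] > max_pct_mito
        low_complexity = m["n_genes"] < min_genes or m["n_counts"] < min_counts
        if high_mito and low_complexity:
            flagged.append(name)
    return flagged

# Invented per-cell QC metrics.
cells = {
    "cell_a": {"pct_mito": 3.0, "n_genes": 2500, "n_counts": 9000},
    "cell_b": {"pct_mito": 35.0, "n_genes": 150, "n_counts": 400},
    "cell_c": {"pct_mito": 30.0, "n_genes": 3000, "n_counts": 12000},
}
print(flag_low_quality(cells))  # ['cell_b']
```

Here cell_c is kept despite its high mitochondrial percentage because it still has many genes and transcripts detected, which fits a metabolically active cell better than a lysed one.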

You can read more in the HBC training materials.


There is a new pre-print from Hippen et al. that proposes an adaptive probabilistic approach to define QC thresholds.

It seems the mitochondrial fraction works fairly well as a cutoff regardless of the number of detected genes:

[plot]


That would still qualify the cell (=the droplet) as poor quality, so where is the difference in using mt% alone?
