High Malat-1 expression in single cell data
7 weeks ago

I know that Malat-1 expression is an indicator of dying cells. Would it be reasonable to filter out cells with high Malat-1 expression? Or would it be better to regress out Malat-1 expression during scaling?

7 weeks ago
ATpoint

My comment is general since I've never looked at this gene specifically, but in my experience metrics of poor cell quality never come alone. If you have dying cells, they will also have a high fraction of mitochondrial reads, hence fewer other genes are detected, and such trash cells typically aggregate together in a UMAP plot. If you see that suspicious cells are also high in this gene, then maybe yes, filter. If only this gene indicates "dying cells", then maybe some other biology is involved.
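
The "metrics never come alone" idea can be sketched as a filter that flags a cell only when at least two independent QC signals agree. This is a minimal Python sketch, not anyone's published method. It assumes per-cell counts stored as a dict of gene name to count and mouse-style mitochondrial gene names starting with `mt-`; the helper names and the cutoffs (`max_pct_mito`, `min_genes`, `max_pct_malat1`) are hypothetical placeholders you would tune per dataset.

```python
# Hypothetical QC helper: flag a cell only when several signals agree.
# Assumptions: counts per cell are a dict {gene: count}, mitochondrial
# genes start with "mt-" (mouse naming), cutoffs are placeholders.

def _pct(part, total):
    """Percentage of `part` in `total`, 0.0 for empty cells."""
    return 100.0 * part / total if total else 0.0

def qc_metrics(cell_counts, malat1_gene="Malat1", mito_prefix="mt-"):
    """Total counts, detected genes, % mitochondrial and % Malat1 for one cell."""
    total = sum(cell_counts.values())
    n_genes = sum(1 for c in cell_counts.values() if c > 0)
    mito = sum(c for g, c in cell_counts.items() if g.startswith(mito_prefix))
    return {
        "total": total,
        "n_genes": n_genes,
        "pct_mito": _pct(mito, total),
        "pct_malat1": _pct(cell_counts.get(malat1_gene, 0), total),
    }

def is_suspicious(m, max_pct_mito=10.0, min_genes=200, max_pct_malat1=5.0):
    """True only if at least two of the three QC signals look bad."""
    signals = [
        m["pct_mito"] > max_pct_mito,
        m["n_genes"] < min_genes,
        m["pct_malat1"] > max_pct_malat1,
    ]
    return sum(signals) >= 2

# Usage: a cell that is high in Malat1 alone is kept; a cell that is
# also mito-high and gene-poor is flagged.
healthy = {f"Gene{i}": 1 for i in range(300)}
healthy.update({"Malat1": 100, "mt-Co1": 10})
trash = {f"Gene{i}": 1 for i in range(50)}
trash.update({"Malat1": 30, "mt-Co1": 40})
print(is_suspicious(qc_metrics(healthy)), is_suspicious(qc_metrics(trash)))  # False True
```

Requiring agreement between signals is the design choice here: a high Malat1 share on its own never removes a cell.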

I have noticed in articles that people define low-quality clusters and remove them from the data, but they don't explain exactly what makes these clusters low quality. Malat1 is a nuclear gene, so I assume it shouldn't be detected in high amounts in single-cell analysis? So if a cell has a high amount of nuclear genes, does that mean it should be discarded?

Could we say that if a cell has fewer unique genes and detected RNA molecules, and also a high amount of nuclear genes, then this cell's cytoplasmic RNA molecules somehow disappeared and only nuclear genes are detected?

7 weeks ago
dsull

I'm assuming you read https://kb.10xgenomics.com/hc/en-us/articles/360004729092-Why-do-I-see-high-levels-of-Malat1-in-my-gene-expression-data

In my experience, Malat1 is just some weird artifact: it is a highly captured gene in a huge number of scRNA-seq datasets regardless of protocol, and I get good results without doing anything about it.

And the standard for the field is: Don't do anything about it. :)

When I perform differential analysis, the genes that could be considered markers of this Malat1-high cluster tend to be nuclear genes. Do you have any idea what this might mean?

Malat1 is a lncRNA abundant in the nucleus. I guess if Malat1 is abundant and stable, it makes sense that it could be detected in high amounts in scRNA-seq. In many cases scRNA-seq doesn't dissociate the nucleus (at least not completely). There's a reason why >25% of transcripts in scRNA-seq datasets are unspliced.
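
The unspliced share mentioned above can be computed per cell if your pipeline produces separate spliced and unspliced counts (for example velocyto-style spliced/unspliced quantification; whether you have these depends on how the data were processed). A minimal sketch, assuming each cell's spliced and unspliced counts are already summed into two lists in the same cell order; `unspliced_fraction` is a hypothetical helper name.

```python
def unspliced_fraction(spliced, unspliced):
    """Per-cell fraction of counts coming from unspliced (intron-containing) reads.

    spliced, unspliced: per-cell total counts, same cell order.
    Cells with no counts at all get 0.0.
    """
    if len(spliced) != len(unspliced):
        raise ValueError("spliced and unspliced need one entry per cell")
    fractions = []
    for s, u in zip(spliced, unspliced):
        total = s + u
        fractions.append(u / total if total else 0.0)
    return fractions

# Usage: three cells, the second one strongly "nuclear"
print(unspliced_fraction([75, 40, 0], [25, 60, 0]))  # [0.25, 0.6, 0.0]
```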

The "nucleus-ness" of single cells would be an interesting technical effect to look more closely at, as it does drive clustering results -- but one shouldn't assume that "nucleus-ness" = suboptimal/dead cells.

You can consider the Malat1-high cluster an undefined cluster if you don't find it interesting or have trouble annotating it, but I wouldn't threshold on Malat1 expression since it's such an abundantly expressed gene in many cells.

7 weeks ago

Malat1 correlates with intronic content and can be used as a nuclear indicator. In this preprint we discuss this artifact and the use of Malat1 or intronic content as quality metrics: https://www.biorxiv.org/content/10.1101/2024.04.18.590104v2
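
If you have both a per-cell Malat1 value and a per-cell intronic (unspliced) fraction, you can check how closely the two track each other in your own data with a plain Pearson correlation. This is a stdlib-only sketch with a hypothetical `pearson` helper; it is not the analysis from the preprint, and a rank-based (Spearman) correlation would be a reasonable alternative if the relationship is not linear.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / math.sqrt(sxx * syy)

# Usage (hypothetical per-cell vectors in the same cell order):
# r = pearson(malat1_lognorm_values, intronic_fractions)
```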

My response above was actually partly inspired off of a reading of your paper :)

Thank you for sharing.

What are your thoughts on the genes Gm42418, AY036118, and Gm26917? Some articles suggest that clusters with an abundance of these genes have been directly removed from the data. What exactly identifies these genes as contamination? And what would it mean in a dataset where intronic reads are included?

I assume there is no way to calculate the intronic fraction of cells if only count data is available, as you mentioned in your article. So if I must stick to the Malat1 gene, how should I quantify it? Should I score the data using the "AddModuleScore()" function, or should I only calculate the percentage of Malat1 counts? In either case, what would the threshold be?

I briefly looked at those genes on a genome browser -- there are a lot of repeat elements in those genes, meaning a lot of counts assigned to them are probably spurious (for example, rRNA might map to those genes). Perhaps they correlate with intron content because many introns have low-complexity sequences? Or maybe because rRNAs are abundant in the nucleus and those are nuclear genes? You'd have to check.
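
One way to "check" is to treat these genes as a custom gene set and compute, per cell, the percentage of counts they absorb, analogous to a mitochondrial percentage; that value can then be compared against intronic content or cluster labels. A minimal sketch, assuming per-cell counts as a dict of gene name to count; `gene_set_percent` and `REPEAT_RICH` are hypothetical names, and the gene list is just the three genes asked about above.

```python
# The three repeat-rich genes discussed in this thread (mouse gene symbols).
REPEAT_RICH = ("Gm42418", "AY036118", "Gm26917")

def gene_set_percent(cell_counts, genes):
    """Percentage of a cell's total counts that fall in `genes`."""
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0
    return 100.0 * sum(cell_counts.get(g, 0) for g in genes) / total

# Usage
cell = {"Gm42418": 30, "AY036118": 10, "Actb": 60}
print(gene_set_percent(cell, REPEAT_RICH))  # 40.0
```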

If rRNA reads have been mistakenly assigned to these genes, could we expect the percentage of ribosomal genes (Rpl, Rps) to be low in cells overexpressing Gm42418?

No -- those are ribosomal proteins, not rRNAs.

Just using the normalized expression of Malat1 works.
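
"Normalized expression" here can be sketched with the same form as Seurat's default LogNormalize: the natural log1p of the gene's counts scaled to 10,000 counts per cell. This is a pure-Python illustration with hypothetical helper names, not Seurat code (in Seurat the values come from `NormalizeData()`), and it deliberately suggests no threshold, since none is given in this thread.

```python
import math

def log_normalize(count, total, scale_factor=10_000):
    """log1p(count / total * scale_factor); 0.0 for cells with no counts."""
    if total <= 0:
        return 0.0
    return math.log1p(count / total * scale_factor)

def malat1_lognorm(cells, gene="Malat1", scale_factor=10_000):
    """Log-normalized Malat1 value for each cell (each cell is a dict gene -> count)."""
    return [log_normalize(c.get(gene, 0), sum(c.values()), scale_factor)
            for c in cells]

# Usage: a Malat1-negative cell and a Malat1-positive cell
values = malat1_lognorm([{"Malat1": 0, "Actb": 100}, {"Malat1": 10, "Actb": 90}])
```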

In your paper, you mentioned that a normalized Malat1 value of 0 is also indicative of a low-quality cell. But what if dropout is at play? Let's say a cell's Malat1 expression is 0, but its number of unique genes and RNA counts appear quite normal. In that case, should this cell still be removed from the data?

I don't think dropout is playing an important role with Malat1 since it is usually highly expressed. On the other hand, if dropout is still a concern for you, Malat1-negative cells tend to cluster, so you can remove those clusters.
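
The cluster-level alternative can be sketched as: compute, for each cluster, the fraction of cells with zero Malat1 counts, then flag clusters where such cells dominate. A minimal sketch, assuming you already have one cluster label per cell; the helper names are hypothetical and the `min_zero_fraction=0.5` cutoff is an arbitrary placeholder, not a recommendation from the thread.

```python
from collections import defaultdict

def malat1_zero_fraction_by_cluster(malat1_counts, clusters):
    """Fraction of cells with zero Malat1 counts in each cluster."""
    if len(malat1_counts) != len(clusters):
        raise ValueError("need one Malat1 value and one cluster label per cell")
    zeros = defaultdict(int)
    sizes = defaultdict(int)
    for value, label in zip(malat1_counts, clusters):
        sizes[label] += 1
        if value == 0:
            zeros[label] += 1
    return {label: zeros[label] / sizes[label] for label in sizes}

def flag_clusters(zero_fraction, min_zero_fraction=0.5):
    """Clusters where Malat1-negative cells make up at least the cutoff fraction."""
    return sorted(label for label, f in zero_fraction.items() if f >= min_zero_fraction)

# Usage
fractions = malat1_zero_fraction_by_cluster([0, 0, 5, 7, 0, 3], ["a", "a", "b", "b", "c", "c"])
print(flag_clusters(fractions))  # ['a', 'c']
```

Working per cluster rather than per cell means an isolated Malat1-zero cell (possible dropout) is not removed on its own.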
