Question

How to find nuclear localized genes from scRNA-seq UMI data?

18 months ago

Emily ▴ 70

Is there a way to flag nuclear-localised vs. cytosol-localised genes in an AnnData object? The scanpy tutorial shows how to do it for mitochondrial genes using adata.var['mt'] = adata.var.index.str.startswith('MT-'), and for ribosomal genes I did:

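import pandas as pd

# KEGG ribosome gene set from MSigDB; skip the two header lines of the text file
ribo_url = "http://software.broadinstitute.org/gsea/msigdb/download_geneset.jsp?geneSetName=KEGG_RIBOSOME&fileType=txt"
ribo_genes = pd.read_table(ribo_url, skiprows=2, header=None)

# Flag every gene in adata that appears in the ribosome gene set
adata.var['ribo'] = adata.var_names.isin(ribo_genes[0].values)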

but I'm not sure how to go about separating nuclear vs. cytosolic genes.

scanpy anndata scRNA python rna • 497 views

I think you have to clarify what you mean by nuclear-localised vs. cytosol-localised genes. Do you mean the localization of the genes themselves, or the localization of the transcribed and potentially translated gene products? And what organism(s) are you working with?

Because if I interpret your code snippets correctly, you are aiming for the gene products. Many mitochondrial proteins are actually encoded in the nuclear genome, so your approach of selecting only the genes that originate from the mitochondrial DNA falls short.
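To make the point concrete, here is a small sketch in plain Python. The gene list is picked by hand for illustration and is not taken from the question's data: MT-CO1 and MT-ND1 sit on the mitochondrial genome, NDUFS1 and COX4I1 are nuclear genes whose products function in the mitochondrion, and ACTB is an unrelated control.

```python
# Hand-picked illustration, not data from the original post.
# MT-CO1, MT-ND1: encoded on mitochondrial DNA.
# NDUFS1, COX4I1: nuclear-encoded, but their proteins work in the mitochondrion.
# ACTB: cytoskeletal gene used as a control.
genes = ["MT-CO1", "MT-ND1", "NDUFS1", "COX4I1", "ACTB"]

# Same test as the 'MT-' prefix check on adata.var, written in plain Python.
mt_prefix = [g.startswith("MT-") for g in genes]

for gene, flag in zip(genes, mt_prefix):
    print(f"{gene}\t{flag}")
```

The prefix test only picks up the two mtDNA-encoded genes, so it answers "where is the gene encoded?" rather than "where does the product end up?".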

Using Gene Ontology for annotation might give you an idea, since it provides a hierarchical controlled vocabulary split into three categories (Biological Process, Molecular Function and Cellular Component), and each annotation comes with an evidence code.
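A minimal sketch of what that could look like, assuming you have already exported gene-to-GO annotations into a simple tab-separated table. The table layout, the placeholder gene names and the helper genes_with_term are all hypothetical. Only the GO term IDs (GO:0005634 nucleus, GO:0005829 cytosol) and the evidence codes are real.

```python
import csv
import io

# Hypothetical, simplified annotation table (NOT real GO annotations):
# gene symbol, GO term ID, aspect (C = Cellular Component), evidence code.
TOY_ANNOTATIONS = """\
GENE_A\tGO:0005634\tC\tIDA
GENE_A\tGO:0005829\tC\tIEA
GENE_B\tGO:0005829\tC\tIDA
GENE_C\tGO:0005634\tC\tIEA
GENE_C\tGO:0006412\tP\tIMP
"""

NUCLEUS = "GO:0005634"  # GO term "nucleus"
CYTOSOL = "GO:0005829"  # GO term "cytosol"


def genes_with_term(table_text, go_id, skip_evidence=("IEA",)):
    """Collect genes directly annotated with go_id in the Cellular Component
    aspect, ignoring annotations whose evidence code is in skip_evidence."""
    hits = set()
    for row in csv.reader(io.StringIO(table_text), delimiter="\t"):
        if not row:
            continue  # tolerate blank lines
        gene, term, aspect, evidence = row
        if aspect == "C" and term == go_id and evidence not in skip_evidence:
            hits.add(gene)
    return hits


# Stand-in for list(adata.var_names)
var_names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

nuclear = genes_with_term(TOY_ANNOTATIONS, NUCLEUS)
cytosolic = genes_with_term(TOY_ANNOTATIONS, CYTOSOL)

# Boolean flags in the same order as var_names
is_nuclear = [g in nuclear for g in var_names]
is_cytosolic = [g in cytosolic for g in var_names]
```

The two lists could then be stored as columns, e.g. adata.var['nuclear'] = is_nuclear, in the same way as the 'mt' column. Note that this sketch only matches direct annotations: because GO is hierarchical, genes annotated to a more specific child term would be missed unless the annotations are propagated to parent terms first. Dropping electronically inferred annotations (IEA) is one possible choice, not a requirement.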

Reply by Matthias Zepper, 18 months ago