How to determine n_cells_by_counts
9 months ago

Hello, I followed this tutorial (https://github.com/mousepixels/sanbomics_scripts/blob/main/single_cell_analysis_complete_class.ipynb) to run a single-cell RNA-seq analysis with scanpy. For the first step, data filtering, I applied this script:

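import os

import scanpy as sc
import scvi

# ribo_genes must already be loaded as a DataFrame of ribosomal gene names (as in the tutorial)

def pp(csv_path):
    # Doublet detection with scVI + SOLO, trained on the top 2000 highly variable genes
    adata = sc.read_csv(csv_path).T  # transpose so rows are cells and columns are genes
    sc.pp.highly_variable_genes(adata, n_top_genes = 2000, subset = True, flavor = 'seurat_v3')
    scvi.model.SCVI.setup_anndata(adata)
    vae = scvi.model.SCVI(adata)
    vae.train()
    solo = scvi.external.SOLO.from_scvi_model(vae)
    solo.train()
    df = solo.predict()
    df['prediction'] = solo.predict(soft = False)
    df.index = df.index.map(lambda x: x[:-2])  # strip the 2-character suffix on barcodes in the SOLO output
    # the 'dif > 1' cutoff assumes predict() returns logits as in the tutorial;
    # newer scvi-tools versions may return probabilities, so check your version
    df['dif'] = df.doublet - df.singlet
    doublets = df[(df.prediction == 'doublet') & (df.dif > 1)]

    # Reload the full matrix (all genes) and drop the predicted doublets
    adata = sc.read_csv(csv_path).T
    adata.obs['Sample'] = csv_path.split('_')[2] #'raw_counts/GSM5226574_C51ctr_raw_counts.csv'

    adata.obs['doublet'] = adata.obs.index.isin(doublets.index)
    adata = adata[~adata.obs.doublet].copy()  # .copy() avoids modifying a view below

    # QC metrics
    #sc.pp.filter_genes(adata, min_cells=3) #get rid of genes that are found in fewer than 3 cells
    adata.var['mt'] = adata.var_names.str.startswith('mt-')  # annotate the group of mitochondrial genes as 'mt'
    adata.var['ribo'] = adata.var_names.isin(ribo_genes[0].values)
    # per-cell metrics are written to adata.obs, per-gene metrics to adata.var
    sc.pp.calculate_qc_metrics(adata, qc_vars=['mt', 'ribo'], percent_top=None, log1p=False, inplace=True)
    return adata


out = []
for file in os.listdir('raw_counts/'):
    out.append(pp('raw_counts/' + file))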

My question is how to add a column with n_cells_by_counts, because when I type adata.obs the table only includes the following columns:

Sample
doublet
n_genes_by_counts
total_counts
total_counts_mt
pct_counts_mt
total_counts_ribo
pct_counts_ribo
total_counts_hb
pct_counts_hb

If I'm not wrong, we need both n_cells_by_counts and n_genes_by_counts for data filtering. In other words, we also have to look at the distribution of these parameters in each sample in order to apply filters such as adata.obs['n_genes_by_counts'] < xxxxx.

Shouldn't n_cells_by_counts be in adata.var? Or am I misunderstanding your question?

Also, you can filter genes and cells with sc.pp.filter_genes and sc.pp.filter_cells. By removing doublets you are already partly filtering on n_genes_by_counts, since doublets tend to have more detected genes than single cells.
