Question: Single-cell RNA-seq data preprocessing: how to detect the gene/transcript distribution for each single cell
9 months ago, sreekalasn wrote:

Hello everyone, I have an expression matrix of log(TPM+1) values for 14,000 cells and 23,000 genes (GSE87544). In the paper, the authors analysed 14,000 cells and reduced the data to 3,000 cells and 2,000 genes before using Seurat for cell clustering.

I am new to single-cell sequencing and still learning. I would appreciate help with the pre-processing of single-cell sequencing data (or with finding the gene/transcript distribution, as in this case), since I could not find sources that discuss data pre-processing in detail.

Thank you very much!

9 months ago, Friederike wrote:

Good primers on pre-processing single-cell RNA-seq data are Aaron Lun's paper and the numerous simpleSingleCell vignettes (starting from "UMI" or "Droplet-based data").

A good intro focused on QC of scRNA-seq data is also part of the scater package documentation.


Thank you so much. I found these sources very useful

(reply written 9 months ago by sreekalasn)
9 months ago, geek_y wrote:

The 2,000 genes could be the most variable genes across cells, which are then used for PCA and subsequently t-SNE/UMAP.
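As a rough illustration of what "most variable genes" can mean, here is a minimal sketch that simply ranks genes by their variance across cells. It assumes a matrix with cells as rows and genes as columns; the function name `top_variable_genes` is made up for this example. Seurat's own variable-gene selection is more elaborate than a plain variance ranking, so treat this only as the basic idea.

```python
import numpy as np

def top_variable_genes(expr, n_top=2000):
    """Return column indices of the n_top genes with the highest variance.

    expr: 2-D array, rows = cells, columns = genes (e.g. log(TPM+1) values).
    This is a plain variance ranking, not Seurat's procedure.
    """
    expr = np.asarray(expr, dtype=float)
    gene_var = expr.var(axis=0)            # variance of each gene across cells
    n_top = min(n_top, expr.shape[1])      # cannot select more genes than exist
    order = np.argsort(gene_var, kind="stable")[::-1]  # most variable first
    return order[:n_top]

# Toy matrix: 4 cells x 3 genes. Gene 0 is constant, gene 2 varies the most.
toy = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 5.0],
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 5.0],
])
hvg = top_variable_genes(toy, n_top=2)
print(hvg)  # [2 1]
```

The selected columns (`toy[:, hvg]`) would then be the input to PCA.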

The cell filtering should be described in the methods section of the paper. Common reasons to filter out cells in scRNA-seq data include abnormally high UMI counts, a high fraction of mitochondrial gene expression, a low number of detected genes, low sequencing depth, and doublets. The exact procedure also depends on the version of Seurat.
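To make two of these criteria concrete, here is a hedged sketch that computes per-cell total counts and the mitochondrial fraction. It assumes a raw count matrix (cells x genes), not log(TPM+1) values, and assumes mitochondrial genes can be recognised by an "mt-" name prefix. The function name `qc_metrics` and the thresholds (50 counts, 20% mitochondrial) are arbitrary placeholders for illustration, not values from the paper.

```python
import numpy as np

def qc_metrics(counts, gene_names, mito_prefix="mt-"):
    """Per-cell total counts and fraction of counts from mitochondrial genes.

    counts: 2-D array, rows = cells, columns = genes (raw counts assumed).
    gene_names: one name per column; mito genes are matched by prefix
    (case-insensitive), which is an assumption about the annotation.
    """
    counts = np.asarray(counts, dtype=float)
    is_mito = np.array(
        [g.lower().startswith(mito_prefix.lower()) for g in gene_names]
    )
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Cells with zero total counts get a mitochondrial fraction of 0.
    mito_frac = np.divide(mito, total, out=np.zeros_like(total), where=total > 0)
    return total, mito_frac

genes = ["Actb", "Gapdh", "mt-Co1", "mt-Nd1"]
counts = np.array([
    [90, 10, 0, 0],    # healthy-looking cell
    [10, 10, 40, 40],  # mostly mitochondrial counts
    [0, 0, 0, 0],      # empty cell
])
total, mito_frac = qc_metrics(counts, genes)

# Placeholder thresholds; real cut-offs should come from the data or the paper.
keep = (total >= 50) & (mito_frac <= 0.2)
print(keep)  # [ True False False]
```

In practice it helps to plot the distributions of these metrics before choosing cut-offs.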

A quick read of the paper shows: "From the 14,000 cells analyzed, 3,319 cells have more than 2,000 genes detectable in a single cell".
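The per-cell gene distribution asked about in the question can be obtained by counting, for each cell, how many genes have non-zero expression. The sketch below assumes cells are rows and genes are columns, and treats "detectable" as expression greater than zero, which is an assumption about how the paper defined it. The function name is hypothetical.

```python
import numpy as np

def filter_cells_by_detected_genes(expr, min_genes=2000):
    """Keep cells in which more than min_genes genes are detected.

    expr: 2-D array, rows = cells, columns = genes.
    A gene counts as detected in a cell if its value is > 0 (assumption).
    Returns the filtered matrix, the genes-per-cell vector, and the mask.
    """
    expr = np.asarray(expr)
    genes_per_cell = (expr > 0).sum(axis=1)  # number of detected genes per cell
    keep = genes_per_cell > min_genes
    return expr[keep], genes_per_cell, keep

# Toy matrix: 3 cells x 4 genes, with a threshold of 1 instead of 2,000.
toy = np.array([
    [0.0, 1.2, 3.4, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [2.1, 0.3, 1.1, 4.0],
])
filtered, genes_per_cell, keep = filter_cells_by_detected_genes(toy, min_genes=1)
print(genes_per_cell)  # [2 1 4]
print(filtered.shape)  # (2, 4)
```

On the real matrix, a histogram of `genes_per_cell` gives the distribution of detected genes per cell, and `min_genes=2000` would correspond to the threshold quoted from the paper.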

It's sad that you did not make even a minimal effort to read the paper you are interested in.


I did go through the paper multiple times. However, the authors have not described in detail how they filtered the data and found the "highly variable genes". They reference another article, but I could not understand the filtering part there either. Hence, I posted the question here hoping to receive some help. Thanks for the heads-up on the plausible factors for filtering scRNA-seq data.

(reply written 9 months ago by sreekalasn)