Question

What are important questions to ask before starting analysis of a 10x scRNA-seq dataset?

3.0 years ago

mi • 0

I am supposed to analyze a 10x Chromium single-cell gene expression dataset in the future.

There will be plenty of time to familiarize myself with the task and to discuss the matter with experts in the field. I have years of experience in genomics/transcriptomics, but not with single-cell expression data.

However, I need to evaluate the quality of the data and the amount of work needed for the task within the next few days.

So what questions would you ask about a 10x Chromium single-cell gene expression dataset before deciding whether to take on the task, and to estimate how much work it will be? What should I consider?

10x scRNA-seq • 1.6k views

updated 3.0 years ago by GenoMax 141k • written 3.0 years ago by mi • 0

score 3 · Answer 1 · 2021-05-12

As with any new analysis, expect to spend some time familiarizing yourself with the procedure involved. 10x makes a set of tools available for doing the analysis; you can find them at their support site. I am going to link the GEX (gene expression) site, but they support other protocols as well. They also have tutorials and test data you can download.

On the open source side of things:

1. STARsolo (LINK)
2. alevin-fry (LINK)
3. OSCA - Orchestrating Single-Cell Analysis with Bioconductor
4. Seurat (LINK - getting started)

You will likely choose one of the first two (STARsolo or alevin-fry) for alignment and quantification, and one of the last two (OSCA or Seurat) for downstream analysis. Expect to spend at least a few days on learning plus the actual analysis. There are many existing threads here to refer to, and a few experts who will be able to answer questions.

score 3 · Answer 2 · 2021-05-12

As far as QC goes, you probably want to look at what fraction of the counts in each cell is mitochondrial in origin (a high fraction could indicate dead or lysed cells), look at the distributions of the number of genes detected per cell and of UMI counts per cell, and maybe run DoubletFinder. You don't say what type of tissue or species, but sometimes the hemoglobin genes (HBB/HBA) from red blood cells can give you a feel for the background (ambient) RNA contamination, or "soup", in the data. Or sometimes these cells will just form their own cluster.

Otherwise, I think walking through the appropriate Seurat tutorials might give you some more ideas. If it's "low quality", are you not going to analyze it?
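
To make those per-cell checks concrete, here is a minimal sketch in plain Python. It is not how Seurat or any other package does it; the function names, the toy barcodes, the `MT-` prefix and the `HBB`/`HBA1`/`HBA2` symbols (human gene naming) and the cutoffs are all assumptions chosen for illustration, and sensible thresholds depend on tissue and species.

```python
from typing import Dict, Iterable


def qc_metrics(
    gene_counts: Dict[str, int],
    mito_prefix: str = "MT-",  # assumption: human mitochondrial gene symbols
    hb_genes: Iterable[str] = ("HBB", "HBA1", "HBA2"),  # assumption: human hemoglobin genes
) -> Dict[str, float]:
    """Compute simple QC metrics for one cell from its gene -> UMI count map."""
    hb_set = {g.upper() for g in hb_genes}
    total = sum(gene_counts.values())
    n_genes = sum(1 for c in gene_counts.values() if c > 0)
    mito = sum(c for g, c in gene_counts.items() if g.upper().startswith(mito_prefix))
    hb = sum(c for g, c in gene_counts.items() if g.upper() in hb_set)
    return {
        "n_umis": total,
        "n_genes": n_genes,
        "pct_mito": 100 * mito / total if total else 0.0,
        "pct_hb": 100 * hb / total if total else 0.0,
    }


def is_low_quality(
    metrics: Dict[str, float],
    min_genes: int = 200,         # illustrative cutoff only
    max_pct_mito: float = 20.0,   # illustrative cutoff only
) -> bool:
    """Flag a cell with too few detected genes or too much mitochondrial signal."""
    return metrics["n_genes"] < min_genes or metrics["pct_mito"] > max_pct_mito


# Toy data: three made-up barcodes with made-up counts.
cells = {
    "AAAC-1": {"MT-CO1": 5, "ACTB": 40, "GAPDH": 55},
    "AAAG-1": {"MT-CO1": 60, "MT-ND1": 20, "ACTB": 20},
    "AAAT-1": {"HBB": 90, "HBA1": 10, "ACTB": 0},
}

per_cell = {barcode: qc_metrics(counts) for barcode, counts in cells.items()}

if __name__ == "__main__":
    for barcode, m in per_cell.items():
        # min_genes is lowered here only because the toy cells have three genes.
        print(barcode, m, "low quality:", is_low_quality(m, min_genes=2))
```

On real data you would plot the distributions of these metrics across all cells before picking any cutoff, and a high hemoglobin percentage spread over many non-erythroid cells would be a hint of ambient RNA rather than a reason to drop cells outright.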

Oh, and make a note of the versions of everything when you start, or otherwise find a way to keep the software consistent. In my experience, the data analysis on these projects sometimes outlasts the current versions of R, Seurat, etc. You don't want the UMAP and clustering to change six months in because of some underlying software update.
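
One way to act on this is to save a version snapshot next to the results. The answer is about R and Seurat, where `sessionInfo()` prints the loaded package versions; the sketch below shows the same idea for a Python-based workflow, which is an assumption not made in the thread. The helper name `record_versions` and the package list are hypothetical.

```python
import json
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def record_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the interpreter version plus the installed version of each package.

    Packages that are not installed are recorded as None instead of raising.
    """
    versions: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical package list; replace with whatever the analysis imports.
    snapshot = record_versions(["scanpy", "anndata", "numpy"])
    print(json.dumps(snapshot, indent=2, sort_keys=True))
```

Writing this JSON to a file at the start of the project, and again whenever the environment changes, makes it easier to trace a shifted UMAP or clustering back to a specific update.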