What are important questions to ask before starting analysis of a 10x scRNA-seq dataset?
3.0 years ago
mi • 0

I am supposed to analyze a 10x Chromium single-cell gene expression dataset in the future.

There will be plenty of time to familiarize myself with the task and to discuss the matter with experts in the field. I have years of experience in genomics/transcriptomics, but not with single-cell expression data.

However, I need to evaluate the quality of the data and the amount of work needed for the task within the next few days.

So what questions would you ask about a 10x Chromium single-cell gene expression dataset before deciding whether to take on the task, and to estimate how much work it will be? What should I consider?

10x scRNA-seq • 1.6k views
3.0 years ago
GenoMax 141k

As with any new analysis, expect to spend some time familiarizing yourself with the procedures involved. 10x makes a set of tools available for doing the analysis; you can find them on their support site. I am going to link the GEX (gene expression) site, but they have other protocols as well. They also have tutorials and test data you can download.

On the open-source side of things:

  1. STARsolo (LINK)
  2. alevin-fry (LINK)
  3. OSCA - Orchestrating Single-Cell Analysis with Bioconductor
  4. Seurat (LINK - getting started)

You will likely choose one of 1/2 (alignment and quantification) and one of 3/4 (downstream analysis) above. Expect to spend at least a few days on learning plus doing the actual analysis. There are many existing threads here to refer to, and a few experts who will be able to answer questions.

I highly recommend you read through item 3 (OSCA) here, even if you don't use Bioconductor tools for your analysis. It does a very good job of explaining why and when each step is necessary.

3.0 years ago

As far as QC goes, you probably want to look at what fraction of the data is mitochondrial in origin (a high fraction could indicate dead or lysed cells), look at the distributions of the number of genes per cell and of UMI counts per cell, and maybe run DoubletFinder. You don't say what type of tissue or species this is, but sometimes HBB/HBA genes from red blood cells can give you a feel for the background (ambient) RNA contamination, or "soup", in the data. Or sometimes these will just form a cluster.
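To make the metrics above concrete, here is a minimal sketch in plain Python of how per-cell UMI counts, genes detected, mitochondrial fraction, and hemoglobin fraction could be computed from a genes x cells count table. Everything in it is an assumption made for illustration: the toy counts, the function names, the "MT-" prefix (human gene naming), the hemoglobin gene list, and the cutoffs. In practice Seurat or a similar package computes these for you, and sensible thresholds depend on tissue and protocol.

```python
# Toy per-cell QC metrics from a genes x cells UMI count table (pure Python).
# Gene names, counts, and thresholds are invented for illustration only.

HEMOGLOBIN_GENES = {"HBB", "HBA1", "HBA2"}  # assumed human hemoglobin genes


def qc_metrics(counts, cells, mito_prefix="MT-"):
    """Return {cell: metrics} for a {gene: [count per cell]} table."""
    metrics = {}
    for j, cell in enumerate(cells):
        n_umis = sum(row[j] for row in counts.values())
        n_genes = sum(1 for row in counts.values() if row[j] > 0)
        mito = sum(row[j] for gene, row in counts.items()
                   if gene.upper().startswith(mito_prefix))
        hb = sum(row[j] for gene, row in counts.items()
                 if gene.upper() in HEMOGLOBIN_GENES)
        metrics[cell] = {
            "n_umis": n_umis,
            "n_genes": n_genes,
            "pct_mito": 100.0 * mito / n_umis if n_umis else 0.0,
            "pct_hb": 100.0 * hb / n_umis if n_umis else 0.0,
        }
    return metrics


def flag_low_quality(metrics, min_genes=3, max_pct_mito=20.0):
    """List cells with too few detected genes or too much mitochondrial signal."""
    return sorted(cell for cell, m in metrics.items()
                  if m["n_genes"] < min_genes or m["pct_mito"] > max_pct_mito)


cells = ["cell_A", "cell_B", "cell_C"]
counts = {
    "MT-CO1": [2, 30, 0],
    "MT-ND1": [1, 20, 0],
    "ACTB":   [40, 10, 1],
    "GAPDH":  [30, 0, 0],
    "CD3E":   [20, 0, 0],
    "HBB":    [7, 40, 0],
}

metrics = qc_metrics(counts, cells)
flagged = flag_low_quality(metrics)
for cell in cells:
    print(cell, metrics[cell])
print("flagged:", flagged)
```

In this toy table cell_B is flagged for its mitochondrial fraction and cell_C for having a single detected gene; the hemoglobin fraction is only reported, since how to interpret it depends on whether red blood cells are expected in the sample.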

Otherwise, I think walking through the appropriate Seurat tutorials might give you some more ideas. If it's "low quality", are you not going to analyze it?

Oh, and make a note of the versions of everything when you start, or otherwise find a way to keep the software consistent. In my experience, the data analysis on these projects sometimes outlasts the current versions of R, Seurat, etc. You don't want the UMAP and clustering to change six months in because of some underlying software update.
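As a rough illustration of "making a note of versions", the sketch below records the interpreter version and the installed versions of a few packages as JSON. It is written in Python purely as an example; in an R/Seurat workflow the same idea is usually covered by `sessionInfo()` or a lockfile. The function name and the package names are placeholders, not a recommendation.

```python
# Record interpreter and package versions at the start of a project.
# snapshot_versions and the package list are hypothetical placeholders.
import json
import platform
from importlib import metadata


def snapshot_versions(packages):
    """Return a dict with the Python version and each package's version.

    Packages that are not installed are recorded as None instead of
    raising, so the snapshot itself never fails.
    """
    snap = {"python": platform.python_version(), "packages": {}}
    for name in packages:
        try:
            snap["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            snap["packages"][name] = None
    return snap


snapshot = snapshot_versions(["scanpy", "numpy", "definitely-not-installed-pkg"])
print(json.dumps(snapshot, indent=2, sort_keys=True))
```

Saving this output next to the analysis results (and committing it with the code) gives you something to compare against when a plot changes unexpectedly months later.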

Regarding the "random" / non-deterministic elements of the analysis such as UMAPs: be sure to call set.seed() everywhere. The linked OSCA book does a good job of indicating where this is key, so that manifolds etc. look the same even if you re-run the analysis several times. As for management and documentation of package versions, I have become a fan of renv (see my comment in the thread "Installing an updated R version (>=4.0) using conda"), unless you already manage everything via a container-based solution.
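The effect of seeding can be shown with a small Python analogue of R's `set.seed()`. The `random_layout` function here is a hypothetical stand-in for any stochastic step (for example a randomly initialized embedding); it is not UMAP itself.

```python
# Fixed seeds make a stochastic step repeatable (Python analogue of set.seed()).
# random_layout is a made-up stand-in for a randomly initialized embedding.
import random


def random_layout(n_points, seed):
    """Return n_points random 2D coordinates drawn from a seeded generator."""
    rng = random.Random(seed)  # private generator, does not touch global state
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_points)]


run1 = random_layout(5, seed=42)
run2 = random_layout(5, seed=42)
run3 = random_layout(5, seed=7)

print("same seed, same layout:", run1 == run2)
print("different seed, same layout:", run1 == run3)
```

Note that a fixed seed only guarantees identical results with identical software, which is why it goes hand in hand with pinning package versions.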
