Question: Single Cell RNA Seq. Analysis
18 months ago by
Baba Ghulam Shah Badshah University, Jammu, India.
asifadil60 wrote:

Hi all, I am new to bioinformatics and quite a beginner in programming. Can anyone suggest some sources where I can learn at least 50% of scRNA-seq data analysis? I am familiar with the C language, and I know a little bit of molecular biology too.

rna-seq • 2.0k views
modified 13 months ago by kgosche20 • written 18 months ago by asifadil60

Thank you all. I am highly indebted, thank you very much. I have started working through what you have all suggested, and any further help will be equally appreciated.

written 17 months ago by asifadil60
17 months ago by
United States
Friederike6.2k wrote:

The Hemberg Lab has a very useful introduction to scRNA-seq analysis using R.

In terms of R packages and actually understanding why the objects are becoming fairly complex, the explanations from the Bioconductor people are fairly insightful, and the principles are applicable to Seurat, too (although the names of the accessor functions for, say, retrieving the matrix of read counts will be different). The accompanying book is here.

Generally, there are these steps that the analysis will involve:

  1. Read alignment (FASTQ --> BAM): this depends a bit on the type of data you have. For data from the 10X Genomics platform, they offer their CellRanger software, but there are other tools like alevin and STARsolo. This step is usually done for all NGS data, but it is slightly more complicated for single-cell data because the tools need to keep track of where each read came from (which cell and which transcript, if UMIs were used).
  2. Count matrix generation: The first major goal is to obtain a matrix of read counts per gene, where rows usually correspond to genes and columns to cells. For single-cell RNA-seq, this is usually part of the alignment step.
  3. Filtering, Normalization, Batch correction, ...: this is where scRNA-seq becomes really frustrating, even for experienced bioinformaticians, because there's no real consensus yet as to how scRNA-seq data should be normalized. This is why many people will point you to Seurat, which pretends it has it all figured out by providing functions that are aptly named NormalizeData and ScaleData. If your data looks similar to what people have been working with, the default settings may work.
  4. Dimensionality reduction: tSNE, PCA, UMAP, ... These are techniques that allow you to represent the data in x-y coordinates (= 2 dimensions) rather than in the original number of dimensions of your count matrix (probably something like 30 000 genes x 10 000 cells).
  5. Clustering cells: usually done with graph-based methods because they seem to offer the best compromise between speed and accuracy for single-cell data. There are a couple of excellent reviews on the topic (Menon 2018, Kiselev 2019) and some benchmarking papers (Duo 2019, Freytag 2019).
  6. Assigning labels to cells: this is usually the main goal of many scRNA-seq data sets these days and it's usually quite tricky. In principle, we're expecting to see certain genes that are only expressed in certain clusters of cells (marker genes), and based on those we try to infer the "cell type". While not very technical in nature, I found the discussions by Jesse Gillis and Megan Crow (here and here) quite insightful.
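
As a rough illustration of steps 2, 3 and 6 above, here is a minimal pure-Python sketch on toy data. Everything in it is an assumption made for illustration: the barcodes, UMIs, cluster labels, and the helper names `count_matrix`, `log_normalize` and `marker_genes` are hypothetical, and the "largest difference in mean expression" rule is a deliberately naive stand-in for the statistical tests that real packages apply. The normalization follows the common "scale each cell to a fixed total, then log1p" idea, which is only one of several approaches in use.

```python
import math
from collections import defaultdict

# Hypothetical aligned reads as (cell barcode, UMI, gene) records.
# Real pipelines (CellRanger, alevin, STARsolo) derive these from FASTQ files.
reads = [
    ("AAAC", "u1", "CD3E"), ("AAAC", "u1", "CD3E"),  # same UMI twice: PCR duplicate
    ("AAAC", "u2", "CD3E"), ("AAAC", "u3", "ACTB"),
    ("AAAG", "u1", "CD3E"), ("AAAG", "u2", "ACTB"),
    ("CCCT", "u1", "MS4A1"), ("CCCT", "u2", "MS4A1"), ("CCCT", "u3", "ACTB"),
    ("CCCG", "u1", "MS4A1"), ("CCCG", "u2", "ACTB"),
]


def count_matrix(reads):
    """Step 2: count distinct UMIs per (gene, cell); rows = genes, columns = cells."""
    umis = defaultdict(set)
    for cell, umi, gene in reads:
        umis[(gene, cell)].add(umi)
    genes = sorted({gene for gene, _ in umis})
    cells = sorted({cell for _, cell in umis})
    matrix = [[len(umis.get((g, c), ())) for c in cells] for g in genes]
    return genes, cells, matrix


def log_normalize(matrix, scale=10_000):
    """Step 3: scale every cell (column) to `scale` total counts, then take log1p.

    Assumes cells with zero total counts were already filtered out.
    """
    totals = [sum(column) for column in zip(*matrix)]
    return [[math.log1p(v / t * scale) for v, t in zip(row, totals)] for row in matrix]


def marker_genes(genes, cells, norm, labels):
    """Step 6 (naive): per cluster, the gene with the largest difference in
    mean normalized expression between cells inside and outside the cluster."""
    markers = {}
    for cluster in sorted(set(labels.values())):
        inside = [i for i, c in enumerate(cells) if labels[c] == cluster]
        outside = [i for i, c in enumerate(cells) if labels[c] != cluster]

        def mean_diff(row):
            return (sum(row[i] for i in inside) / len(inside)
                    - sum(row[i] for i in outside) / len(outside))

        best = max(range(len(genes)), key=lambda g: mean_diff(norm[g]))
        markers[cluster] = genes[best]
    return markers


genes, cells, matrix = count_matrix(reads)
norm = log_normalize(matrix)

# Hypothetical cluster labels; in practice these come from steps 4 and 5.
labels = {"AAAC": 0, "AAAG": 0, "CCCG": 1, "CCCT": 1}
markers = marker_genes(genes, cells, norm, labels)

print(genes, cells)
print(matrix)
print(markers)
```

Note that the duplicated read in cell AAAC is counted once because it shares a UMI, and that the gene present in every cell (ACTB here) is not picked as a marker for either cluster. On real data you would use an established package rather than code like this.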

In short, as others have pointed out, scRNA-seq is really not ideal to start out with as a bioinformatician because it's a fairly new data type and we're still grappling with all its intricacies and caveats. That being said, you may find more automated solutions like the one provided by the EPFL (ASAP) useful to play around with some data. Just be cautious with making bold interpretations and claims.

modified 17 months ago • written 17 months ago by Friederike6.2k

You may also find the slides from a class I recently taught helpful (Chapter 10 would be the one focused on scRNA-seq), as well as Lior Pachter's intro.

written 17 months ago by Friederike6.2k

I also want to point out the truly excellent write-up by numerous people involved in developing the infrastructure for analyzing scRNA-seq data using R packages hosted on Bioconductor: "Orchestrating single-cell analysis"

written 12 months ago by Friederike6.2k
18 months ago by
United States
genomax90k wrote:

Comprehensive collection of all things single cell (including tutorials) :

modified 18 months ago • written 18 months ago by genomax90k

Thank you. Can you also suggest some material where I can learn how to implement statistics?

written 18 months ago by asifadil60

how to implement statistics?

What do you mean by that?

written 18 months ago by genomax90k

Don't implement statistics yourself. There is specialized software for all common (sc)RNA-seq analyses; please use Google and the search function. Seurat is a good starting point, as mentioned above.

written 18 months ago by ATpoint39k
18 months ago by
ATpoint39k wrote:

Single-cell data are rather unpleasant as a beginner's topic due to the noisy and sparse nature of these data. It may be better to first analyze some bulk RNA-seq data to get familiar with R (see here), and then dive into the documentation of Seurat, which is the jack-of-all-trades in terms of scRNA-seq analysis. For low-level processing, alevin is a good choice.

written 18 months ago by ATpoint39k
17 months ago by
Palo Alto, CA, USA
Bogdan1.0k wrote:

Yes, Seurat would be one of the starting points. Besides the tutorials offered on the Seurat website, a while ago I posted some R code on the Seurat GitHub page: (hope it is helpful)

written 17 months ago by Bogdan1.0k
17 months ago by
Fidel1.9k wrote:

Scanpy has good tutorials that can help you.

modified 17 months ago by RamRS30k • written 17 months ago by Fidel1.9k
13 months ago by
kgosche20 wrote:

Partek Flow is point-and-click analysis software for single-cell data. You can work with data from any platform and perform QA/QC, filtering, normalization, clustering, visualization, classification, statistical analysis, pathway analysis and so on. It has lots of online documentation as well as tech support, so it really is easy to use. Here's information about its single-cell capabilities.

modified 13 months ago • written 13 months ago by kgosche20

It's also not free, so keep that in mind. Licenses are not cheap.

written 13 months ago by jared.andrews077.2k