Question

Single Cell RNA Seq. Analysis

5.5 years ago

saqlain ▴ 90

Hi all, I am new to bioinformatics and quite a beginner in programming languages. Can anyone suggest some sources where I can learn at least 50% of scRNA-seq data analysis? I am familiar with the C language, and I know a little bit of molecular biology too.

RNA-Seq

Thank you all, I am highly indebted. I have started working through everything you suggested, and any further help will be equally appreciated.

Reply by saqlain ▴ 90, 5.5 years ago

5.1 years ago

kgosche ▴ 30

Partek Flow is point-and-click analysis software for single-cell data. You can work with data from any platform and perform QA/QC, filtering, normalization, clustering, visualization, classification, statistical analysis, pathway analysis, and so on. It has extensive online documentation as well as technical support, so it is easy to use. Here's information about its single-cell capabilities.

It's also not free, so keep that in mind. Licenses are not cheap.

Reply by jared.andrews07 ★ 17k, 5.1 years ago

score 18 · Accepted Answer · 2019-04-22

The Hemberg Lab has a very useful introduction to scRNA-seq analysis using R.

For R packages, and for actually understanding why the data objects are becoming fairly complex, the explanations from the Bioconductor developers are quite insightful, and the principles apply to Seurat, too (although the accessor functions, e.g. for retrieving the matrix of read counts, have different names). The accompanying book is here.
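To make the "why are these objects so complex" point concrete, here is a minimal Python sketch of the idea behind a container such as Bioconductor's SingleCellExperiment: one or more genes x cells matrices (assays) plus per-gene and per-cell annotation tables that must stay aligned with the matrix whenever you filter or subset. The class and method names are hypothetical and only illustrate the concept; they are not the Bioconductor, Seurat, or Scanpy API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToySingleCellData:
    """Hypothetical container: every assay is a genes x cells matrix (list of rows)."""
    gene_names: List[str]
    cell_barcodes: List[str]
    assays: Dict[str, List[List[float]]] = field(default_factory=dict)  # e.g. "counts", "logcounts"
    cell_data: Dict[str, list] = field(default_factory=dict)  # one value per cell (QC metrics, clusters)
    gene_data: Dict[str, list] = field(default_factory=dict)  # one value per gene (IDs, HVG flags)

    def add_assay(self, name: str, matrix: List[List[float]]) -> None:
        # Every assay must line up with the same genes (rows) and cells (columns).
        if len(matrix) != len(self.gene_names) or any(
                len(row) != len(self.cell_barcodes) for row in matrix):
            raise ValueError(f"assay {name!r} does not match the genes x cells shape")
        self.assays[name] = matrix

    def add_cell_data(self, name: str, values: list) -> None:
        if len(values) != len(self.cell_barcodes):
            raise ValueError("exactly one value per cell is required")
        self.cell_data[name] = list(values)

    def subset_cells(self, keep: List[int]) -> "ToySingleCellData":
        """Slice every assay and every per-cell annotation together, so nothing falls out of sync."""
        return ToySingleCellData(
            gene_names=list(self.gene_names),
            cell_barcodes=[self.cell_barcodes[j] for j in keep],
            assays={k: [[row[j] for j in keep] for row in m] for k, m in self.assays.items()},
            cell_data={k: [v[j] for j in keep] for k, v in self.cell_data.items()},
            gene_data={k: list(v) for k, v in self.gene_data.items()},
        )


data = ToySingleCellData(["GeneA", "GeneB"], ["cell1", "cell2", "cell3"])
data.add_assay("counts", [[0, 3, 1], [5, 0, 2]])
data.add_cell_data("total_counts", [5, 3, 3])
subset = data.subset_cells([0, 2])
print(subset.assays["counts"], subset.cell_data["total_counts"])  # [[0, 1], [5, 2]] [5, 3]
```

Real implementations add sparse matrices, reduced dimensions, alternative experiments and more, which is exactly why the accessor functions matter: they keep all of these pieces consistent for you.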

Generally, there are these steps that the analysis will involve:

Read alignment (FASTQ --> BAM): the right tool depends a bit on the type of data you have. For data from the 10X Genomics platform, 10X offers its Cell Ranger software, but there are other tools such as alevin and STARsolo. This step is part of most NGS analyses, but it is slightly more complicated for single-cell data because the tools need to keep track of where each read came from (which cell and, if UMIs were used, which molecule).
Count matrix generation: the first major goal is to obtain a matrix of read (or UMI) counts per gene, where rows usually correspond to genes and columns to cells. For single-cell RNA-seq, this is usually part of the alignment step.
Filtering, normalization, batch correction, ...: this is where scRNA-seq becomes really frustrating, even for experienced bioinformaticians, because there is no real consensus yet on how scRNA-seq data should be normalized. This is why many people will point you to Seurat, which pretends to have it all figured out by providing functions aptly named NormalizeData and ScaleData; if your data look similar to what people have been working with, the default settings may work.
Dimensionality reduction: tSNE, PCA, UMAP, ... These techniques let you represent the data in xy-coordinates (= 2 dimensions) rather than in the original number of dimensions of your count matrix (probably something like 30,000 genes x 10,000 cells).
Clustering cells: usually done with graph-based methods, because they seem to offer the best compromise between speed and accuracy for single-cell data. There are a couple of excellent reviews on the topic (Menon 2018, Kiselev 2019) and some benchmarking papers (Duo 2019, Freytag 2019).
Assigning labels to cells: this is the main goal of many scRNA-seq data sets these days, and it is usually quite tricky. In principle, we expect certain genes to be expressed only in certain clusters of cells (marker genes), and based on those we try to infer the "cell type". While not very technical in nature, I found the discussions by Jesse Gillis and Megan Crow (here and here) quite insightful.
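The filtering, normalization and marker-gene steps in the list above can be sketched in plain Python to show what the arithmetic looks like. This is a toy illustration under stated assumptions, not how Seurat or Scanpy implement it: `log_normalize` follows the idea of Seurat's default "LogNormalize" method (divide each count by the cell's total, multiply by a scale factor of 10,000, take log1p), `scale_genes` z-scores each gene across cells and caps values at 10 as ScaleData does by default, and `rank_markers` is a hypothetical, crude mean-difference score standing in for proper statistical tests (Seurat's FindMarkers defaults to a Wilcoxon rank-sum test). The function names and the `min_genes` threshold are made up for illustration.

```python
import math
import statistics
from typing import List, Sequence

Matrix = List[List[float]]  # genes x cells: one row per gene, one column per cell


def filter_cells(counts: Matrix, min_genes: int) -> List[int]:
    """Indices of cells in which at least `min_genes` genes have a non-zero count."""
    n_cells = len(counts[0])
    return [j for j in range(n_cells)
            if sum(1 for row in counts if row[j] > 0) >= min_genes]


def log_normalize(counts: Matrix, scale_factor: float = 10_000.0) -> Matrix:
    """Per cell: count / cell total * scale_factor, then natural log(1 + x)."""
    n_cells = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    return [[math.log1p(row[j] / totals[j] * scale_factor) if totals[j] > 0 else 0.0
             for j in range(n_cells)]
            for row in counts]


def scale_genes(norm: Matrix, max_value: float = 10.0) -> Matrix:
    """Per gene: z-score across cells (sample SD here; Seurat's details may differ), capped at max_value."""
    scaled = []
    for row in norm:
        mean = statistics.fmean(row)
        sd = statistics.stdev(row) if len(row) > 1 else 0.0
        if sd == 0:
            scaled.append([0.0] * len(row))  # a constant gene carries no information
        else:
            scaled.append([min((x - mean) / sd, max_value) for x in row])
    return scaled


def rank_markers(norm: Matrix, genes: Sequence[str],
                 labels: Sequence[str], cluster: str) -> List[str]:
    """Hypothetical crude score: mean expression in `cluster` minus mean in all other cells."""
    inside = [j for j, lab in enumerate(labels) if lab == cluster]
    outside = [j for j, lab in enumerate(labels) if lab != cluster]
    score = {gene: statistics.fmean([row[j] for j in inside])
                   - statistics.fmean([row[j] for j in outside])
             for gene, row in zip(genes, norm)}
    return sorted(genes, key=lambda g: score[g], reverse=True)


if __name__ == "__main__":
    genes = ["GeneA", "GeneB", "GeneC"]
    counts = [[10, 0, 8, 0, 0],   # GeneA
              [0, 12, 0, 9, 0],   # GeneB
              [5, 5, 5, 5, 3]]    # GeneC
    keep = filter_cells(counts, min_genes=2)  # the last cell expresses only GeneC
    counts = [[row[j] for j in keep] for row in counts]
    norm = log_normalize(counts)
    scaled = scale_genes(norm)  # this is what would typically feed into PCA
    print(keep, rank_markers(norm, genes, ["a", "b", "a", "b"], "a"))
    # [0, 1, 2, 3] ['GeneA', 'GeneC', 'GeneB']
```

Note how a gene expressed everywhere (GeneC) scores near zero, while a gene restricted to one cluster (GeneA) ranks as its marker; real data are far noisier and sparser, which is where the lack of consensus mentioned above comes in.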

In short, as others have pointed out, scRNA-seq is not an ideal place to start as a bioinformatician, because it is a fairly new data type and we are still grappling with all its intricacies and caveats. That being said, you may find more automated solutions, like the one provided by the EPFL (ASAP), useful for playing around with some data; just be cautious about making bold interpretations and claims.

score 6 · Accepted Answer · 2019-04-05

5.5 years ago

GenoMax 146k

Comprehensive collection of all things single cell (including tutorials) : https://github.com/seandavi/awesome-single-cell

Thank you. Can you also suggest some material where I can learn how to implement statistics?

Reply by saqlain ▴ 90, 5.5 years ago

"how to implement statistics?"

What do you mean by that?

Reply by GenoMax 146k, 5.5 years ago

Don't implement statistics yourself. There is specialized software for all common (sc)RNA-seq analyses; please use Google and the search function. Seurat is a good starting point, as mentioned above.

Reply by ATpoint 84k, 5.5 years ago

score 6 · Accepted Answer · 2019-04-05

Single-cell data are a rather unpleasant topic for a beginner because of their noisy and sparse nature. It may be better to first analyze some bulk RNA-seq data to get familiar with R (see here), and then dive into the documentation of Seurat, which is the jack-of-all-trades of scRNA-seq analysis. For low-level processing, alevin is a good choice.

score 6 · Accepted Answer · 2019-04-11

5.5 years ago

Bogdan ★ 1.4k

Yes, Seurat would be one of the starting points. Besides the tutorials offered on the Seurat website, a while ago I posted some R code on the Seurat GitHub page: https://github.com/satijalab/seurat/issues/1193 (hope it is helpful).

Ram · Accepted Answer · 2019-04-11

5.5 years ago

Fidel ★ 2.0k

Scanpy has good tutorials that can help you.
