Question: Single-Cell RNA-Seq Analysis
asifadil (Baba Ghulam Shah Badshah University, Jammu, India) wrote, 6 weeks ago:

Hi all, I am new to bioinformatics and quite a beginner in programming languages. Can anyone suggest some resources where I can learn at least half of what scRNA-seq data analysis involves? I am familiar with the C language, and I know a little bit of molecular biology too.


Thank you all, I am highly indebted. I have started working through everything you suggested, and any further help will be equally appreciated.

(reply by asifadil, 27 days ago)
Friederike (United States) wrote, 27 days ago:

The Hemberg Lab has a very useful introduction to scRNA-seq analysis using R.

When it comes to R packages, and to actually understanding why the data objects become fairly complex, the explanations from the Bioconductor developers are quite insightful, and the principles apply to Seurat too (although the accessor functions, e.g. for retrieving the matrix of read counts, have different names). The accompanying book is here.

In general, the analysis will involve these steps:

  1. Read alignment (FASTQ --> BAM): the tool depends a bit on the type of data you have. For data from the 10x Genomics platform, 10x offers its CellRanger software, but there are other tools such as alevin and STARsolo. Alignment is part of virtually every NGS analysis, but it is slightly more complicated for single-cell data because the tools need to keep track of where each read came from (which cell, and which transcript molecule if UMIs were used).
  2. Count matrix generation: the first major goal is to obtain a matrix of read (or UMI) counts per gene, where rows usually correspond to genes and columns to cells. For single-cell RNA-seq, this is usually part of the alignment step.
  3. Filtering, normalization, batch correction, ...: this is where scRNA-seq becomes really frustrating, even for experienced bioinformaticians, because there is no real consensus yet on how scRNA-seq data should be normalized. This is why many people will point you to Seurat, which pretends to have it all figured out by providing aptly named functions such as NormalizeData and ScaleData; if your data look similar to what people have been working with, the default settings may work.
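  4. Dimensionality reduction: tSNE, PCA, UMAP, ... These techniques let you represent the data in xy-coordinates (= 2 dimensions) rather than in the original number of dimensions of your count matrix (probably something like 30,000 genes x 10,000 cells, i.e. each cell is a point in a ~30,000-dimensional gene space). In practice, PCA is typically run first to reduce the data to a few dozen components, and tSNE or UMAP then produce the 2D map.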
  5. Clustering cells: usually done with graph-based methods, because they seem to offer the best compromise between speed and accuracy for single-cell data. There are a couple of excellent reviews on the topic (Menon 2018, Kiselev 2019) and some benchmarking papers (Duo 2019, Freytag 2019).
  6. Assigning labels to cells: this is the main goal of many scRNA-seq data sets these days, and it's usually quite tricky. In principle, we expect certain genes to be expressed only in certain clusters of cells (marker genes), and based on those we try to infer the "cell type". While not very technical in nature, I found the discussions by Jesse Gillis and Megan Crow (here and here) quite insightful.
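To make steps 2 and 3 a bit more concrete, here is a toy sketch in plain Python (no libraries) of what happens conceptually: reads tagged with a cell barcode and a UMI are collapsed into a genes x cells UMI count matrix, cells with too few counts are filtered out, and each cell is log-normalized. The barcodes, UMIs, gene names and the min_counts threshold are made up for illustration. The normalization follows the idea behind Seurat's default LogNormalize method (divide by the cell's total count, multiply by a scale factor of 10,000, take log1p), but this is not Seurat's code, and for real data you should use the established tools rather than re-implementing this yourself.

```python
import math
from collections import defaultdict

# Hypothetical toy input: one record per aligned read, tagged with
# (cell barcode, UMI, gene). Tools such as CellRanger, alevin or STARsolo
# keep track of this information; these values are made up.
READS = [
    ("AAAC", "UMI1", "CD3E"),
    ("AAAC", "UMI1", "CD3E"),  # PCR duplicate: same cell, UMI and gene
    ("AAAC", "UMI2", "CD3E"),
    ("AAAC", "UMI3", "MS4A1"),
    ("TTTG", "UMI1", "MS4A1"),
    ("TTTG", "UMI4", "MS4A1"),
    ("GGGA", "UMI5", "CD3E"),
]


def count_matrix(records):
    """Collapse reads to unique UMIs and count them per gene (rows) and cell (columns)."""
    umis = defaultdict(set)
    for cell, umi, gene in records:
        umis[(gene, cell)].add(umi)  # duplicate reads of one molecule count once
    genes = sorted({gene for gene, _ in umis})
    cells = sorted({cell for _, cell in umis})
    # dict.get does not insert missing keys, so absent gene/cell pairs give 0
    matrix = [[len(umis.get((g, c), ())) for c in cells] for g in genes]
    return genes, cells, matrix


def filter_cells(cells, matrix, min_counts):
    """Drop cells (columns) whose total UMI count is below min_counts."""
    keep = [j for j in range(len(cells))
            if sum(row[j] for row in matrix) >= min_counts]
    return [cells[j] for j in keep], [[row[j] for j in keep] for row in matrix]


def log_normalize(matrix, scale_factor=10_000):
    """Per cell: divide by the cell's total count, multiply by scale_factor, take log1p."""
    n_cells = len(matrix[0]) if matrix else 0
    totals = [sum(row[j] for row in matrix) for j in range(n_cells)]
    return [[math.log1p(row[j] / totals[j] * scale_factor) if totals[j] else 0.0
             for j in range(n_cells)]
            for row in matrix]


genes, cells, counts = count_matrix(READS)
kept_cells, kept_counts = filter_cells(cells, counts, min_counts=2)
normalized = log_normalize(kept_counts)

print(genes, cells)  # ['CD3E', 'MS4A1'] ['AAAC', 'GGGA', 'TTTG']
print(counts)        # [[2, 1, 0], [1, 0, 2]]
print(kept_cells)    # ['AAAC', 'TTTG']
print([[round(v, 2) for v in row] for row in normalized])  # [[8.81, 0.0], [8.11, 9.21]]
```

In real data sets the count matrix is stored as a sparse matrix, since most entries are zero, and the later steps (scaling, PCA, UMAP, graph-based clustering) are best left to Seurat, Scanpy or the Bioconductor packages.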

In short, as others have pointed out, scRNA-seq is really not an ideal place to start out as a bioinformatician, because it's a fairly new data type and we're still grappling with all its intricacies and caveats. That being said, you may find more automated solutions, like the one provided by EPFL (asap), useful for playing around with some data; just be cautious about making bold interpretations and claims.


You may also find the slides from a class I recently taught helpful (Chapter 10 is the one focused on scRNA-seq), as well as Lior Pachter's intro.

(reply by Friederike, 27 days ago)
ATpoint (Germany) wrote, 6 weeks ago:

Single-cell data are a rather unpleasant topic for a beginner because they are noisy and sparse. It may be better to first analyze some bulk RNA-seq data to get familiar with R (see here), and then dive into the documentation of Seurat, which is the jack-of-all-trades of scRNA-seq analysis. For low-level processing, alevin is a good choice.

genomax (United States) wrote, 6 weeks ago:

A comprehensive collection of all things single-cell (including tutorials): https://github.com/seandavi/awesome-single-cell


Thank you. Can you also suggest some material where I can learn how to implement statistics?

(reply by asifadil, 6 weeks ago)

how to implement statistics?

What do you mean by that?

(reply by genomax, 6 weeks ago)

Don't implement statistics yourself. There is specialized software for all common (sc)RNA-seq analyses; please use Google and the search function. Seurat is a good starting point, as mentioned above.

(reply by ATpoint, 6 weeks ago)
Fidel (Germany) wrote, 5 weeks ago:

Scanpy has good tutorials that can help you.

Bogdan (Palo Alto, CA, USA) wrote, 5 weeks ago:

Yes, Seurat would be one good starting point. Besides the tutorials offered on the Seurat website, a while ago I posted some R code on the Seurat GitHub page: https://github.com/satijalab/seurat/issues/1193 (I hope it is helpful).

Powered by Biostar version 2.3.0