What is the best-practice scRNA-seq workflow for multiple patients and samples using Seurat?
3.4 years ago

Hi all,

I have a scRNA-seq dataset with 6 patients, each with 2 sample types (normal, tumor). So I have 12 folders, and each folder contains its scRNA data: barcodes.tsv.zip, features.tsv.zip, matrix.mtx.tsv.zip. I would like to use this data to practice and learn the workflow of the Seurat (version 4.0) R package.

I would like to take all 12 samples through clustering and downstream analysis. For the preprocessing, I am not sure at which point I should merge the datasets.

Should I read and QC each folder separately (e.g. patient1-normal -> patient1-tumor -> patient2-normal ... and so on), and then merge all the data? I am still learning scRNA-seq, but I believe each dataset will have a different number of genes and cells, so I would need to normalize after merging all the data.

Thank you for your help and advice.

scRNA sing-cell RNA seurat integrate • 3.6k views
3.4 years ago
ATpoint 81k

Yes, you should QC each sample individually (you can automate this of course, but be sure to manually inspect each QC plot). This is at least what I do. There are approaches like 3x MAD (median absolute deviations from the median) to filter data automatically, and I use them in support of what my eyes see in the QC plots, but I personally think that the human brain is an awesome diagnostic tool, so just looking at the plots makes a lot of sense to see whether the automated cutoffs are actually meaningful. With skewed data they might not be.

Then, for clustering, you should integrate your data. Please read the Seurat integration vignette. I suggest, though, that you download some of the Seurat example data to practice with, because with 12 samples it will take both time and memory. Better to go easy with a smaller dataset, or just two patients, to start with.

An alternative to Seurat, with lots of code to follow, is https://osca.bioconductor.org/. I recommend it, as the documentation is (for me) much clearer. It has a section on multi-sample integration as well.
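To make the automated-cutoff idea concrete, here is a minimal base-R sketch of a median-absolute-deviation (MAD) outlier rule. The function name `mad_outliers`, its arguments, and the toy values are made up for illustration; this is not the implementation of any particular package, and the resulting cutoffs should still be checked against the QC plots.

```r
# Flag values lying more than `nmads` MADs away from the median.
# `type` chooses which tail(s) count as outliers; `log` applies log10 first,
# which can help for strongly right-skewed metrics such as library size.
mad_outliers <- function(x, nmads = 3, type = c("both", "lower", "higher"), log = FALSE) {
  type <- match.arg(type)
  if (log) x <- log10(x)
  med <- median(x, na.rm = TRUE)
  # stats::mad() scales by 1.4826 by default (consistent with the SD for normal data)
  dev <- nmads * mad(x, center = med, na.rm = TRUE)
  low <- x < med - dev
  high <- x > med + dev
  switch(type, both = low | high, lower = low, higher = high)
}

# Toy example: hypothetical per-cell mitochondrial percentages for one sample
pct_mt <- c(2.1, 3.0, 2.7, 3.4, 2.9, 3.1, 2.5, 40.0)
flagged <- mad_outliers(pct_mt, type = "higher")
which(flagged)  # index of the cell that would be removed
```

Running this separately per sample keeps one unusual sample from shifting the cutoffs of the others.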


Thanks for your detailed answer, ATpoint! I will check out the https://osca.bioconductor.org/ today. Really appreciate it!

I was confused about when to merge/integrate the datasets because I found the Merge function (https://satijalab.org/seurat/v3.1/merge_vignette.html) in Seurat. I think what Merge does is combine multiple 10X datasets into one object, so that we can do the QC together. But I am wondering which way is the best practice:

  1. integrate all dataset 1st and QC all together
  2. QC each dataset and integrate all QC'd data

Definitely the 2nd option. Integration is already a sort of analysis step, and you want corrupted or damaged cells, and those with very high or low depth, out of the mix before doing any kind of analysis. Check, for example, the scater package at Bioconductor: it has some good examples of how to do QC (it is similar to Seurat, though), and OSCA also covers QC.
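As a rough sketch of option 2 with the Seurat 4.x anchor-based integration functions: the helper names `qc_one_sample` and `integrate_samples`, the folder paths, and the thresholds below are all placeholders rather than recommendations. It also assumes human data (mitochondrial genes prefixed `MT-`) and folders that `Read10X` can read directly as a single gene-expression matrix.

```r
# Per-sample QC: read one 10x folder, compute QC metrics, drop low-quality cells.
qc_one_sample <- function(data_dir, sample_id, min_features = 200, max_mt = 20) {
  counts <- Seurat::Read10X(data.dir = data_dir)
  obj <- Seurat::CreateSeuratObject(counts = counts, project = sample_id)
  obj[["percent.mt"]] <- Seurat::PercentageFeatureSet(obj, pattern = "^MT-")
  md <- obj@meta.data
  # Placeholder cutoffs; choose them per sample after looking at the QC plots
  keep <- rownames(md)[md$nFeature_RNA >= min_features & md$percent.mt <= max_mt]
  subset(obj, cells = keep)
}

# Integration of the already-QC'd samples, then clustering on the integrated assay.
integrate_samples <- function(obj_list, dims = 1:30, resolution = 0.5) {
  obj_list <- lapply(obj_list, function(x) {
    x <- Seurat::NormalizeData(x, verbose = FALSE)
    Seurat::FindVariableFeatures(x, selection.method = "vst",
                                 nfeatures = 2000, verbose = FALSE)
  })
  features <- Seurat::SelectIntegrationFeatures(object.list = obj_list)
  anchors <- Seurat::FindIntegrationAnchors(object.list = obj_list,
                                            anchor.features = features, dims = dims)
  combined <- Seurat::IntegrateData(anchorset = anchors, dims = dims)
  Seurat::DefaultAssay(combined) <- "integrated"
  combined <- Seurat::ScaleData(combined, verbose = FALSE)
  combined <- Seurat::RunPCA(combined, npcs = max(dims), verbose = FALSE)
  combined <- Seurat::FindNeighbors(combined, reduction = "pca", dims = dims)
  Seurat::FindClusters(combined, resolution = resolution)
}

# Usage with hypothetical folder names (not run here):
# dirs <- c(p1_normal = "data/patient1_normal", p1_tumor = "data/patient1_tumor")
# objs <- Map(qc_one_sample, dirs, names(dirs))
# combined <- integrate_samples(objs)
```

Starting with two samples, as in the commented usage, keeps time and memory manageable while learning the steps.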


So my workflow will be:

  1. QC (filter out bad-quality cells) each dataset (patient1-condition1, patient1-condition2, patient2-condition1 ... patient6-condition2)
  2. Normalize each dataset (NormalizeData)
  3. Merge (Merge, not integrate) all the data (so now there is only one combined dataset)
  4. Scale the data (FindVariableFeatures, ScaleData)
  5. Cluster (RunPCA, FindNeighbors, FindClusters)
  6. DE ...

Is this correct? (I may need to replace Merge with the integration functions if I see a batch effect across conditions/patients.)

Thank you
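For reference, the merge-based steps listed above might look roughly like this in Seurat 4.x. This is only a sketch of the proposed workflow, not a confirmed best practice: `merge_and_cluster` is a made-up helper name, `obj_list` is assumed to be a named list of already-QC'd Seurat objects, and the `dims` and `resolution` values are arbitrary.

```r
# Normalize each QC'd sample, merge without batch correction, then cluster.
merge_and_cluster <- function(obj_list, dims = 1:30, resolution = 0.5) {
  stopifnot(length(obj_list) >= 2, !is.null(names(obj_list)))
  # Step 2: normalize each dataset separately
  obj_list <- lapply(obj_list, Seurat::NormalizeData, verbose = FALSE)
  # Step 3: plain merge; cell barcodes are prefixed with the sample names
  merged <- merge(x = obj_list[[1]], y = obj_list[-1],
                  add.cell.ids = names(obj_list), merge.data = TRUE)
  # Step 4: variable features and scaling on the merged object
  merged <- Seurat::FindVariableFeatures(merged, verbose = FALSE)
  merged <- Seurat::ScaleData(merged, verbose = FALSE)
  # Step 5: dimensionality reduction and clustering
  merged <- Seurat::RunPCA(merged, npcs = max(dims), verbose = FALSE)
  merged <- Seurat::FindNeighbors(merged, dims = dims)
  Seurat::FindClusters(merged, resolution = resolution)
}

# Usage (not run here): clustered <- merge_and_cluster(objs)
```

If the resulting clusters separate mainly by patient or sample rather than by cell type, that suggests a batch effect, and the merge step would be the one to swap for integration.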


Did you ever get a reply to this? I am just starting my Seurat journey and am wondering about the same question! Thank you!


ATpoint, a side question: for 10x data, do you recommend loading the raw or the filtered set?
