First off, I am rather new to the scRNA-seq field myself, so feel free to question what I say. I am commenting on situations I have encountered so far, sorrynotsorry for the wall of text that follows:
It all depends on what you eventually want to do. If you want to get an idea of whether batch effects are present and how strongly the samples separate by status (healthy, mild, severe), then you should definitely not use any integration procedure. Instead, process every sample independently but identically.
This would typically involve normalization, feature selection, merging of the selected features (= highly variable genes per sample) and then a (multibatch) PCA, maybe followed by a 2D visualization approach such as t-SNE or UMAP. Code suggestions for all of this can be found e.g. in the scran package and the Bioconductor single-cell workflow. Seurat surely offers the same, but I am not a Seurat user so I cannot comment. Doing so gives you a visual idea of how your samples cluster, and with it an impression of whether dominant batch effects are present and/or whether the clustering is instead dominated by the status (healthy, mild, severe).
Based on this you then need to decide how to continue, which again depends on the question you want to answer.
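To make the steps above a bit more concrete, here is a minimal sketch in plain Python/numpy. This is not the scran or Bioconductor workflow code, just an illustration of the idea under some assumptions of mine: the counts are simulated, the sample names and helper functions (`log_normalize`, `top_variable_genes`, `pca`) are hypothetical, "merge" is taken to mean the union of the per-sample highly variable genes, and the PCA is an ordinary one that does not weight batches equally the way a multibatch PCA would.

```python
import numpy as np

def log_normalize(counts):
    """Scale each cell (row) by its library size, then log-transform."""
    lib = counts.sum(axis=1, keepdims=True)
    lib = np.where(lib == 0, 1.0, lib)  # guard against empty cells
    return np.log1p(counts / lib * np.median(lib))

def top_variable_genes(logexpr, n_top):
    """Indices of the n_top genes (columns) with the highest variance."""
    return np.argsort(logexpr.var(axis=0))[::-1][:n_top]

def pca(x, n_components):
    """Plain PCA via SVD on the gene-centered matrix (cells x genes)."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Simulated cells x genes count matrices, one per sample (assumption).
rng = np.random.default_rng(0)
samples = {
    name: rng.poisson(2.0, size=(50, 200)).astype(float)
    for name in ("healthy_1", "mild_1", "severe_1")
}

# 1) Process every sample independently but identically.
normalized = {name: log_normalize(c) for name, c in samples.items()}
hvg_sets = [set(top_variable_genes(x, 30).tolist()) for x in normalized.values()]

# 2) Merge the selected features (here: union of per-sample HVGs).
selected = sorted(set.union(*hvg_sets))

# 3) Stack all cells without any batch correction and run PCA.
merged = np.vstack([x[:, selected] for x in normalized.values()])
labels = np.repeat(list(normalized), [x.shape[0] for x in normalized.values()])
pcs = pca(merged, 10)
```

Coloring the first few columns of `pcs` (or a t-SNE/UMAP computed from them) by `labels` is what gives the visual impression of how the samples group.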
I would say integration via fastMNN, CCA or any of the other available methods is not always beneficial and should be considered with care. From what I understand, it is most useful for creating a unified clustering landscape in which all cells from all batches are embedded. This (again from what I understand) requires that the overall composition of the batches is rather similar in order to get robust results. It would probably (please correct me if wrong) be problematic if you have populations that are unique to one batch, as these populations might end up being forced into clusters together with cells from other populations. Biological cluster heterogeneity might (in part) be lost when using these methods. Again, please correct me if this is wrong. Integration might be desirable if the batches are very similar in terms of composition but dominated by unwanted batch effects, such as different days of library prep, different platforms or other less obvious effects. Still, I think the more unique the clusters in each condition are (in this case healthy, mild, severe), the less desirable integration becomes.
That said, if you observe strong clustering by condition, and this is reproducible among replicates (so all healthy, all mild and all severe samples each cluster rather close together), then an integration where each sample is considered a unique batch (which is the default of e.g. fastMNN) might not be optimal. Rather, it could be desirable to correct only for obvious batches such as the day of library prep. Again, it all depends on the question you want to answer.
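A toy example of why the choice of the batch variable matters. Note that this does not use fastMNN or any other real integration method. As a deliberately crude stand-in I simply center each batch on its own mean, and the data, the effect sizes and the helper names (`center_by_batch`, `condition_gap`) are all made up for illustration. It also assumes that both conditions are present on each library prep day; if condition and day are confounded, this kind of correction cannot separate them.

```python
import numpy as np

def center_by_batch(x, batch):
    """Subtract the per-batch mean from every cell (crude stand-in for batch correction)."""
    corrected = x.astype(float).copy()
    for b in np.unique(batch):
        mask = batch == b
        corrected[mask] -= corrected[mask].mean(axis=0)
    return corrected

# Four simulated samples of 25 cells each: two conditions x two library prep days.
rng = np.random.default_rng(1)
condition = np.repeat(["healthy", "severe", "healthy", "severe"], 25)
prep_day = np.repeat(["day1", "day1", "day2", "day2"], 25)
sample = np.repeat(["s1", "s2", "s3", "s4"], 25)

x = rng.normal(0.0, 0.1, size=(100, 2))
x[condition == "severe", 0] += 3.0  # biological effect on axis 0
x[prep_day == "day2", 1] += 5.0     # technical effect on axis 1

def condition_gap(m):
    """Difference between severe and healthy cells along axis 0."""
    return m[condition == "severe", 0].mean() - m[condition == "healthy", 0].mean()

by_day = center_by_batch(x, prep_day)   # correct only the known technical batch
by_sample = center_by_batch(x, sample)  # treat every sample as its own batch

print("gap after correcting by prep day:", round(condition_gap(by_day), 2))
print("gap after correcting by sample:  ", round(condition_gap(by_sample), 2))
```

In this toy setup, correcting by prep day removes the day shift but keeps the healthy/severe difference, whereas treating every sample as its own batch also removes the condition difference. Real methods are far less blunt than mean-centering, so take this only as an illustration of the general concern.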
Still, to come back to the actual question: as said and linked above, code for PCA and t-SNE/UMAP can be found in various packages such as scran and in the Bioconductor single-cell workflow. Be sure to QC and check the samples individually before merging or integrating them, as clustering differences might (will) be lost upon (blind) integration.
modified 5 weeks ago
5 weeks ago by
ATpoint ♦ 36k