Question

Seurat: pre-processing and differential markers

16 months ago

Rainer ▴ 130

I have received an scRNA-seq dataset from a fee-for-service provider that contains a pre-processed Seurat object in RDS format, the raw FASTQ files, and transcript counts, barcode counts and read counts. I started by exploring the pre-processed RDS data, which contained a clustering and differential testing results obtained with the following parameters:

Clustering parameters used:

PC = 15                                 # number of PCA dimensions used as input
k.param = 120                           # Defines k for the k-nearest neighbor algorithm
resolution = 0.5                        # resolution parameter of the clustering
algorithm = original Louvain algorithm  # algorithm used to determine the clusters
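
A minimal sketch of how a clustering like this could be scored with a mean silhouette width, using base R only. The function name mean_silhouette is hypothetical and the toy data are simulated, not taken from this dataset. In a real Seurat workflow one would presumably pass the PCA embedding (e.g. Embeddings(object, "pca")[, 1:15]) together with the cluster labels and compare the score across several resolution or k.param values; the cluster package's silhouette() computes the same measure.

```r
# Mean silhouette width from scratch (base R only).
# For cell i: a = mean distance to the other cells of its own cluster,
#             b = smallest mean distance to the cells of any other cluster,
#             s(i) = (b - a) / max(a, b); values near 1 mean well-separated clusters.
mean_silhouette <- function(embedding, clusters) {
  clusters <- as.character(clusters)
  labels <- unique(clusters)
  stopifnot(nrow(embedding) == length(clusters), length(labels) >= 2)
  d <- as.matrix(dist(embedding))  # Euclidean distances between cells
  s <- vapply(seq_len(nrow(d)), function(i) {
    own <- clusters == clusters[i]
    own[i] <- FALSE                  # exclude the cell itself
    if (!any(own)) return(0)         # singleton cluster: s(i) = 0 by convention
    a <- mean(d[i, own])
    b <- min(vapply(setdiff(labels, clusters[i]),
                    function(l) mean(d[i, clusters == l]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

# Toy example: two simulated, well-separated groups in a 2-D "PC" space
set.seed(1)
embedding <- rbind(matrix(rnorm(40), ncol = 2),
                   matrix(rnorm(40, mean = 10), ncol = 2))
true_labels <- rep(c("A", "B"), each = 20)
shuffled_labels <- sample(true_labels)
mean_silhouette(embedding, true_labels)      # high: groups are well separated
mean_silhouette(embedding, shuffled_labels)  # near 0: labels ignore the structure
```

The same score computed for a grid of resolutions gives one data-driven way to pick a clustering, as an alternative to fixed defaults.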

Differential expression test parameters used:

test = wilcox             # differential expression test
min.pct = 0.25            # only test genes detected in at least this fraction of cells in either of the two groups
downsample = 1000         # maximum number of cells per cluster (larger clusters are downsampled)
only.pos = TRUE           # return only positive markers
logfc = -Inf              # minimum fold difference (natural log scale) between the two groups of cells (-Inf = no cutoff)
return.threshold = 0.001  # adjusted p-value cutoff for differential expression
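
A rough base-R sketch of what two of these filters do: capping each cluster at a maximum number of cells (I assume downsample = 1000 corresponds to the max.cells.per.ident argument of FindAllMarkers) and keeping only genes detected in at least min.pct of the cells of either group. The helper names are hypothetical, the boundary handling (>= rather than >) is an assumption, and Seurat's own implementation may differ in detail.

```r
# Cap every cluster at max_cells cells by random downsampling; returns column indices.
downsample_cells <- function(clusters, max_cells, seed = 42) {
  set.seed(seed)
  by_cluster <- split(seq_along(clusters), clusters)
  keep <- lapply(by_cluster, function(i) {
    if (length(i) > max_cells) i[sample.int(length(i), max_cells)] else i
  })
  sort(unlist(keep, use.names = FALSE))
}

# Keep genes detected (> 0) in at least min_pct of the cells in either group.
genes_passing_min_pct <- function(mat, in_group, min_pct = 0.25) {
  pct_1 <- rowMeans(mat[, in_group, drop = FALSE] > 0)   # fraction in group 1
  pct_2 <- rowMeans(mat[, !in_group, drop = FALSE] > 0)  # fraction in group 2
  rownames(mat)[pmax(pct_1, pct_2) >= min_pct]
}

# Toy example (hypothetical data): 3 genes x 6 cells, cluster "A" vs. the rest
mat <- matrix(c(1, 1, 1, 0, 0, 0,   # g1: detected in all cells of A
                0, 0, 0, 0, 0, 0,   # g2: never detected
                1, 0, 0, 1, 0, 0),  # g3: detected in 1/3 of each group
              nrow = 3, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3"), paste0("cell", 1:6)))
clusters <- c("A", "A", "A", "B", "B", "B")
genes_passing_min_pct(mat, clusters == "A", min_pct = 0.25)  # "g1" "g3"

big_clusters <- rep(c("A", "B"), times = c(5, 2))
table(big_clusters[downsample_cells(big_clusters, max_cells = 3)])  # A: 3, B: 2
```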

Regarding the pre-processing, the documentation points to the standard Seurat PBMC tutorial. However, I wonder whether this workflow is still up to date: a v2 of the SCTransform normalization is already available, the clustering parameters do not seem to be justified by a silhouette-width or elbow-plot analysis, and a Poisson-based differential test would likely be better than the standard non-parametric Wilcoxon test, which does not exploit any distributional assumptions.

If I want to pre-process the data myself, should I read in the transcript counts, the read counts and/or the barcode counts? Can I then simply pass the counts to CreateSeuratObject (CreateSeuratObject(counts = counts, assay = "RNA", project = "project_name", min.cells = 3, min.features = 200)) and then merge the Seurat objects created for all input files?
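
A minimal base-R sketch of what I understand the min.cells and min.features arguments of CreateSeuratObject to do, plus one way to combine per-sample count matrices. The helpers filter_counts and merge_counts are hypothetical, the filtering order (cells first, then genes) is an assumption, and real matrices would be sparse (Matrix package) rather than dense. Prefixing barcodes with a sample name avoids collisions between samples; Seurat's merge() offers add.cell.ids for the same purpose.

```r
# Keep cells with >= min_features detected genes, then genes detected in
# >= min_cells of the remaining cells (order assumed, see text).
filter_counts <- function(counts, min_cells = 3, min_features = 200) {
  n_features <- colSums(counts > 0)
  counts <- counts[, n_features >= min_features, drop = FALSE]
  n_cells <- rowSums(counts > 0)
  counts[n_cells >= min_cells, , drop = FALSE]
}

# Combine per-sample genes x cells matrices: take the union of genes (missing
# genes filled with 0) and prefix each barcode with its sample name.
merge_counts <- function(count_list) {
  stopifnot(!is.null(names(count_list)))
  genes <- sort(unique(unlist(lapply(count_list, rownames))))
  aligned <- lapply(names(count_list), function(sample) {
    m <- count_list[[sample]]
    full <- matrix(0, nrow = length(genes), ncol = ncol(m),
                   dimnames = list(genes, paste(sample, colnames(m), sep = "_")))
    full[rownames(m), ] <- m
    full
  })
  do.call(cbind, aligned)
}

# Toy example with tiny thresholds (hypothetical data)
counts <- matrix(c(5, 0, 3,
                   0, 0, 1,
                   2, 1, 0,
                   0, 0, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("AAAC", "AAAG", "AAAT")))
filtered <- filter_counts(counts, min_cells = 1, min_features = 2)
dim(filtered)  # 3 genes x 2 cells (cell AAAG and gene4 removed)

sample2 <- matrix(c(1, 4), nrow = 2,
                  dimnames = list(c("gene1", "gene5"), "AAAC"))
merged <- merge_counts(list(s1 = filtered, s2 = sample2))
colnames(merged)  # "s1_AAAC" "s1_AAAT" "s2_AAAC"
```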

Regarding the pre-processing, I would like to follow the newer framework described here, with the v2 regularization, unless you are aware of a superior workflow.

For the differential analysis, I would like to exploit suitable distributional assumptions, e.g., by using the Poisson test, but the documentation notes that this test is only applicable to UMI-based datasets. How can I find out whether the pre-processed data is a UMI dataset?
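
As far as I know there is no single flag that marks a Seurat object as UMI-based, but two heuristic checks can be sketched in base R: the counts should be non-negative whole numbers, and, since the provider also delivered read counts, a UMI protocol should give at least as many reads as transcripts for every gene and cell, because several reads collapse onto one UMI. The helper umi_evidence is hypothetical and assumes both matrices list the same genes and cells in the same order. Neither check proves the data are UMI-based (read counts are whole numbers too). With Seurat v4, the counts of the pre-processed object could be extracted with GetAssayData(object, slot = "counts").

```r
# Heuristic (not conclusive) evidence that a count matrix is UMI-based.
umi_evidence <- function(umi_counts, read_counts = NULL) {
  out <- list(
    # UMI counts are non-negative whole numbers (necessary, not sufficient)
    integer_counts = all(umi_counts >= 0) && all(umi_counts == round(umi_counts))
  )
  if (!is.null(read_counts)) {
    stopifnot(identical(dim(umi_counts), dim(read_counts)))
    # Each UMI should be supported by at least one read ...
    out$reads_ge_umis <- all(read_counts >= umi_counts)
    # ... and typically by several, so this ratio is expected to exceed 1
    out$reads_per_umi <- sum(read_counts) / sum(umi_counts)
  }
  out
}

# Toy example (hypothetical matrices, 2 genes x 2 cells)
umi <- matrix(c(1, 0, 2, 3), nrow = 2)
reads <- matrix(c(4, 0, 9, 10), nrow = 2)
str(umi_evidence(umi, reads))
umi_evidence(umi / 3)$integer_counts  # FALSE: looks transformed, not raw counts
```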

When I ran some test analyses on the pre-processed data with the FindMarkers function using the Poisson test, and then created violin plots for the top-ranked genes, I noticed that the average log fold changes reported by FindMarkers did not always match the differences in median expression visible in the violin plots, even though the assay parameter was identical for FindMarkers and VlnPlot. Do you know what could be the reason for this?
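
One possible explanation can be illustrated with a small base-R sketch, assuming the fold-change formula of Seurat v4 (details such as the pseudocount handling changed in later versions): avg_log2FC compares log2 of the mean of the back-transformed (expm1) log-normalized values, each plus a pseudocount of 1, whereas a violin plot shows the whole distribution of log-normalized values, whose median is often 0 for sparsely detected genes. As far as I know, with a counts-based test such as test.use = "poisson" the fold change is computed from the raw counts slot instead, while VlnPlot shows the data slot by default, so an identical assay argument does not guarantee that the same values are compared. The function names are hypothetical and the values are made up.

```r
# Mean-based log2 fold change on log-normalized values (assumed Seurat v4 style):
# back-transform with expm1, average, add pseudocount 1, take log2, subtract.
avg_log2fc_data <- function(x1, x2, pseudocount = 1) {
  log2(mean(expm1(x1)) + pseudocount) - log2(mean(expm1(x2)) + pseudocount)
}

# Same idea on raw counts (assumed to apply when the test runs on counts).
avg_log2fc_counts <- function(c1, c2, pseudocount = 1) {
  log2(mean(c1) + pseudocount) - log2(mean(c2) + pseudocount)
}

# Toy example: a gene detected in only 2 of 10 cells of group 1, absent in group 2
x1 <- c(rep(0, 8), 3, 3)   # log-normalized values, group 1
x2 <- rep(0, 10)           # log-normalized values, group 2
median(x1) - median(x2)    # 0: the violins have the same median
avg_log2fc_data(x1, x2)    # about 2.27: a clear positive fold change
```

Because the fold change is driven by means (and by the few high-expressing cells), it can be large even when the medians shown in the violins are identical.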

Many thanks for your help and suggestions.

pre-processing Seurat

updated 16 months ago by Ram 43k