Question

Mixed marker expression in Xenium data

4 days ago

Song ▴ 10

Hi all,

I'm analyzing Xenium data (brain tissue with the 5K Prime Mouse Brain panel). I followed the Seurat v5 pipeline: LoadXenium(), removing cells with nFeature_Xenium < 10, then SCTransform normalization, PCA, and clustering. After clustering, I found that some clusters expressed both neuronal and glial markers (and in some cases endothelial markers as well). When I checked these cells in Xenium Explorer, they do indeed contain transcripts from neurons, glia, and endothelial cells. I understand that in situ data inevitably contain some noise, but I'm unsure how to handle these clusters. Should I interpret them as mixed cell populations, or should I compare the expression levels of neuronal/glial/endothelial transcripts and annotate cells by the most dominant cell type markers, ignoring the other markers?

Any suggestions would be appreciated!

Sincerely,

Song.

Clustering Xenium explorer • 347 views

ADD COMMENT • link updated 2 days ago by Kevin Blighe 89k • written 4 days ago by Song ▴ 10

Is your experiment on wild-type mice, or do you have a disease model? Does the segmentation of the cells in question look legitimate in Xenium Explorer? Which type of glial cells are you talking about? Which genes do you consider specific to neurons or to your glial cells? Are your mixed-marker cells located everywhere in the brain, or are they region specific, for example around a brain lesion?

I remember having these small clusters of cells with both neuronal and oligo markers in single-cell experiments, where they are often marked as doublets. In a disease context, it has been shown that microglia, for example, engulf myelin debris and could very well contain oligo transcripts. Neurons and oligos are also tightly connected, so it is highly probable that at some positions in the section some neurons and oligos are touching or even overlapping. Moreover, a Xenium section is about 10 µm thick (if I am not mistaken), and across this z dimension some oligos might have processes (carrying oligo transcripts) passing over a neuron's cell body. The staining only captures the cell body, so a transcript from an oligo process on top of a neuron would be counted in that neuron's body. For the endothelial cells, I would guess the same reasoning applies if the cell overlaps a tiny blood vessel.

ADD REPLY • link 4 days ago by Bastien Hervé 6.5k

Thank you so much for the kind and detailed response.

Yes, the sample is from a disease model, and Olig2+ (OPC) and Cx3cr1+ (microglia) cells overlapped with Gad1+ inhibitory neurons. It doesn't appear to be restricted to specific regions. For cell segmentation, the Xenium onboard method didn't seem to capture neuronal structures properly, so I re-segmented using Proseg. Considering the 10 µm tissue thickness, your suggestion that multiple cells may overlap in the z-dimension makes a lot of sense. I think that's likely the explanation. Thank you so much.

ADD REPLY • link 3 days ago by Song ▴ 10

score 0 · Answer 1 · 2025-11-06

Hi Song,

In spatial transcriptomics platforms like Xenium (which relies on in situ hybridization), observing clusters with mixed marker expressions across cell types (e.g., neuronal, glial, and endothelial) is indeed a common challenge. This often stems from technical artifacts rather than true biological multipotency, including:

Tissue section thickness: At ~10 µm, sections can capture overlapping cells or cellular processes in the z-dimension, leading to transcript assignment to the wrong cell body during segmentation.
Segmentation inaccuracies: The default Xenium segmentation may not perfectly delineate complex structures like neuronal dendrites or glial processes, resulting in transcript bleed-over from adjacent cells.
Ambient RNA and noise: Low-level background signals or diffusion of transcripts can contribute to spurious detections, especially in densely packed tissues like brain.
Apparent doublets from real biology: In disease models (as you mentioned), genuine interactions, such as microglia engulfing neuronal debris, can put transcripts from two cell types in one segmented cell. Unlike the artifacts above, this is real signal, but it is hard to distinguish from them.

Given that you've already re-segmented with Proseg and confirmed overlaps (e.g., Olig2+ OPCs and Cx3cr1+ microglia with Gad1+ neurons) in Xenium Explorer, that's a solid step forward—Proseg often improves boundary detection in such cases.

For handling these clusters, I wouldn't outright ignore the mixed expressions, as they could reflect real biology (e.g., cell-cell interactions or transitional states), but interpreting them requires caution. Here's how I'd approach it:

Dominant marker annotation with thresholding: Calculate cell type scores using AddModuleScore() in Seurat for sets of neuronal, glial, and endothelial markers. Assign each cell to its highest-scoring type if that score exceeds a threshold, and flag or remove cells with ambiguous scores (e.g., no type or multiple types above the threshold). AddModuleScore() values are not bounded to [0, 1], so treat the 0.5 used below as a placeholder and pick the cutoff from the score distributions in your data. This way, you're acknowledging the noise but prioritizing the strongest signal.

Example in Seurat:

```
library(Seurat)

# Define marker lists (replace the placeholder names with genes present in your panel)
neuronal_markers <- c("Gad1", "OtherNeuronGenes")
glial_markers <- c("Olig2", "Cx3cr1", "OtherGliaGenes")
endothelial_markers <- c("EndoGenes")

# Add one score per marker set; AddModuleScore() appends "1" to each name
seu <- AddModuleScore(seu, features = list(neuronal_markers), name = "NeuronalScore")
seu <- AddModuleScore(seu, features = list(glial_markers), name = "GlialScore")
seu <- AddModuleScore(seu, features = list(endothelial_markers), name = "EndoScore")

# Annotate by the top score; flag cells with no score, or more than one score, above the cutoff
score_cols <- c("NeuronalScore1", "GlialScore1", "EndoScore1")
types <- c("Neuronal", "Glial", "Endothelial")
cutoff <- 0.5  # placeholder; tune on the score distributions
seu$dominant_type <- apply(seu@meta.data[, score_cols], 1, function(x) {
  if (sum(x > cutoff) == 1) types[which.max(x)] else "Ambiguous"
})
```
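A single cutoff can still label a cell whose top two scores are nearly tied. One option is to also require a gap between the best and second-best score. This is only a minimal base-R sketch on toy data: the score values, the `min_score` and `min_margin` settings, and the helper name `call_dominant_type` are all hypothetical, and the data frame stands in for the score columns that AddModuleScore() writes to the metadata.

```r
# Toy per-cell module scores (hypothetical values standing in for seu@meta.data columns)
scores <- data.frame(
  NeuronalScore1 = c(1.2, 0.9, 0.1, 0.05),
  GlialScore1    = c(0.1, 0.8, 0.7, 0.02),
  EndoScore1     = c(0.0, 0.1, 0.2, 0.01),
  row.names = paste0("cell", 1:4)
)
types <- c("Neuronal", "Glial", "Endothelial")

# Hypothetical helper: label a cell by its top score only if that score is
# high enough AND clearly separated from the runner-up
call_dominant_type <- function(x, types, min_score = 0.5, min_margin = 0.3) {
  ord <- order(x, decreasing = TRUE)
  top <- x[ord[1]]
  margin <- top - x[ord[2]]
  if (top >= min_score && margin >= min_margin) types[ord[1]] else "Ambiguous"
}

labels <- apply(as.matrix(scores), 1, call_dominant_type, types = types)
print(labels)
```

In this toy example the second cell has two high scores that are close together and the fourth has no high score at all, so both end up as "Ambiguous"; those are the cells worth inspecting in Xenium Explorer before deciding whether to drop them.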

Interpret as mixed populations where appropriate: If the mixed cells are spatially clustered in regions of known cell interactions (e.g., neurovascular units or lesion sites), treat them as "mixed" or "interacting" clusters. Visualize with ImageDimPlot() in Seurat or in Xenium Explorer to check localization.
Additional filtering: Beyond your nFeature_Xenium < 10 cutoff, consider removing cells with unusually high total transcripts (potential doublets) or low uniqueness (e.g., high entropy of marker expression). Tools like DoubletFinder (adapted for spatial) or Scrublet can help flag doublets, but they are tuned for scRNA-seq, so check their calls against the spatial data before filtering on them.
Deconvolution for refinement: If ambiguity persists, use spot-deconvolution methods like RCTD (from the spacexr package) or Tangram to estimate cell type proportions per "cell" bin, treating your segmented cells as mini-spots. This can quantify the contribution of each type without forcing a single label.
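For the "unusually high total transcripts" filter, one common heuristic is a median-plus-MAD cutoff on log-scaled counts. The sketch below is base R on toy numbers. It assumes the per-cell totals are available as a numeric vector (in an object loaded with LoadXenium() they would typically be in `seu$nCount_Xenium`); the helper name `flag_high_count` and the choice of 3 MADs are hypothetical and should be checked against your own distribution.

```r
# Toy per-cell transcript totals (hypothetical; stands in for seu$nCount_Xenium)
n_count <- c(120, 150, 135, 160, 140, 155, 145, 900)

# Hypothetical helper: flag cells whose log10 total sits more than
# `n_mads` median absolute deviations above the median
flag_high_count <- function(counts, n_mads = 3) {
  x <- log10(counts)
  x > median(x) + n_mads * mad(x)
}

is_outlier <- flag_high_count(n_count)
print(which(is_outlier))
```

Large cells such as some neurons can legitimately carry many transcripts, so it may be safer to apply a cutoff like this within each cluster or region rather than across the whole section.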

A related reference-based option is Seurat label transfer, which assigns each cell a predicted label and per-type scores from a reference scRNA-seq dataset:
```
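# Assumes a reference Seurat object 'ref' that is SCT-normalized, with cell type labels in ref$celltype
anchors <- FindTransferAnchors(reference = ref, query = seu, normalization.method = "SCT")
# Returns predicted.id plus one prediction.score.* column per reference cell type
predictions <- TransferData(anchorset = anchors, refdata = ref$celltype)
seu <- AddMetaData(seu, metadata = predictions)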
```
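TransferData() returns per-type prediction.score.* columns alongside predicted.id, so the "entropy" idea can be applied to them to quantify how mixed each cell looks. This is a minimal base-R sketch on toy values, assuming a matrix with one row per cell and one column per cell type; the numbers and the helper name `normalized_entropy` are hypothetical, and any cutoff on the result would have to be chosen from your own data.

```r
# Toy per-cell prediction scores (hypothetical; rows = cells, columns = cell types)
pred <- rbind(
  cellA = c(Neuron = 0.98, OPC = 0.01, Microglia = 0.01),
  cellB = c(Neuron = 0.50, OPC = 0.50, Microglia = 0.00),
  cellC = c(Neuron = 1/3,  OPC = 1/3,  Microglia = 1/3)
)

# Shannon entropy scaled to [0, 1]: 0 = one confident label, 1 = fully mixed
normalized_entropy <- function(p) {
  p <- p / sum(p)   # guard against rows that do not sum exactly to 1
  nz <- p[p > 0]    # treat 0 * log(0) as 0
  -sum(nz * log(nz)) / log(length(p))
}

mixedness <- apply(pred, 1, normalized_entropy)
print(round(mixedness, 2))
```

Cells with low values can be labeled with their top type, while cells with high values are candidates for the separate "mixed" category or for removal, depending on the downstream goal.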

Ultimately, the choice depends on your downstream goals—if you're focused on pure cell types for differential expression, annotate dominantly and filter ambiguities; if exploring interactions, keep them as a separate "mixed" category. I'd also recommend checking the 10x Genomics support resources or forums for Xenium-specific tips, as panel-specific noise patterns can vary.

If you share more details (e.g., UMAP plots or specific markers), I can refine this further.

Kevin