Critique or Feedback of filtering strategy for LIANA+ results using spatially informed bivariate metrics
1
0
Entering edit mode
5 hours ago

Hello, I hope everyone is doing well! I was wondering if I can get some feedback/critique of how I am going about filtering results obtained from LIANA+. I have looked through previous literature and did not find a protocol or method for filtering that I thought would be applicable to spatially informed results (most papers that cite LIANA use single cell and not spatial data. I also noticed most papers mainly use LIANA results as validation rather than for hypothesis generation). I am working with a single cell binned (with bin2cell) VisiumHD data and my team has manually annotated our cell types across our 21 slides/samples/fields of view (FOVs). I used LIANA+'s spatially informed bivariate function to infer intercellular ligand-receptor interactions. I went with LIANA+'s base parameters, except I set nz_prop, i.e. non-zero proportion (the minimum proportion of cells that express a ligand/receptor pair) to 0.005 given the sparsity of our data. Here's what my function call looks like (mind you that its originally wrapped in a for loop to run the algo per FOV):

lrdata_sample = li.mt.bivariate(ad_sample,
                                resource_name='consensus', 
                                local_name='cosine', # local function
                                global_name="morans", # global function
                                n_perms=100, # Number of permutations to calculate a p-value against null distribution
                                mask_negatives=False, 
                                add_categories=True,
                                nz_prop=0.005,
                                use_raw=False
                                verbose=True
                                )

Once results were calculated across my whole data, I then applied the following logic for filtering ligand-receptor pairs:

1) Filter globally: Filter out lig-rec pairs across all FOVs based on global Moran's R values and significance

  • All pairs with Moran's R > 0 and a morans_pvals < alpha = 0.05
  • Rationale: I would be selecting for ligand-receptor pairs that largely display spatial co-clustering

Here's the code I used for global filtering:

def filter_global_metric(global_res_df, metric = 'morans', alpha = 0.05, top_n_pairs = 100, morans_ctf = 0.01):

    #filter out lig-rec pairs that are not spatially coclustered (i.e. moran's R value > 0) and statistically insignificant (i.e. moran's  pvalue > 0.05)
    coclust_res0 = global_res_df[global_res_df[f'{metric}_pvals'] < alpha]
    coclust_res = coclust_res0[coclust_res0[metric] > morans_ctf]
    top_pairs = coclust_res.sort_values(metric, ascending = False).head(top_n_pairs)

    return top_pairs, top_pairs.interaction

2) Filter locally: this step is applied when looking at cell type specific lig-rec pairs. I subset the data for my population of interest, for example a cluster of tumor cells. I then stratified the subset by tumor cells that are surrounded by high density of other tumor cells (tumor_dense) and tumor cells that are sparely surrounded by other tumor cells (tumor_sparse), ex. cells that are on the border of the tumor.

  • Actual local filtering logic: for all globally filtered lig-rec pairs, if this lig-rec pair is statistically significant (alpha = 0.05) for at least 0.1 of the cells in this group and it is categorized as a "High-High" interaction by LIANA+ for at least 0.1 of the cells in the group (i.e. expression of the ligand and the receptor is highly expressed according to local spatially weighted cosine similarity values) then it is kept for visualization
  • Rationale: the globally filtered results are enriched for spatially co-clustered lig-rec pairs, so we just need to see which of these pairs are actually of significance to a given cell type population.

Here's my function for local filtering:

def filter_local_metric(global_filt_pairs, niche_lr_data, alpha = 0.05, min_sig_hit_prop = 0.25, min_hh_prop = 0.25): 
    local_sig_pairs = []

    for pair in global_filt_pairs.values:
        pvals_vect = niche_lr_data[:, pair].layers['pvals'].toarray().flatten()
        cats_vect = niche_lr_data[:, pair].layers['cats'].toarray().flatten()
        # if this lig-rec pair is locally significant for at least n% of the niche/celltype, 
        # then check if the proportion of cells within that group for which the lig-rec pair has a 'High-High' category of interaction
        sig_hit_prop = (pvals_vect < alpha).sum() / len(pvals_vect)
        if sig_hit_prop >= min_sig_hit_prop:
            hi_hi_prop = (cats_vect == 1).sum() / len(cats_vect)
            if hi_hi_prop >= min_hh_prop:
                local_sig_pairs.append(pair)

    return local_sig_pairs

Issues:

  • I am quite worried about my local filtering approach. The sparsity of the data requires me to set such low thresholds for filtering. I am worried that I may not be using the best possible approach.
  • Batch effects are something to consider. We have quite a bit of heterogeneity in the quality of our data across all FOVs. In previous analyses, we used scanpy's ligrec function to generate results and we would filter pairs based on how often they significantly occurred across FOVs as a method informed by batch effects.
  • I lack the foresight or knowledge to truly assess the fidelity of this filtering method I would greatly appreciate any feedback/critiques. This project is my first foray into spatial transcriptomics and even though its been half a year, I am still not up to par in terms of knowledge or intuition in addressing this.
LIANA spatial-transcriptomics visiumHD • 99 views
ADD COMMENT
1
Entering edit mode

I am quite worried about my local filtering approach.

Please do not worry about anything.

ADD REPLY
1
Entering edit mode
2 hours ago

Your global filtering approach using Moran's R greater than 0.01 and p-values less than 0.05 selects for ligand-receptor pairs with positive spatial autocorrelation, which aligns with identifying co-clustered interactions in spatial data. However, Moran's R values close to 0.01 indicate weak clustering, so consider raising this threshold to 0.1 if your data permits, to prioritize stronger signals. Selecting the top 100 pairs by Moran's R is arbitrary; base this on a distribution of R values or biological relevance instead.

For local filtering, requiring significance in at least 25% of cells and High-High categories in at least 25% is a valid way to focus on relevant interactions within cell subpopulations. The low thresholds reflect data sparsity, but they risk including noise. To improve fidelity, adjust thresholds dynamically based on cell group size or use false discovery rate correction on p-values to account for multiple tests. Your stratification into dense and sparse tumor regions adds useful context, but ensure these definitions are consistent across fields of view.

Regarding batch effects, your previous method with Scanpy's ligrec—filtering by recurrence across fields of view—is robust and should be integrated here. For example, require globally filtered pairs to appear significant in at least 50% of fields of view before local analysis. This reduces heterogeneity from data quality variations.

Overall, your method is logical for hypothesis generation in spatial transcriptomics, but validate filtered pairs against known biology or orthogonal data like protein expression.

Kevin

ADD COMMENT
0
Entering edit mode

Thank you so much for the feedback, I really appreciate it! I will definitely incorporate these suggestions, especially the one with filtering global results based on FOV appearance!

ADD REPLY

Login before adding your answer.

Traffic: 3946 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6