Hello, I hope everyone is doing well! I was wondering if I could get some feedback/critique on how I am filtering results obtained from LIANA+. I looked through previous literature and did not find a filtering protocol or method that seemed applicable to spatially informed results (most papers that cite LIANA use single-cell rather than spatial data, and most seem to use LIANA results for validation rather than hypothesis generation).

I am working with single-cell-binned (via bin2cell) VisiumHD data, and my team has manually annotated our cell types across our 21 slides/samples/fields of view (FOVs). I used LIANA+'s spatially informed bivariate function to infer intercellular ligand-receptor interactions. I went with LIANA+'s default parameters, except that I set nz_prop, i.e. the non-zero proportion (the minimum proportion of cells that must express a ligand/receptor), to 0.005 given the sparsity of our data. Here's what my function call looks like (mind you that it's originally wrapped in a for loop to run the algorithm per FOV):
lrdata_sample = li.mt.bivariate(
    ad_sample,
    resource_name='consensus',
    local_name='cosine',   # local function
    global_name='morans',  # global function
    n_perms=100,           # number of permutations for the p-value null distribution
    mask_negatives=False,
    add_categories=True,
    nz_prop=0.005,
    use_raw=False,         # note: a comma was missing here, fixed
    verbose=True,
)
Once results were calculated across my whole data, I then applied the following logic for filtering ligand-receptor pairs:
1) Filter globally: filter ligand-receptor pairs across all FOVs based on their global Moran's R values and significance
- Keep all pairs with Moran's R above a small positive cutoff (0.01 in the code below) and morans_pvals < alpha = 0.05
- Rationale: this selects for ligand-receptor pairs that largely display spatial co-clustering
Here's the code I used for global filtering:
def filter_global_metric(global_res_df, metric='morans', alpha=0.05,
                         top_n_pairs=100, morans_ctf=0.01):
    # Drop lig-rec pairs that are statistically insignificant (Moran's p-value >= alpha)
    # or not spatially co-clustered (Moran's R <= morans_ctf)
    coclust_res0 = global_res_df[global_res_df[f'{metric}_pvals'] < alpha]
    coclust_res = coclust_res0[coclust_res0[metric] > morans_ctf]
    # Rank the survivors by Moran's R and keep the top N
    top_pairs = coclust_res.sort_values(metric, ascending=False).head(top_n_pairs)
    return top_pairs, top_pairs.interaction
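For anyone who wants to poke at this, here is a self-contained toy run of the same global-filter logic on a synthetic results table. The 'interaction', 'morans', and 'morans_pvals' column names match what I work with from LIANA+'s global output; the values themselves are made up purely for illustration:

```python
import pandas as pd

def filter_global_metric(global_res_df, metric='morans', alpha=0.05,
                         top_n_pairs=100, morans_ctf=0.01):
    # Keep pairs that are significant (p < alpha) AND spatially
    # co-clustered (Moran's R above the cutoff), then take the top N.
    sig = global_res_df[global_res_df[f'{metric}_pvals'] < alpha]
    coclust = sig[sig[metric] > morans_ctf]
    top_pairs = coclust.sort_values(metric, ascending=False).head(top_n_pairs)
    return top_pairs, top_pairs.interaction

# Toy global results table: one row per ligand-receptor pair
toy = pd.DataFrame({
    'interaction':  ['A^B', 'C^D', 'E^F', 'G^H'],
    'morans':       [0.30,  0.005, 0.20,  0.50],
    'morans_pvals': [0.01,  0.01,  0.20,  0.001],
})

top, names = filter_global_metric(toy, top_n_pairs=2)
# 'C^D' fails the Moran's R cutoff, 'E^F' fails significance;
# of the survivors, 'G^H' (0.50) ranks above 'A^B' (0.30).
print(list(names))  # ['G^H', 'A^B']
```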
2) Filter locally: this step is applied when looking at cell-type-specific ligand-receptor pairs. I subset the data to my population of interest, for example a cluster of tumor cells. I then stratify the subset into tumor cells surrounded by a high density of other tumor cells (tumor_dense) and tumor cells sparsely surrounded by other tumor cells (tumor_sparse), e.g. cells on the tumor border.
- Actual local filtering logic: for each globally filtered lig-rec pair, keep it for visualization if (a) it is statistically significant (alpha = 0.05) in at least 0.1 of the cells in the group, AND (b) LIANA+ categorizes it as a "High-High" interaction (both ligand and receptor highly expressed according to the local spatially weighted cosine similarity values) in at least 0.1 of the cells in the group
- Rationale: the globally filtered results are enriched for spatially co-clustered lig-rec pairs, so this step identifies which of those pairs are actually relevant to a given cell-type population.
Here's my function for local filtering:
def filter_local_metric(global_filt_pairs, niche_lr_data, alpha=0.05,
                        min_sig_hit_prop=0.25, min_hh_prop=0.25):
    local_sig_pairs = []
    for pair in global_filt_pairs.values:
        pvals_vect = niche_lr_data[:, pair].layers['pvals'].toarray().flatten()
        cats_vect = niche_lr_data[:, pair].layers['cats'].toarray().flatten()
        # Proportion of cells in the niche/cell type for which this
        # lig-rec pair is locally significant
        sig_hit_prop = (pvals_vect < alpha).sum() / len(pvals_vect)
        if sig_hit_prop >= min_sig_hit_prop:
            # Proportion of those cells for which the pair is
            # categorized as a 'High-High' interaction
            hi_hi_prop = (cats_vect == 1).sum() / len(cats_vect)
            if hi_hi_prop >= min_hh_prop:
                local_sig_pairs.append(pair)
    return local_sig_pairs
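To make the two proportion gates concrete, here they are exercised on plain NumPy arrays standing in for one pair's column of the 'pvals' and 'cats' layers (the encoding of 1 = "High-High" is how I read the cats layer; the values are synthetic):

```python
import numpy as np

# Synthetic local results for ONE ligand-receptor pair across 10 cells:
# permutation p-values and interaction categories (1 = 'High-High').
pvals_vect = np.array([0.01, 0.02, 0.30, 0.80, 0.04, 0.50, 0.90, 0.60, 0.70, 0.95])
cats_vect  = np.array([1,    1,    0,    0,    1,    0,    0,    0,    0,    0])

sig_hit_prop = (pvals_vect < 0.05).mean()  # 3/10 cells locally significant
hi_hi_prop = (cats_vect == 1).mean()       # 3/10 cells 'High-High'

# This pair clears both the relaxed 0.1 thresholds I use and the
# 0.25 defaults in the function signature above.
keep = (sig_hit_prop >= 0.1) and (hi_hi_prop >= 0.1)
```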
Issues:
- I am quite worried about my local filtering approach. The sparsity of the data forces me to set very low thresholds, and I worry this may not be the best possible approach.
- Batch effects are something to consider. We have quite a bit of heterogeneity in data quality across our FOVs. In previous analyses, we used scanpy's ligrec function and filtered pairs based on how often they occurred significantly across FOVs, as a batch-effect-informed heuristic.
- I lack the foresight or knowledge to truly assess the fidelity of this filtering method, so I would greatly appreciate any feedback/critiques. This project is my first foray into spatial transcriptomics, and even though it's been half a year, I am still not up to par in terms of knowledge or intuition in addressing this.
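Regarding the batch-effect point above: one way I could port that cross-FOV consistency idea to the LIANA+ output would be to stack the per-FOV global tables (with an added 'fov' column) and keep pairs that pass the global filter in a minimum fraction of FOVs. A sketch with made-up data; the 50% cutoff is just a placeholder:

```python
import pandas as pd

# Per-FOV global results stacked into one table; 'fov' column added by me
global_res = pd.DataFrame({
    'fov':          ['s1',  's1',  's2',  's2',  's3',  's3'],
    'interaction':  ['A^B', 'C^D', 'A^B', 'C^D', 'A^B', 'C^D'],
    'morans':       [0.30,  0.02,  0.25,  -0.01, 0.40,  0.03],
    'morans_pvals': [0.01,  0.04,  0.02,  0.50,  0.001, 0.30],
})

# Same per-FOV global filter as before: significant AND co-clustered
passed = global_res[(global_res['morans_pvals'] < 0.05)
                    & (global_res['morans'] > 0.01)]

# Fraction of FOVs in which each pair passes the global filter
n_fovs = global_res['fov'].nunique()
pass_rate = passed.groupby('interaction')['fov'].nunique() / n_fovs

# Keep pairs that replicate in at least half the FOVs (cutoff is a guess)
consistent_pairs = pass_rate[pass_rate >= 0.5].index.tolist()
print(consistent_pairs)  # ['A^B'] — C^D only passes in 1 of 3 FOVs
```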