Greetings all,
I'm writing here because inferCNV is on hiatus, so I'm unable to get an answer through their GitHub.
I'm currently working with multiple Visium 10X human datasets obtained from the same organ and with available histological annotation. For a single dataset, I'm running the analysis with these settings:
> inferCNV_obj = infercnv::run(inferCNV_obj,
      cutoff = 0.1,
      out_dir = outdir,
      window_length = 101,
      cluster_by_groups = TRUE,
      HMM = TRUE,
      analysis_mode = "sample",
      HMM_report_by = "cell",
      denoise = TRUE)
I attach the heatmaps of both residual expression and HMM predictions. The HMM predictions look fairly reasonable: CNVs are called only in tumor regions and are consistent with prior biological knowledge. The residual expression heatmap, however, looks odd: the residual expression of benign and stromal spots is scattered across the genome, while the residual expression of tumor spots is pronounced only in regions where a CNV is predicted. At first I thought it could be a reference problem, since I'm using benign spots from other Visium 10X datasets (from the same organ) that also show a low CNV burden on visual inspection. However, the same pattern appears with no reference at all, and also when I use stromal spots from the same dataset, alone or together with benign spots, as the reference.
So my current guess is that it could depend on differences in library size among spots: since inferCNV relies on the assumption that more expression means more copies, many genes may not have been detected because of PCR amplification bias, even if they are expressed. To me, it's just odd that tumor spots show no residual expression along the genome outside the predicted CNVs, while the other spots do. This is a problem when it comes to calculating a CNV score, either per chromosome or genome-wide.
Any other interpretation of this would be highly appreciated!
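To make the library-size idea concrete, here is a minimal sketch of the check I have in mind. It is base R on simulated data, not output from my run: the matrix `resid`, the gene-to-chromosome vector `chrom`, and `lib_size` are all hypothetical stand-ins, and the score (mean squared deviation from a neutral value of 1) is just one possible definition of a CNV score, not something inferCNV reports itself.

```r
# Simulated stand-in for a residual-expression matrix (genes x spots),
# centered on 1. With a real run I assume this would come from something
# like inferCNV_obj@expr.data; that is an assumption, not checked here.
set.seed(1)
n_genes <- 200
n_spots <- 30
resid <- matrix(rnorm(n_genes * n_spots, mean = 1, sd = 0.02),
                nrow = n_genes,
                dimnames = list(paste0("gene", seq_len(n_genes)),
                                paste0("spot", seq_len(n_spots))))

# Hypothetical gene-to-chromosome assignment (two chromosomes only).
chrom <- rep(c("chr1", "chr2"), each = n_genes / 2)

# Simulate a gain on chr2 in the first 10 spots (the "tumor" spots).
tumor <- 1:10
resid[chrom == "chr2", tumor] <- resid[chrom == "chr2", tumor] + 0.15

# Simulated per-spot library sizes (total UMI counts per spot).
lib_size <- round(runif(n_spots, min = 2000, max = 20000))

# Squared deviation of every gene/spot value from the neutral value 1.
dev2 <- (resid - 1)^2

# Genome-wide CNV score: mean squared deviation per spot.
cnv_score <- colMeans(dev2)

# Per-chromosome CNV score: mean squared deviation per chromosome and spot.
chr_sums <- rowsum(dev2, group = chrom)
chr_n <- as.vector(table(chrom)[rownames(chr_sums)])
chr_score <- chr_sums / chr_n   # chromosomes x spots

# If scattered residuals are a depth artifact, the score of non-tumor
# spots should track library size; a rank correlation is a quick check.
normal <- setdiff(seq_len(n_spots), tumor)
depth_cor <- cor(cnv_score[normal], log10(lib_size[normal]),
                 method = "spearman")
print(depth_cor)
```

In the simulation the library sizes are independent of the residuals, so the correlation only illustrates the mechanics. On the real data, a clearly negative correlation among benign and stromal spots would support the depth explanation.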
Attaching a screenshot for ready reference. Full size PDF linked above.
Posted as an issue to InferCNV already: https://github.com/broadinstitute/infercnv/issues/698