vlm.adjust_totS_totU -- Input contains NaN, infinity or a value too large for dtype('float64')
6 months ago
ericrkofman ▴ 10

I am close to getting the velocyto pipeline working, but I am running into issues that I think are related to sparse data. I am not sure what the solution is, and would appreciate any tips or guidance on whether things look appropriate.

I am following along with this tutorial: https://github.com/BUStools/getting_started/blob/master/velocity_tutorial.ipynb

After this step:

vlm.score_cv_vs_mean(2000, plot=True, max_expr_avg=50, winsorize=True, winsor_perc=(1,99.8), svr_gamma=0.01, min_expr_cells=50)

I get the following graph, which doesn't look as smooth as in the example. What might this mean?


Then, after these steps (note that I had to use a quite high value of 60 for min_perc_U; otherwise I would get complaints like "min_perc_U=0.5 corresponds to total Unspliced of 1 molecule of less. Please choose higher value or filter our these cell"):

vlm.score_detection_levels(min_expr_counts=0, min_cells_express=0,
                           min_expr_counts_U=25, min_cells_express_U=20)
vlm.score_cluster_expression(min_avg_U=0.007, min_avg_S=0.06)
vlm.normalize_by_total(plot=True, min_perc_U=60)

I get the following graph:


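A quick way to see why such a high min_perc_U might be needed is to look at the per-cell unspliced totals directly. The sketch below uses a made-up toy vector (total_U is hypothetical, not my data) and assumes, as the warning message suggests, that min_perc_U is interpreted as a percentile of the per-cell unspliced totals. It is only meant to illustrate the check, not velocyto's internals.

```python
import numpy as np

# Hypothetical per-cell unspliced totals for 10 cells; most cells are nearly empty.
total_U = np.array([0, 0, 0, 0, 0, 0, 1, 1, 5, 12])

# Assumption: min_perc_U acts as a percentile cutoff on these totals.
for perc in (0.5, 60, 90):
    cutoff = np.percentile(total_U, perc)
    print(f"min_perc_U={perc}: cutoff = {cutoff:.2f} unspliced molecules")

# Fraction of cells with at most one unspliced molecule.
frac_sparse = np.mean(total_U <= 1)
print(f"fraction of cells with <= 1 unspliced molecule: {frac_sparse:.0%}")
```

If a large fraction of cells has one unspliced molecule or fewer, a low percentile will always land on a cutoff of 1 or less, which would point to filtering cells rather than raising min_perc_U.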
The step that is ultimately giving me trouble is after that:

vlm.adjust_totS_totU(normalize_total=True, fit_with_low_U=False, svr_C=1, svr_gamma=1e-04)

I get the message:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I am guessing this means I have some genes that are not expressed highly enough at either the unspliced or the spliced level. Does this just mean I have to do some more filtering? Or is something about my dataset perhaps not optimal for velocity analysis?
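One plausible route to this kind of error is a cell with zero total counts: any size normalization that divides by the per-cell total then produces 0/0 = NaN. The sketch below reproduces that on a toy matrix. The matrix U is made up, and the genes x cells layout is an assumption about how velocyto stores its matrices; this is not a trace of what adjust_totS_totU does internally.

```python
import numpy as np

# Toy genes x cells unspliced matrix (assumed layout); the middle cell has no counts.
U = np.array([[2.0, 0.0, 1.0],
              [0.0, 0.0, 3.0],
              [1.0, 0.0, 0.0]])

totals = U.sum(axis=0)                      # total unspliced molecules per cell
empty_cells = np.flatnonzero(totals == 0)   # indices of cells with zero total

# Dividing by the per-cell total gives 0/0 = NaN for every gene in an empty cell.
with np.errstate(divide="ignore", invalid="ignore"):
    U_norm = U / totals

n_bad = int((~np.isfinite(U_norm)).sum())
print("cells with zero total:", empty_cells)
print("non-finite entries after normalization:", n_bad)
```

Running the same two checks (per-cell totals equal to zero, and np.isfinite on the normalized matrix) against the real spliced and unspliced matrices should show whether empty cells are the source of the NaN values.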

It seems the issue may have been that I had not pre-filtered the sadata and uadata expression matrices to keep only cells expressing at least some minimum number of genes...

6 months ago
ericrkofman ▴ 10

Yes, the issue was that uadata and sadata need to be filtered first, before they are combined in the loom file.

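import scanpy as sc
# Drop cells with fewer than 3 detected genes from the spliced and unspliced AnnData objects
sc.pp.filter_cells(sadata, min_genes=3)
sc.pp.filter_cells(uadata, min_genes=3)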
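One thing to watch for: filtering the two matrices separately can leave different sets of cells in each, so it may be worth keeping only the barcodes that pass in both before combining. Below is a minimal NumPy sketch of that idea on toy data. The helper cells_passing, the barcodes, and the count matrices are all hypothetical, and rows are cells as in AnnData. It mimics the min_genes=3 filter rather than calling scanpy.

```python
import numpy as np

def cells_passing(counts, barcodes, min_genes=3):
    """Return barcodes of cells (rows) with at least min_genes detected genes."""
    genes_detected = (counts > 0).sum(axis=1)
    return barcodes[genes_detected >= min_genes]

# Toy cells x genes matrices sharing the same barcode order (hypothetical data).
barcodes = np.array(["AAAC", "AAAG", "AACT", "AAGT"])
spliced = np.array([[1, 2, 0, 4],
                    [0, 0, 0, 1],
                    [3, 1, 1, 0],
                    [2, 2, 2, 2]])
unspliced = np.array([[1, 0, 1, 1],
                      [1, 1, 1, 0],
                      [0, 0, 0, 0],
                      [0, 1, 2, 1]])

keep_s = cells_passing(spliced, barcodes)
keep_u = cells_passing(unspliced, barcodes)

# Keep only cells that pass the filter in both matrices.
shared = np.intersect1d(keep_s, keep_u)
mask = np.isin(barcodes, shared)
spliced_f, unspliced_f = spliced[mask], unspliced[mask]

print("cells kept in both:", shared)
```

After this step every remaining cell has nonzero spliced and unspliced totals, which is the condition the later normalization steps appear to need.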

