vlm.adjust_totS_totU -- Input contains NaN, infinity or a value too large for dtype('float64')
21 months ago
ericrkofman ▴ 20

I am close to getting the velocyto pipeline working, but I am running into problems that I think are related to sparse data. I am not sure what the solution is and would appreciate any tips on whether my results look reasonable.

I am following along with this tutorial: https://github.com/BUStools/getting_started/blob/master/velocity_tutorial.ipynb

After this step:

vlm.score_cv_vs_mean(2000, plot=True, max_expr_avg=50, winsorize=True, winsor_perc=(1,99.8), svr_gamma=0.01, min_expr_cells=50)
vlm.filter_genes(by_cv_vs_mean=True)


I get the following graph, which doesn't look as smooth as in the example. What might this mean?

Then I run the steps below. Note that I had to use a fairly high value of 60 for min_perc_U; otherwise I would get complaints like "min_perc_U=0.5 corresponds to total Unspliced of 1 molecule of less. Please choose higher value or filter our these cell":

vlm.score_detection_levels(min_expr_counts=0, min_cells_express=0,
                           min_expr_counts_U=25, min_cells_express_U=20)
vlm.score_cluster_expression(min_avg_U=0.007, min_avg_S=0.06)
vlm.filter_genes(by_detection_levels=True)
vlm.normalize_by_total(plot=True, min_perc_U=60)


I get the following graph:

The step that is ultimately giving me trouble is after that:

vlm.adjust_totS_totU(normalize_total=True, fit_with_low_U=False, svr_C=1, svr_gamma=1e-04)


I get the message:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

My guess is that some genes are not expressed highly enough at the unspliced or spliced level. Do I just need to do more filtering? Or is my dataset perhaps not well suited to velocity analysis?
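One way to narrow this down is to look at per-cell totals before calling adjust_totS_totU. The sketch below is only an illustration on toy data: summarize_cell_totals is a hypothetical helper, not part of velocyto, and it assumes the spliced and unspliced matrices are dense genes x cells arrays (as vlm.S and vlm.U appear to be). It counts cells with zero total spliced or unspliced counts, which would produce NaN or infinity under total-count normalization, and reports the per-cell unspliced total at the chosen percentile. Relating that percentile to the min_perc_U complaint quoted above is an assumption about how the check works.

```python
import numpy as np


def summarize_cell_totals(S, U, min_perc_U=60):
    """Summarize per-cell totals for spliced (S) and unspliced (U) counts.

    Hypothetical helper; both matrices are assumed to be genes x cells.
    """
    S = np.asarray(S, dtype=float)
    U = np.asarray(U, dtype=float)
    tot_S = S.sum(axis=0)  # total spliced molecules per cell
    tot_U = U.sum(axis=0)  # total unspliced molecules per cell
    return {
        "n_cells": int(S.shape[1]),
        # Cells with a zero total give 0/0 or x/0 when dividing by the total
        "zero_S_cells": int((tot_S == 0).sum()),
        "zero_U_cells": int((tot_U == 0).sum()),
        # NaN or inf values already present in the inputs
        "nonfinite_entries": int((~np.isfinite(S)).sum() + (~np.isfinite(U)).sum()),
        # Per-cell unspliced total at the given percentile
        "U_total_at_min_perc": float(np.percentile(tot_U, min_perc_U)),
    }


# Toy data: 2 genes x 4 cells
S = np.array([[5, 0, 2, 0],
              [3, 0, 1, 4]])
U = np.array([[1, 0, 0, 0],
              [2, 0, 0, 1]])

report = summarize_cell_totals(S, U, min_perc_U=60)
print(report)
```

With a real object this could be called as summarize_cell_totals(vlm.S, vlm.U), under the same shape assumption. If a large fraction of cells have zero or near-zero unspliced totals, that would be consistent with both the high min_perc_U requirement and the later ValueError.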

kallisto loom velocity single cell velocyto • 944 views

It seems the issue may have been that I had not pre-filtered the sadata and uadata expression matrices to keep only cells expressing at least some minimum number of genes...

21 months ago
ericrkofman ▴ 20

Yes, the issue was that uadata and sadata needed to be filtered before being combined in the loom file.

import scanpy as sc
# Keep only cells with at least 3 detected genes in each matrix
sc.pp.filter_cells(sadata, min_genes=3)
sc.pp.filter_cells(uadata, min_genes=3)
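To illustrate what this filtering does, here is a minimal NumPy sketch on toy data. It does not call scanpy; cells_passing_min_genes is a hypothetical helper that mimics the min_genes criterion (keep cells with at least that many genes with nonzero counts), with matrices laid out as cells x genes. Because the two matrices are filtered separately, they can end up with different cells, so the sketch also takes the shared barcodes; whether that extra step is needed in a given pipeline is an assumption, not something stated above.

```python
import numpy as np


def cells_passing_min_genes(counts, barcodes, min_genes=3):
    """Return barcodes of cells with at least min_genes detected genes.

    Hypothetical helper; counts is assumed to be cells x genes.
    """
    counts = np.asarray(counts)
    n_genes = (counts > 0).sum(axis=1)  # genes with nonzero counts per cell
    return [bc for bc, n in zip(barcodes, n_genes) if n >= min_genes]


# Toy data: 3 cells x 4 genes, with made-up barcodes
barcodes = ["AAAC", "AAAG", "AAAT"]
spliced = np.array([[1, 2, 0, 3],
                    [0, 0, 0, 1],
                    [4, 1, 1, 0]])
unspliced = np.array([[1, 0, 1, 1],
                      [1, 1, 1, 0],
                      [0, 0, 1, 0]])

kept_s = cells_passing_min_genes(spliced, barcodes, min_genes=3)
kept_u = cells_passing_min_genes(unspliced, barcodes, min_genes=3)

# Cells that pass the filter in both matrices
shared = sorted(set(kept_s) & set(kept_u))
print(kept_s, kept_u, shared)
```

In this toy case only one cell passes in both matrices, which shows how cells with very few detected unspliced genes drop out before the matrices are combined.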
