Question

Data Imputation for performing UMAP

0

20 months ago

jscl1n22 • 0

Hi guys!

I am currently working on a dataset of gene IDs, their expression values, and patient IDs. I want to use UMAP to process the data and compare the results with a previous study, which used K-means clustering.

At the moment my data frame contains NA values, and UMAP cannot process them because it expects all values to be numeric. I did think of replacing it with zero, however a NA is not zero. Logically NA is NA, it's not detected for some reason but it doesn't mean it didn't have any expression. Yet I cannot remove that gene ID, as it may have expressions in some patients, while some don't (colnames = Patient ID ; rownames = Gene ID).
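
To make the concern concrete, here is a minimal sketch in plain Python of how zero-filling shifts a gene's mean compared with using only the measured patients. The values are made up for illustration, and `math.nan` stands in for R's NA; nothing here comes from the actual dataset.

```python
import math

# Hypothetical expression values for one gene across three patients.
# math.nan plays the role of NA (not measured, not "zero expression").
expression = [5.0, math.nan, 7.0]

# Use only the patients in which the gene was actually measured.
observed = [v for v in expression if not math.isnan(v)]

# Replace NA with 0, which treats "not detected" as a real measurement of 0.
zero_filled = [0.0 if math.isnan(v) else v for v in expression]

mean_observed = sum(observed) / len(observed)
mean_zero_filled = sum(zero_filled) / len(zero_filled)

print(mean_observed, mean_zero_filled)  # prints: 6.0 4.0
```

The same shift would carry over to any distance computed between patients, which is what UMAP builds its neighbour graph from.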

Information on Google is very limited. However, I have stumbled across a relatively new imputation method called ALRA (https://www.nature.com/articles/s41467-021-27729-z), but I'm still reading about it and I am not sure whether it is appropriate for my type of data.

Do you guys have any suggestions?

R Imputation UMAP • 1.6k views

ADD COMMENT • link 20 months ago by jscl1n22 • 0

score 1 · Answer 1 · 2023-03-08

1

20 months ago

Mensur Dlakic ★ 28k

Last time I checked, UMAP worked with sparse data, at least in Python:

https://umap-learn.readthedocs.io/en/latest/sparse.html

"I did think of replacing it with zero, however a NA is not zero. Logically NA is NA, it's not detected for some reason but it doesn't mean it didn't have any expression. Yet I cannot remove that gene ID, as it may have expressions in some patients, while some don't (colnames = Patient ID ; rownames = Gene ID)."

You have already answered the question with this statement, yet you steer the argument in a direction that suits your purpose. Just as NA is not zero, it is not any value you could impute for it either. Either you stay consistent (drop the missing values), even if that means losing some data, or you keep all the data with the understanding that some data points will not be reliable.
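
The two options can be sketched as follows, in plain Python with no external packages. This is only an illustration under stated assumptions: the matrix and the helper names `drop_incomplete` and `impute_gene_mean` are hypothetical, `math.nan` stands in for NA, and per-gene mean imputation is used purely as the simplest stand-in (it is not ALRA and not a method recommended in this thread).

```python
import math

# Hypothetical matrix: rows are genes, columns are patients; math.nan marks NA.
matrix = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [4.0, math.nan, 6.0],
    "geneC": [math.nan, math.nan, 9.0],
}


def drop_incomplete(rows):
    """Option 1: keep only genes that were measured in every patient."""
    return {
        gene: values
        for gene, values in rows.items()
        if not any(math.isnan(x) for x in values)
    }


def impute_gene_mean(rows):
    """Option 2: fill each NA with that gene's mean over the observed patients.

    Also returns a mask (True = imputed) so unreliable entries stay traceable.
    Genes with no observed values at all are skipped.
    """
    filled, mask = {}, {}
    for gene, values in rows.items():
        observed = [x for x in values if not math.isnan(x)]
        if not observed:
            continue  # nothing to estimate a mean from
        mean = sum(observed) / len(observed)
        mask[gene] = [math.isnan(x) for x in values]
        filled[gene] = [mean if math.isnan(x) else x for x in values]
    return filled, mask


complete = drop_incomplete(matrix)
filled, mask = impute_gene_mean(matrix)

print(sorted(complete))  # genes that survive option 1
print(filled["geneB"], mask["geneB"])
```

With option 1 only `geneA` remains, so data is lost but every value is a real measurement. With option 2 all three genes are kept, and the mask records which entries are estimates rather than observations.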

ADD COMMENT • link 20 months ago by Mensur Dlakic ★ 28k

0

Mensur Dlakic thank you for your reply!

I am using R at the moment :( and I couldn't find any information on doing that in R.

ADD REPLY • link 20 months ago by jscl1n22 • 0

0

This becomes a matter of whether you are willing to expand your toolbox. It seems like UMAP can do what you want without throwing out data or imputation, but it requires you to learn something. Or you can impute the data and do it with the tools you already know how to use.

ADD REPLY • link 20 months ago by Mensur Dlakic ★ 28k

0

I am fully aware of that and am already reading about some additional imputation algorithms. Thanks, Mensur Dlakic!

ADD REPLY • link 20 months ago by jscl1n22 • 0