Data Imputation for performing UMAP
13 months ago
jscl1n22 • 0

Hi guys!

I am currently working on a dataset containing gene IDs, their expression values, and patient IDs. I want to use UMAP to process the data and compare the results with a previous study, which used K-means clustering.

At the moment my data frame has NA values, and UMAP cannot process them because it expects all values to be numeric. I thought about replacing them with zero, but an NA is not zero. Logically, NA is NA: the value was not detected for some reason, but that does not mean the gene had no expression. I also cannot remove those gene IDs, because a gene may have expression values in some patients and not in others (colnames = Patient ID; rownames = Gene ID).

Information on Google is very limited. However, I have stumbled across a relatively new imputation method called ALRA (https://www.nature.com/articles/s41467-021-27729-z), but I'm still reading about it and am not sure whether it is appropriate for my type of data.

Do you guys have any suggestions?

R Imputation UMAP • 1.1k views
13 months ago
Mensur Dlakic ★ 27k

Last time I checked, UMAP worked on sparse data, at least in Python:

https://umap-learn.readthedocs.io/en/latest/sparse.html

"I did think of replacing it with zero, however a NA is not zero. Logically NA is NA, it's not detected for some reason but it doesn't mean it didn't have any expression. Yet I cannot remove that gene ID, as it may have expressions in some patients, while some don't (colnames = Patient ID ; rownames = Gene ID)."

You have already answered the question by this statement, yet you evaluate your argument in a direction that suits your purpose. Just like NA is not zero, it is not going to be any value you could impute to it either. Either you are going to be consistent (drop the missing values) even if that means losing some data, or you are going to keep all the data with an understanding that some data points will not be reliable.
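
To make the two options concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the toy matrix, the variable names, the 50% missingness cutoff, and the per-gene median fill are hypothetical choices, not a recommendation and not what ALRA does. It only shows what "drop" versus "keep and fill, but remember which values were filled" could look like before handing a patients-by-genes matrix to UMAP.

```python
import numpy as np

# Hypothetical toy matrix: rows = genes, columns = patients; np.nan marks NA.
expr = np.array([
    [5.1, 4.8, np.nan, 5.0],
    [2.0, 2.2, 2.1, 1.9],
    [np.nan, np.nan, np.nan, 7.3],
    [0.5, 0.7, 0.6, 0.4],
])

missing = np.isnan(expr)              # True wherever a value is NA
frac_missing = missing.mean(axis=1)   # fraction of NA values per gene

# Option 1 (consistent, loses data): keep only genes measured in every patient.
complete = expr[~missing.any(axis=1)]

# Option 2 (keeps more data, less reliable): drop only genes that are mostly
# missing, then fill the remaining gaps with that gene's median.
max_missing = 0.5                     # assumed cutoff, tune for your data
keep = frac_missing <= max_missing
kept = expr[keep]
kept_mask = missing[keep]             # records which entries were imputed
medians = np.nanmedian(kept, axis=1, keepdims=True)
imputed = np.where(kept_mask, medians, kept)

# To embed patients, samples go in rows, so transpose to patients x genes.
samples_by_genes = imputed.T
```

Keeping `kept_mask` around lets you check afterwards whether the structure in the embedding is driven by the filled-in values rather than by measured expression.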


Mensur Dlakic thank you for your reply!

I am using R at the moment :( I couldn't find any info on that.


This becomes a matter of whether you are willing to expand your toolbox. It seems like UMAP can do what you want without throwing out data or imputation, but it requires you to learn something. Or you can impute the data and do it with the tools you already know how to use.


I am fully aware, and I am already reading about some additional imputation algorithms. Thanks, Mensur Dlakic!
