Hi guys!
I am currently working on a dataset of gene IDs, their expression values, and patient IDs. I want to process the data with UMAP and compare the results with a previous study, which used K-means clustering.
My data frame currently contains NA values, and UMAP cannot process them because it expects every value to be numeric. I thought of replacing them with zero, but NA is not zero. An NA means the expression was not detected for some reason, not that the gene had no expression. Yet I cannot remove those gene IDs, because a gene may have expression values in some patients and NA in others (colnames = patient ID; rownames = gene ID).
I found very little information online, but I have stumbled across a relatively new imputation method called ALRA (https://www.nature.com/articles/s41467-021-27729-z). I am still reading about it, and I am not sure whether it is appropriate for my type of data.
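As a point of comparison while evaluating ALRA, here is a minimal sketch of a much simpler baseline: replace each NA with the median of that gene's observed values across patients, rather than with zero. It assumes the layout described above (rows = genes, columns = patients) and treats `None` or NaN as missing; the function name `impute_gene_medians` is hypothetical. This is not ALRA, and it implicitly assumes values are missing at random. If NA mostly means "below the detection limit", the median will overestimate those entries.

```python
import math
from statistics import median

def is_missing(x):
    # Treat None and float NaN as missing (like R's NA).
    return x is None or (isinstance(x, float) and math.isnan(x))

def impute_gene_medians(matrix):
    """Fill missing entries in each gene row with that gene's median over
    the patients where it was observed. Rows = genes, columns = patients.
    Rows with no observed values are left as-is and their indices returned,
    since there is nothing to impute them from."""
    imputed, all_missing = [], []
    for i, row in enumerate(matrix):
        observed = [x for x in row if not is_missing(x)]
        if not observed:
            all_missing.append(i)
            imputed.append(list(row))
            continue
        m = median(observed)
        imputed.append([m if is_missing(x) else x for x in row])
    return imputed, all_missing

nan = float("nan")
expr = [
    [5.0, nan, 7.0],   # gene A: missing in the second patient
    [nan, nan, nan],   # gene B: never observed
    [1.0, 2.0, 3.0],   # gene C: complete
]
filled, empty_rows = impute_gene_medians(expr)
print(filled[0])   # [5.0, 6.0, 7.0]
print(empty_rows)  # [1]
```

Genes that are NA in every patient still have to be handled separately, because no imputation can recover them from the data alone.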
Do you guys have any suggestions?
Mensur Dlakic, thank you for your reply!
I am using R at the moment :( and I couldn't find any information on doing that in R.
This becomes a matter of whether you are willing to expand your toolbox. It seems like UMAP can do what you want without discarding data or imputing it, but that requires learning something new. Alternatively, you can impute the data and then use the tools you already know.
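One possible way to skip imputation entirely, assuming the goal is to embed patients: build a patient-by-patient distance matrix that ignores missing genes pairwise, then give UMAP that matrix instead of the raw data. The sketch below uses the same weighting as scikit-learn's `nan_euclidean_distances`: squared differences over the genes observed in both patients, scaled by total genes / shared genes. The function names are hypothetical. As far as I know, umap-learn accepts such a matrix via `metric="precomputed"` and R's uwot accepts a `dist` object as input, but check their documentation before relying on this.

```python
import math

def is_missing(x):
    # Treat None and float NaN as missing (like R's NA).
    return x is None or (isinstance(x, float) and math.isnan(x))

def nan_euclidean(a, b):
    """Euclidean distance over coordinates present in both vectors,
    scaled by total/present to compensate for the skipped ones."""
    shared = [(x, y) for x, y in zip(a, b)
              if not is_missing(x) and not is_missing(y)]
    if not shared:
        return float("nan")  # no gene observed in both patients
    sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(len(a) / len(shared) * sq)

def patient_distance_matrix(matrix):
    """matrix: rows = genes, columns = patients.
    Returns a symmetric patients x patients distance matrix."""
    patients = list(zip(*matrix))  # transpose: one tuple per patient
    n = len(patients)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = nan_euclidean(patients[i], patients[j])
    return d

nan = float("nan")
expr = [
    [1.0, 4.0, nan],   # gene 1
    [2.0, nan, 2.0],   # gene 2
    [3.0, 3.0, 3.0],   # gene 3
]
d = patient_distance_matrix(expr)
print(round(d[0][1], 3))  # 3.674
```

This pure-Python loop is only for illustration; on real data, scikit-learn's `nan_euclidean_distances` applied to the transposed matrix computes the same quantity much faster. Any pair of patients with no genes in common gets NaN, which UMAP will still reject, so such pairs need attention first.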
I am fully aware, and I am already reading about some additional imputation algorithms. Thanks, Mensur Dlakic!