Question

Best way to use champ.impute or impute.knn - neighbor approach?

6.5 years ago

Mathias ▴ 90

Hi all, I'm new both to the forums and to methylation analysis.

I'm currently exploring the functions of ChAMP, which seems very useful. I want to impute some missing values in my dataset, which consists of a beta value matrix and an accompanying sample sheet, and I'd like to know how champ.impute() works. From the documentation, I guess these are the important arguments: pd=myLoad$pd, k=5, method="Combine"

Looking at the source of champ.impute at https://github.com/Bioconductor-mirror/ChAMP/blob/master/R/champ.impute.R, I gather that myLoad$pd is updated after the columns are filtered (the code refers to 'valid columns'), so the rows of pd need to be in the same order as the columns of my beta value matrix. k sets the number of neighbours that impute.knn will use when method is "Combine" or "KNN".

Digging deeper into impute.knn(), its documentation says: "For each gene with missing values, we find the k nearest neighbors using a Euclidean metric."

So the neighbors are computed from the data itself; they are not the ones I specify through a 'phenotype' or 'sample_group' column in myLoad$pd.
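
To make sure I understand this correctly, here is a minimal base-R sketch of what I think "nearest neighbors" means here. Going by the quoted documentation, the neighbors are other genes (for methylation data, probes, i.e. rows), not other samples. This is a simplified illustration, not the actual impute.knn implementation: the helper impute_row_knn, the toy values, and the probe and sample names are all made up, and it assumes the candidate neighbor rows have no missing values themselves.

```r
# Toy beta matrix: 4 probes (rows) x 4 samples (columns); all values are made up.
beta <- matrix(c(0.10, 0.12, 0.11, NA,
                 0.11, 0.13, 0.12, 0.14,
                 0.80, 0.82, 0.79, 0.81,
                 0.09, 0.11, 0.10, 0.12),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("cg", 1:4), paste0("S", 1:4)))

# Hypothetical helper (not part of ChAMP or impute): fill the missing entries
# of one probe with the average of its k nearest probes.
impute_row_knn <- function(x, target, k = 2) {
  miss   <- is.na(x[target, ])
  others <- setdiff(rownames(x), target)
  # Euclidean-type distance, using only the samples where the target is observed
  d <- sapply(others, function(r) {
    sqrt(mean((x[r, !miss] - x[target, !miss])^2))
  })
  nn <- names(sort(d))[seq_len(k)]
  # Average the neighbors' values in the samples where the target is missing
  x[target, miss] <- colMeans(x[nn, miss, drop = FALSE])
  list(data = x, neighbors = nn)
}

res <- impute_row_knn(beta, "cg1", k = 2)
res$neighbors        # the probes chosen as neighbors of cg1
res$data["cg1", "S4"] # the imputed value
```

In this toy case the neighbors of cg1 are the two probes with a similar profile across samples (cg2 and cg4), and the sample grouping plays no role at all in choosing them.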

Is there a way in ChAMP to impute using neighbors from the same sample group? Would this be biologically sound, or is the calculated-neighbor approach better?
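
For reference, this is roughly what I have in mind by imputing within a sample group: split the beta matrix by group, impute each block separately, and put the columns back in place. It is only a sketch under my own assumptions. impute_by_group and row_mean_impute are hypothetical helpers, the Sample_Group column name and the toy values are made up, and the stand-in imputer uses a per-probe mean within the group rather than KNN so that the example runs in base R. I assume one could pass something like function(x) impute::impute.knn(x, k = 5)$data as impute_fun instead, but I have not checked how that behaves on small groups.

```r
# Toy data: 3 probes x 6 samples in two groups (all values and names are made up).
beta <- matrix(c(0.10, 0.12, NA,   0.50, 0.52, 0.51,
                 0.80, 0.82, 0.81, 0.30, NA,   0.32,
                 0.40, 0.41, 0.42, 0.43, 0.44, 0.45),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("cg", 1:3), paste0("S", 1:6)))
pd <- data.frame(Sample_Name  = colnames(beta),
                 Sample_Group = rep(c("A", "B"), each = 3))

# Stand-in imputer (NOT KNN): replace each NA with that probe's mean
# over the samples in the block it is given.
row_mean_impute <- function(x) {
  means <- rowMeans(x, na.rm = TRUE)
  idx   <- which(is.na(x), arr.ind = TRUE)
  x[idx] <- means[idx[, "row"]]
  x
}

# Hypothetical helper: apply an imputation function separately per sample group.
# 'groups' must be in the same order as the columns of 'beta'.
impute_by_group <- function(beta, groups, impute_fun) {
  stopifnot(length(groups) == ncol(beta))
  out <- beta
  for (g in unique(groups)) {
    cols <- which(groups == g)
    out[, cols] <- impute_fun(beta[, cols, drop = FALSE])
  }
  out
}

imputed <- impute_by_group(beta, pd$Sample_Group, row_mean_impute)
imputed
```

Here the missing value of cg1 in S3 is filled only from the other group A samples, and the missing value of cg2 in S5 only from group B.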

ChAMP impute R • 1.8k views