Dealing with missing (NA) values in my 450K methylation array dataset
7 weeks ago
Edward E-B • 0

I am looking to analyse some pre-processed Infinium 450K methylation data (GSE65820). The values have already been normalised, put through QC, and corrected for batch effects.

library(GEOquery)
GSE65820 <- getGEO("GSE65820", GSEMatrix=TRUE)


From this I have extracted metadata about the samples using the pData() function, selecting columns that are relevant such as sample ID, cell type and tissue type. I have also added in the presence or absence of a gene amplification of interest which was determined in a separate analysis.

I have tried to perform PCA to see whether samples differ in methylation by cell type (normal vs tumour), and I also want to compare tumour samples with and without the amplification.

The command I tried was

pca <- prcomp(t(exprs(GSE65820)))


but I get back an error

Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'

any(is.na(exprs(GSE65820))) comes back TRUE and any(is.infinite(exprs(GSE65820))) comes back FALSE

How should I deal with these missing (NA) values without losing more data than I have to, ideally without dropping a whole row, column or sample?

Before this I tried to run

corMatrix <- cor(exprs(GSE65820), use="c")


But I got

Error in cor(exprs(GSE65820), use = "c") : no complete element pairs

This field of data and analysis is new to me, so I'm not sure where the problems arise or how they can be fixed.

minfi PCA expression Methylation R • 369 views
7 weeks ago
Basti ★ 1.4k

You could omit the NA values: pca <- prcomp(na.omit(t(exprs(GSE65820))))


From that I get:

Error in svd(x, nu = 0, nv = k) : a dimension is zero
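One likely cause of this error: na.omit() drops every row that contains at least one NA. After t(), the rows are samples, so if each sample has at least one missing probe, all rows are removed and prcomp() receives a matrix with zero rows. Below is a small sketch of that behaviour. The matrix `m` is simulated and only stands in for exprs(); it is not the GSE65820 data.

```r
# Toy matrix standing in for exprs(): 6 probes (rows) x 4 samples (columns).
# The values are simulated, not taken from GSE65820.
set.seed(1)
m <- matrix(runif(24), nrow = 6,
            dimnames = list(paste0("cg", 1:6), paste0("S", 1:4)))

# Give every sample one missing probe, spread over two probes only.
m["cg1", c("S1", "S2")] <- NA
m["cg2", c("S3", "S4")] <- NA

# Samples as rows: every row has an NA, so na.omit() removes all of them.
dropped_samples <- na.omit(t(m))
dim(dropped_samples)   # 0 rows, 6 columns

# Probes as rows: only the two incomplete probes are removed.
dropped_probes <- na.omit(m)
dim(dropped_probes)    # 4 rows, 4 columns

# PCA on samples, using the complete probes only.
pca <- prcomp(t(dropped_probes))
```

If the real data behave the same way, applying na.omit() before t() would remove incomplete probes rather than whole samples.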


If you did not extract the ExpressionSet from the list returned by getGEO(), then try: pca <- prcomp(na.omit(t(exprs(GSE65820[[1]]))))
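
On the original question of keeping as much data as possible, here is a rough sketch of one way to go about it: count the NAs per probe and per sample, then either drop only the incomplete probes, or drop only the probes with a lot of missing values and fill the remaining gaps. Everything in it is an assumption for illustration. The matrix `betas` is simulated and stands in for exprs(GSE65820[[1]]), the 20% cutoff is arbitrary, and the probe-median fill is a simple placeholder rather than a recommended imputation method.

```r
# Simulated stand-in for exprs(GSE65820[[1]]): 200 probes x 10 samples.
set.seed(42)
betas <- matrix(runif(2000), nrow = 200,
                dimnames = list(paste0("cg", 1:200), paste0("S", 1:10)))
betas[sample(length(betas), 60)] <- NA   # scatter some missing values

# How are the missing values distributed?
na_per_probe  <- rowSums(is.na(betas))
na_per_sample <- colSums(is.na(betas))
table(na_per_probe)

# Option A: keep only probes measured in every sample.
complete_betas <- betas[na_per_probe == 0, , drop = FALSE]

# Option B: keep probes missing in at most 20% of samples (arbitrary cutoff)
# and fill the remaining gaps with the probe median.
keep <- na_per_probe / ncol(betas) <= 0.2
filled <- betas[keep, , drop = FALSE]
for (i in seq_len(nrow(filled))) {
  gaps <- is.na(filled[i, ])
  filled[i, gaps] <- median(filled[i, ], na.rm = TRUE)
}

# PCA with samples as rows.
pca_complete <- prcomp(t(complete_betas))
pca_filled   <- prcomp(t(filled))

# Sample-to-sample correlation using whatever probe pairs are available.
cor_matrix <- cor(betas, use = "pairwise.complete.obs")
```

In cor(), use="c" partially matches "complete.obs", which only uses probes with no NA in any sample. "pairwise.complete.obs" is less strict and may avoid the "no complete element pairs" error. With the real data, the sample annotation from pData() could then be used to colour a plot of pca_complete$x[, 1:2].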