I am looking to analyse some pre-processed Infinium 450k methylation data (GSE65820). The values have already been normalised, quality-controlled and corrected for batch effects.
I have used GEOquery to download the series matrix file:
GSE65820 <- getGEO("GSE65820", GSEMatrix=TRUE)
From this I extracted the sample metadata with the pData() function, keeping the relevant columns such as sample ID, cell type and tissue type. I also added a column recording the presence or absence of a gene amplification of interest, which was determined in a separate analysis.
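A minimal sketch of attaching separately determined amplification calls to the sample metadata. It uses a toy data frame in place of the pData() output; the column names (geo_accession, cell_type, amplified) and all values are hypothetical. The point is to merge by sample ID rather than rely on row order, and then re-align the rows to the sample order of the expression matrix.

```r
# Toy stand-in for the selected pData() columns (hypothetical names and values)
meta <- data.frame(
  geo_accession = c("S1", "S2", "S3", "S4"),
  cell_type     = c("normal", "tumour", "tumour", "tumour"),
  stringsAsFactors = FALSE
)

# Hypothetical amplification calls from a separate analysis, in a different order
amp <- data.frame(
  geo_accession = c("S4", "S2", "S3"),
  amplified     = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Merge by sample ID; all.x = TRUE keeps samples without a call (e.g. normals) as NA
meta <- merge(meta, amp, by = "geo_accession", all.x = TRUE, sort = FALSE)

# Re-align to the sample order of the expression matrix columns
# (with real data this would be colnames(exprs(...)))
sample_order <- c("S1", "S2", "S3", "S4")
meta <- meta[match(sample_order, meta$geo_accession), ]
rownames(meta) <- meta$geo_accession
```

Keeping the metadata rows in the same order as the matrix columns means the group labels can later be attached directly to the PCA scores.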
I am trying to run a PCA to see whether methylation differs by cell type (normal vs tumour), and I also want to compare tumour samples with and without the amplification.
The command I tried was:
pca <- prcomp(t(exprs(GSE65820)))
but it returns the error:
Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'
any(is.na(exprs(GSE65820))) returns TRUE and any(is.infinite(exprs(GSE65820))) returns FALSE, so the problem is missing values rather than infinite ones.
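Before choosing a fix, it can help to see where the NAs sit. A minimal sketch on a simulated beta-value matrix (probes in rows, samples in columns, the same layout exprs() returns); with the real data you would start from m <- exprs(GSE65820[[1]]) instead. The simulation includes one entirely missing sample purely to illustrate a pattern that makes complete-case methods fail; it is not a claim about GSE65820.

```r
set.seed(1)
# Simulated beta values: 1000 probes x 6 samples (stand-in for exprs())
m <- matrix(runif(1000 * 6), nrow = 1000,
            dimnames = list(paste0("cg", 1:1000), paste0("S", 1:6)))
m[sample(length(m), 50)] <- NA   # scattered missing values
m[, "S6"] <- NA                  # one sample entirely missing

na_per_sample <- colSums(is.na(m))   # number of NAs in each sample (column)
na_per_probe  <- rowSums(is.na(m))   # number of NAs in each probe (row)

na_per_sample
table(na_per_probe)
sum(complete.cases(m))   # probes observed in every sample; 0 here because of S6
```

If one sample accounts for nearly all the NAs, dropping that sample may cost less than dropping probes; if the NAs are scattered, removing or imputing the affected probes usually loses less information.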
How should I deal with these missing (NA) values without losing more data than necessary, i.e. without discarding whole rows (probes) or columns (samples)?
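Two common ways to keep every sample, sketched on simulated data (probes in rows, samples in columns): drop only the probes that contain an NA, or drop probes with many NAs and impute the rest. The 20% threshold and per-probe mean imputation are illustrative choices, not a recommendation specific to this dataset.

```r
set.seed(2)
# Simulated beta values: 500 probes x 8 samples with scattered NAs
m <- matrix(runif(500 * 8), nrow = 500,
            dimnames = list(paste0("cg", 1:500), paste0("S", 1:8)))
m[sample(length(m), 40)] <- NA

# Option 1: keep all samples, drop only the probes (rows) containing any NA
m_complete <- m[complete.cases(m), ]

# Option 2: drop probes missing in more than 20% of samples (arbitrary threshold),
# then replace each remaining NA with that probe's mean across samples
keep  <- rowMeans(is.na(m)) <= 0.2
m_imp <- m[keep, ]
row_means <- rowMeans(m_imp, na.rm = TRUE)
na_idx <- which(is.na(m_imp), arr.ind = TRUE)      # (row, col) of each NA
m_imp[na_idx] <- row_means[na_idx[, "row"]]
```

With hundreds of thousands of probes, option 1 often removes only a small fraction of them. For a more sophisticated imputation, the Bioconductor impute package provides impute.knn(), a k-nearest-neighbour method.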
Before this I tried to run:
corMatrix <- cor(exprs(GSE65820), use="c")
but got:
Error in cor(exprs(GSE65820), use = "c") : no complete element pairs
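In cor(), use = "c" partially matches "complete.obs", which discards every probe with an NA in any sample before computing; if no probe is observed in all samples, the result is "no complete element pairs". A minimal sketch on simulated data of that failure and two ways around it: pairwise-complete correlation, or removing an all-NA sample first. The all-NA sample is an assumed cause for illustration only.

```r
set.seed(3)
# Simulated beta values: 200 probes x 4 samples
m <- matrix(runif(200 * 4), nrow = 200,
            dimnames = list(paste0("cg", 1:200), paste0("S", 1:4)))
m[sample(200, 20), "S1"] <- NA   # scattered NAs in one sample
m[, "S4"] <- NA                  # an all-NA sample leaves no complete probes

# Complete-case correlation fails: capture the error message instead of stopping
res <- tryCatch(cor(m, use = "complete.obs"),
                error = function(e) conditionMessage(e))

# Pairwise: each pair of samples uses the probes observed in both
cm <- cor(m, use = "pairwise.complete.obs")

# Alternative: drop samples with no observed values, then use complete cases
m2  <- m[, colSums(!is.na(m)) > 0]
cm2 <- cor(m2, use = "complete.obs")
```

Pairwise correlations are computed on different probe subsets for different pairs, so the resulting matrix is not guaranteed to be positive semi-definite; removing the problem sample or probes first avoids that.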
This field of data and analysis is new to me, so I'm not sure where the problems arise or how they can be fixed.
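getGEO() with GSEMatrix=TRUE returns a list of ExpressionSets, so if you did not extract the element from the list, index into it first. Remove the probes (rows) that contain NAs before transposing, so that na.omit() drops probes rather than whole samples:
pca <- prcomp(t(na.omit(exprs(GSE65820[[1]]))))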
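A minimal end-to-end sketch of the PCA step on simulated data: remove probes with NAs from the untransposed matrix, transpose so samples are rows, run prcomp(), and attach group labels to the scores. The normal/tumour labels and the simulated methylation shift are hypothetical; with real data the labels would come from the metadata, aligned to the matrix columns.

```r
set.seed(4)
n_probe <- 300
samples <- paste0("S", 1:8)
group   <- factor(rep(c("normal", "tumour"), each = 4))   # hypothetical labels

# Simulated beta values with a shift in 50 probes for the "tumour" samples
m <- matrix(runif(n_probe * 8), nrow = n_probe,
            dimnames = list(paste0("cg", 1:n_probe), samples))
m[1:50, group == "tumour"] <- m[1:50, group == "tumour"] * 0.3
m[sample(length(m), 30)] <- NA

m_clean <- na.omit(m)             # drops probes (rows); all 8 samples are kept
pca <- prcomp(t(m_clean))         # samples as rows, probes as variables

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Scores for the first two components, labelled by group
scores <- data.frame(sample = rownames(pca$x), PC1 = pca$x[, 1],
                     PC2 = pca$x[, 2], group = group)
# To visualise: plot(scores$PC1, scores$PC2, col = scores$group, pch = 19)
```

To compare amplified and non-amplified tumours, the same steps can be run on the tumour columns only, with the amplification status as the group label.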