I am working on breast cancer protein expression data in which rows are proteins (RefSeq) and columns are samples. I have 77 cancer samples, 3 replicates, and 3 normal samples, and there are a lot of missing values. The data are normalized and contain log2 iTRAQ ratios for each sample. I am working in R and want to impute the missing values, but I am confused about which package to use, or what my overall approach should be, given that I plan to perform gene set analysis with the GSA package in R. Also, can I use a PCA plot to see how the cancer subtypes are distributed across the samples?
Thanks in advance.
There are many ways to impute, see:
Thank you, I will look into it.
Do you need imputation to start with? For example, there are ways of doing PCA with missing values (e.g. this paper).
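To make the idea concrete, here is a minimal sketch of one such approach, NIPALS PCA, which fits each component using only the observed entries and never fills in the gaps. It is not necessarily the method from the paper mentioned above, and it is written in plain Python purely for illustration, although the question is about R. The function name `nipals_pca`, its parameters, and the toy numbers are all made up for this example. It assumes samples are rows (so a protein x sample matrix would be transposed first), missing values are NaN, and every protein has at least one observed value.

```python
import math


def _masked_slope(values, weights):
    """Least-squares slope of values on weights, skipping NaN values."""
    num = den = 0.0
    for v, w in zip(values, weights):
        if not math.isnan(v):
            num += v * w
            den += w * w
    return num / den if den > 0 else 0.0


def nipals_pca(rows, n_components=2, max_iter=500, tol=1e-12):
    """Hypothetical helper: PCA by NIPALS that ignores missing (NaN) entries.

    rows: list of samples, each a list of protein values.
    Returns (scores, loadings), where scores[k][i] belongs to sample i
    and loadings[k][j] belongs to protein j for component k.
    """
    n, p = len(rows), len(rows[0])
    x = [list(r) for r in rows]

    # Centre each protein on the mean of its observed values only.
    for j in range(p):
        observed = [x[i][j] for i in range(n) if not math.isnan(x[i][j])]
        mean = sum(observed) / len(observed)
        for i in range(n):
            x[i][j] -= mean  # NaN minus a number stays NaN

    def observed_sumsq(j):
        return sum(x[i][j] ** 2 for i in range(n) if not math.isnan(x[i][j]))

    scores, loadings = [], []
    for _component in range(n_components):
        # Start from the column with the most remaining observed variation.
        start = max(range(p), key=observed_sumsq)
        t = [0.0 if math.isnan(x[i][start]) else x[i][start] for i in range(n)]
        load = [0.0] * p
        for _iteration in range(max_iter):
            # Loadings: regress each protein column on the current scores.
            load = [_masked_slope([x[i][j] for i in range(n)], t) for j in range(p)]
            norm = math.sqrt(sum(v * v for v in load))
            if norm == 0.0:
                break
            load = [v / norm for v in load]
            # Scores: regress each sample row on the loadings.
            new_t = [_masked_slope(x[i], load) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(new_t, t))
            t = new_t
            if change <= tol * sum(v * v for v in t):
                break
        # Deflate: remove this component from the observed entries.
        for i in range(n):
            for j in range(p):
                if not math.isnan(x[i][j]):
                    x[i][j] -= t[i] * load[j]
        scores.append(t)
        loadings.append(load)
    return scores, loadings


# Toy data (invented): 6 samples x 4 proteins, NaN marks a missing ratio.
nan = float("nan")
example = [
    [1.0, 2.0, nan, 1.5],
    [1.2, nan, 0.9, 1.4],
    [0.9, 2.1, 1.1, nan],
    [-1.0, -2.0, -1.0, nan],
    [nan, -1.9, -1.1, -1.6],
    [-1.1, -2.2, nan, -1.5],
]
pc_scores, pc_loadings = nipals_pca(example, n_components=2)
print(pc_scores[0])  # first-component score per sample
```

Plotting the first two score vectors against each other, coloured by subtype, gives the usual PCA plot. With many missing values the components are no longer exactly orthogonal, so it is worth filtering out proteins that are missing in most samples first. In R, the Bioconductor package pcaMethods offers NIPALS and related missing-value-aware PCA methods, which is likely a better starting point than hand-rolled code.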