How to impute missing values in my protein expression data

0

4.7 years ago

koushikayaluri ▴ 70

I am currently working with protein expression data from breast cancer, where the rows are proteins (RefSeq) and the columns are samples. I have 77 cancer samples, 3 replicates and 3 normal samples, and the data has a lot of missing values. The data is normalized and contains log2 iTRAQ ratios for each sample. I want to impute the missing values in R, but I am unsure which package to use or what my overall approach should be, given that I plan to perform gene set analysis with the GSA package in R. Also, can I use a PCA plot to see how the cancer subtypes are distributed across the samples?

Thanks in advance.

Regards

rna-seq R next-gen alignment gene • 1.2k views


2

There are many ways to impute, see:

CRAN Task View: Missing Data
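
As a concrete starting point (an illustrative baseline, not something prescribed by the task view), the sketch below drops proteins with too many missing values and fills the remaining gaps with each protein's median. The simulated matrix, the 30% cutoff, and the helper name `impute_row_median` are assumptions made for illustration only; the packages listed in the task view offer more principled methods.

```r
# Simulated stand-in for the real data: 200 proteins (rows) x 83 samples (columns)
# of log2 ratios. Replace `expr` with your own numeric matrix.
set.seed(1)
n_prot <- 200
n_samp <- 83
expr <- matrix(rnorm(n_prot * n_samp), nrow = n_prot,
               dimnames = list(paste0("prot", seq_len(n_prot)),
                               paste0("s", seq_len(n_samp))))
expr[sample(length(expr), round(0.15 * length(expr)))] <- NA  # ~15% missing

# 1. Drop proteins missing in more than `max_missing` of the samples.
#    The 0.3 cutoff is an assumed value, not a recommendation from the thread.
max_missing <- 0.3
keep <- rowMeans(is.na(expr)) <= max_missing
filtered <- expr[keep, , drop = FALSE]

# 2. Replace each remaining NA with the median of that protein's observed values.
#    (Hypothetical helper; apply() returns rows as columns, hence the t().)
impute_row_median <- function(m) {
  t(apply(m, 1, function(x) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
    x
  }))
}
imputed <- impute_row_median(filtered)
```

Filling with a single value per protein shrinks that protein's variance, so this is best treated as a baseline to compare against other methods rather than a final choice.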

4.7 years ago by zx8754 11k

0

Thank you, I will look into it.

4.7 years ago by koushikayaluri ▴ 70

1

Do you need imputation to start with? For example, there are ways of doing PCA with missing values (e.g. this paper).
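
To illustrate the idea, here is a minimal base-R sketch of NIPALS, one well-known way to fit principal components while skipping missing cells. It is not necessarily the method of the linked paper. The function name `nipals_pca`, the simulated two-subtype data, and all parameter defaults are assumptions for illustration.

```r
# NIPALS PCA that ignores missing cells instead of imputing them.
# `x` must have samples in rows and proteins in columns (use t() on a
# proteins-by-samples matrix). Hypothetical helper, not a package function.
nipals_pca <- function(x, n_comp = 2, max_iter = 500, tol = 1e-8) {
  x <- sweep(x, 2, colMeans(x, na.rm = TRUE))  # center on observed values
  w <- (!is.na(x)) * 1                         # 1 = observed, 0 = missing
  x0 <- x
  x0[is.na(x0)] <- 0                           # zeros drop out of the sums below
  scores <- matrix(0, nrow(x), n_comp)
  loadings <- matrix(0, ncol(x), n_comp)
  for (k in seq_len(n_comp)) {
    t_k <- x0[, which.max(colSums(x0^2))]      # start from the most variable column
    for (iter in seq_len(max_iter)) {
      # Regress each column on the scores, using observed cells only.
      p_k <- crossprod(x0, t_k) / crossprod(w, t_k^2)
      p_k <- p_k / sqrt(sum(p_k^2))
      # Regress each row on the loadings, using observed cells only.
      t_new <- (x0 %*% p_k) / (w %*% p_k^2)
      done <- sum((t_new - t_k)^2) < tol * sum(t_new^2)
      t_k <- t_new
      if (done) break
    }
    x0 <- x0 - tcrossprod(t_k, p_k) * w        # deflate observed cells only
    scores[, k] <- t_k
    loadings[, k] <- p_k
  }
  list(scores = scores, loadings = loadings)
}

# Simulated example: 60 samples x 100 proteins, two subtypes that differ
# in the first 20 proteins, with ~15% of the values removed.
set.seed(42)
subtype <- rep(c("A", "B"), each = 30)
x <- matrix(rnorm(60 * 100), nrow = 60)
x[subtype == "A", 1:20] <- x[subtype == "A", 1:20] + 3
x[sample(length(x), round(0.15 * length(x)))] <- NA

fit <- nipals_pca(x, n_comp = 2)
plot(fit$scores[, 1], fit$scores[, 2], col = factor(subtype),
     xlab = "PC1", ylab = "PC2")
```

With missing data the components from NIPALS are only approximately orthogonal, and a sample or protein with no observed values at all would need to be removed first.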

4.7 years ago by Jean-Karim Heriche 27k
