I have 22268 Affymetrix Probeset IDs as the rownames for my expression matrix. I want to map these to the official HUGO gene symbols. However, when I use the hgu133plus2.db and annotate packages to do this with the call
symbols <- getSYMBOL(as.character(expression.matrix[,1]), "hgu133plus2")
rownames(expression.matrix) <- as.character(symbols)
I get the following error
Error in `row.names<-.data.frame`(`*tmp*`, value = value) : duplicate 'row.names' are not allowed
In addition: Warning message: non-unique values when setting 'row.names': ‘AAGAB’, ‘AAK1’, ‘AASDHPPT’, ‘AASS’, ‘ABAT’, ‘ABCA1’, ‘ABCA2’, ‘ABCB11’, ‘ABCB6’, ‘ABCB9’, ‘ABCC1’, ‘ABCC10’, ‘ABCC3’, ‘ABCC6’, ‘ABCC8’, ‘ABCC9’, ‘ABCD1’, ‘ABCD4’, ‘ABCE1’, ‘ABCF2’, ‘ABCG1’, ‘ABHD2’, ‘ABHD5’, ‘ABHD6’, ‘ABI1’, ‘ABI2’, ‘ABLIM1’, ‘ABO’, ‘ABR’, ‘ACAA1’, ‘ACAA2’, ‘ACACA’, ‘ACACB’, ‘ACADL’, ‘ACAN’, ‘ACAP1’, ‘ACAP2’, ‘ACBD3’, ‘ACE2’, ‘ACHE’, ‘ACLY’, ‘ACO2’, ‘ACOT11’, ‘ACOT7’, ‘ACOX1’, ‘ACOX3’, ‘ACP1’, ‘ACRV1’, ‘ACSBG1’, ‘ACSL1’, ‘ACSL3’, ‘ACSL6’, ‘ACSM3’, ‘ACSM5’, ‘ACTA2’, ‘ACTB’, ‘ACTG1’, ‘ACTL6B’, ‘ACTN1’, ‘ACTN2’, ‘ACTR1A’, ‘ACTR2’, ‘ACTR3’, ‘ACTR5’, ‘ACVR1B’, ‘ADA’, ‘ADAM10’, ‘ADAM12’, ‘ADAM17’, ‘ADAM19’, ‘ADAM20’, ‘ADAM22’, ‘ADAM23’, ‘ADAM [... truncated]
I know that this is because 1893 of the probesets map to NA for the official HUGO gene symbol. Therefore, I want to know what the norm is for dealing with these probesets: are they excluded, or should I just retain the probeset name? Or should I use the Ensembl IDs? How can I do the latter? Please bear in mind that I am using an expression matrix and not the ExpressionSet object provided by Bioconductor. This is of necessity, since what I am scripting needs to be understandable by a competent programmer who is not familiar with R and will certainly not be familiar with Bioconductor.
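To make the two options in the question concrete, here is a minimal base-R sketch on a toy matrix. The probeset IDs, the values, and the `symbols` vector are all made up for illustration; `symbols` only imitates the named character vector that getSYMBOL() returns. Option A keeps every row and falls back to the probeset ID where no symbol exists. Option B drops unannotated probesets and keeps one probeset per gene. Picking the probeset with the highest mean expression is one common convention, not the only one.

```r
# Toy expression matrix with probeset IDs as rownames (hypothetical values).
expression.matrix <- matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("p1_at", "p2_at", "p3_at", "p4_at"), c("s1", "s2"))
)

# Hypothetical lookup result: a named character vector, NA where unmapped.
symbols <- c(p1_at = "ACTB", p2_at = "ACTB", p3_at = NA, p4_at = "ABAT")

# Option A: keep all rows. Use the probeset ID where the symbol is NA,
# then add suffixes so repeated symbols become unique.
labels <- ifelse(is.na(symbols), names(symbols), symbols)
labelled <- expression.matrix
rownames(labelled) <- make.unique(labels)

# Option B: drop unannotated probesets, then keep one row per gene,
# here the probeset with the highest mean expression across samples.
keep <- !is.na(symbols)
annotated <- expression.matrix[keep, , drop = FALSE]
gene <- symbols[keep]
best <- vapply(
  split(seq_along(gene), gene),
  function(i) i[which.max(rowMeans(annotated[i, , drop = FALSE]))],
  integer(1)
)
collapsed <- annotated[best, , drop = FALSE]
rownames(collapsed) <- names(best)
```

With Option A no data is lost and every row stays traceable to its probeset. Option B gives one row per gene, which is usually easier to read for someone who does not know the array.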
Also, should I convert the rownames of the expression matrix into factors, or is it OK to keep them as a character vector?
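On the factor question, a small base-R check (toy data, hypothetical names) suggests that the choice does not matter much: rownames are stored as character for both matrices and data frames, so a factor is coerced on assignment.

```r
# A factor of gene symbols (hypothetical example values).
ids <- factor(c("ACTB", "ABAT"))

# Assigning the factor as rownames of a matrix coerces it to character.
m <- matrix(1:4, nrow = 2)
rownames(m) <- ids

# The same happens for a data frame.
df <- data.frame(x = 1:2)
rownames(df) <- ids

matrix_names_are_character <- is.character(rownames(m))
df_names_are_character <- is.character(rownames(df))
```

Both flags should come out TRUE, so passing a plain character vector is the simpler choice.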
My recollection is that the 133 probesets did not all map stringently to protein gene IDs anyway. It's over- or double-counting by ~2K for starters.