Question: R: converting Ensembl row names to gene symbols fails with "missing values in 'row.names' are not allowed"
2.1 years ago, user31888 (United States) wrote:

I have a .csv file as follows:

,TEST1,TEST2
ENSG00000197421,2,0
ENSG00000213753,0,2
ENSG00000168746,0,2
ENSG00000261824,3,0
ENSG00000128310,1,2
ENSG00000235091,9,4

In R, I import the file like this:

 > d <- read.csv("my_file.csv", header=TRUE, row.names=1)
 > d
                TEST1 TEST2
ENSG00000197421     2     0
ENSG00000213753     0     2
ENSG00000168746     0     2
ENSG00000261824     3     0
ENSG00000128310     1     2
ENSG00000235091     9     4

Checking that I do not have any duplicates:

> rownames(d)
[1] "ENSG00000197421" "ENSG00000213753" "ENSG00000168746" "ENSG00000261824"
[5] "ENSG00000128310" "ENSG00000235091"
> colnames(d)
[1] "TEST1" "TEST2"
> any(duplicated(rownames(d)))
[1] FALSE
> any(duplicated(colnames(d)))
[1] FALSE

Load libraries:

> suppressMessages(library("AnnotationDbi"))
> suppressMessages(library("org.Hs.eg.db"))

Then try to convert my Ensembl row names to Symbol in place:

> rownames(d) <- mapIds(org.Hs.eg.db,keys=rownames(d),column="SYMBOL",keytype="ENSEMBL",multiVals="first")
Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  missing values in 'row.names' are not allowed

NOTE: Removing the first ',' in 'my_file.csv' did not help either.

I managed to create a new column with the converted IDs, but I cannot use it to replace the row names:

> d$SYMBOL <- mapIds(org.Hs.eg.db,keys=rownames(d),column="SYMBOL",keytype="ENSEMBL",multiVals="first")
> d
                TEST1 TEST2    SYMBOL
ENSG00000197421     2     0     GGT3P
ENSG00000213753     0     2 CENPBD1P1
ENSG00000168746     0     2 LINC01620
ENSG00000261824     3     0 LINC00662
ENSG00000128310     1     2     GALR3
ENSG00000235091     9     4      <NA>
> d_subset <- subset(d, !is.na(d$SYMBOL))
> d_subset
                TEST1 TEST2    SYMBOL
ENSG00000197421     2     0     GGT3P
ENSG00000213753     0     2 CENPBD1P1
ENSG00000168746     0     2 LINC01620
ENSG00000261824     3     0 LINC00662
ENSG00000128310     1     2     GALR3
> rownames(d) <- d$SYMBOL
Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  missing values in 'row.names' are not allowed

I don't get it.


Here, "missing values" means NAs, which cannot be used as row names. You need to replace them with unique names (because duplicate row names are not allowed either).

Comment written 2.1 years ago by Charles Plessy
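
To illustrate the comment above, here is a minimal base-R sketch that reproduces the failure without Bioconductor. The three-row data frame reuses values from the question, and the `symbols` vector is a hypothetical stand-in for the `mapIds()` result, so treat it as an illustration rather than the exact session from the post.

```r
# Toy data frame with three of the rows shown in the question.
d <- data.frame(TEST1 = c(2, 0, 9), TEST2 = c(0, 2, 4),
                row.names = c("ENSG00000197421", "ENSG00000213753",
                              "ENSG00000235091"))

# Hypothetical stand-in for the mapIds() output: the last ID has no symbol.
symbols <- c("GGT3P", "CENPBD1P1", NA)

# Assigning row names that contain NA raises an error; capture its message.
res <- tryCatch({
  rownames(d) <- symbols
  "ok"
}, error = function(e) conditionMessage(e))
print(res)

# A cheap check to run before assigning row names.
has_na <- anyNA(symbols)
print(has_na)
```

Checking `anyNA()` on the mapped vector before the assignment shows right away whether some IDs were not mapped.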
2.1 years ago, seancho wrote:

In your last line, you're still assigning rownames(d) <- d$SYMBOL on the original data frame, not on your new d_subset.

rownames(d_subset) <- d_subset$SYMBOL should work.
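
As a small self-contained sketch of that first suggestion (the `SYMBOL` column is filled in by hand here as a stand-in for the `mapIds()` output, so no annotation package is needed):

```r
# Toy data frame with three of the rows shown in the question.
d <- data.frame(TEST1 = c(2, 1, 9), TEST2 = c(0, 2, 4),
                row.names = c("ENSG00000197421", "ENSG00000128310",
                              "ENSG00000235091"))

# Hand-written stand-in for the mapIds() result; NA marks the unmapped ID.
d$SYMBOL <- c("GGT3P", "GALR3", NA)

# Keep only the rows that have a symbol, then rename the rows of the
# subset (not of the original d, which still contains the NA).
d_subset <- d[!is.na(d$SYMBOL), ]
rownames(d_subset) <- d_subset$SYMBOL
print(d_subset)
```

Note that this assignment would still fail if two Ensembl IDs mapped to the same symbol, because duplicate row names are not allowed.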

Alternatively, if you wish to keep all the entries, you could retain the Ensembl names when it is not mapped:

> rownames(d) <- ifelse(is.na(d$SYMBOL), rownames(d), d$SYMBOL)

> d
                TEST1 TEST2    SYMBOL
GGT3P               2     0     GGT3P
CENPBD1P1           0     2 CENPBD1P1
LINC01620           0     2 LINC01620
LINC00662           3     0 LINC00662
GALR3               1     2     GALR3
ENSG00000235091     9     4      <NA>
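
The fallback idea can also be combined with `make.unique()` in case two Ensembl IDs map to the same symbol. The sketch below is only an illustration under assumptions: `symbols` is a hand-written stand-in for the `mapIds()` output, and the fourth ID, `ENSG00000000001`, is made up purely to create a duplicate symbol.

```r
# Three IDs from the question plus one made-up ID (ENSG00000000001).
ensembl <- c("ENSG00000197421", "ENSG00000213753",
             "ENSG00000235091", "ENSG00000000001")

# Hypothetical stand-in for mapIds(): NA for the unmapped ID, and a
# deliberately duplicated symbol for the made-up ID.
symbols <- c("GGT3P", "CENPBD1P1", NA, "GGT3P")
names(symbols) <- ensembl

d <- data.frame(TEST1 = c(2, 0, 9, 5), TEST2 = c(0, 2, 4, 1),
                row.names = ensembl)

# Fall back to the Ensembl ID when no symbol is available ...
new_names <- ifelse(is.na(symbols), ensembl, symbols)
# ... then make repeated symbols unique by appending .1, .2, and so on.
new_names <- make.unique(unname(new_names))

rownames(d) <- new_names
print(rownames(d))
```

Whether a suffix such as `.1` is acceptable depends on the downstream analysis; aggregating rows that share a symbol is another option.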

+1 for keeping the ENSG. Thanks !

Comment written 2.1 years ago by user31888

Missing a bracket. The editor does not want it...

Comment written 2.1 years ago by user31888
2.1 years ago, user31888 (United States) wrote:

Sorry, but I don't see any missing values in my dataset. And I don't see any duplicates in any field either. That's what I don't understand.

ENSG00000235091     9     4      <NA>

Here, <NA> means that you have a missing value.

Comment written 2.1 years ago by Charles Plessy

Actually, I forgot to use the subset of my data frame in my last piece of code.

Comment written 2.1 years ago by user31888
2.1 years ago, user31888 (United States) wrote:

Sorry, but I don't see any missing values in my dataset. That's what I don't understand.

2.1 years ago, Liun (Harbin) wrote:

It's because not all of your keys (rownames(d)) are in org.Hs.eg.db.

> rownames(d)

"ENSG00000197421" "ENSG00000213753" "ENSG00000168746" "ENSG00000261824" "ENSG00000128310" "ENSG00000235091"

> intersect(rownames(d),keys(org.Hs.eg.db,"ENSEMBL"))

"ENSG00000197421" "ENSG00000213753" "ENSG00000168746" "ENSG00000261824" "ENSG00000128310"

If you run this code:

mapIds(org.Hs.eg.db,keys=rownames(d)[1:5],column="SYMBOL",keytype="ENSEMBL",multiVals="first")

it runs without any error.
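
The same check can be sketched in plain base R to list exactly which IDs are unmapped. Here `known_keys` is a hypothetical stand-in for the result of `keys(org.Hs.eg.db, "ENSEMBL")` from the snippet above, hard-coded to the five IDs that the intersect() call returned, so that the example runs without Bioconductor.

```r
# The six Ensembl IDs from the question.
ids <- c("ENSG00000197421", "ENSG00000213753", "ENSG00000168746",
         "ENSG00000261824", "ENSG00000128310", "ENSG00000235091")

# Hypothetical stand-in for keys(org.Hs.eg.db, "ENSEMBL"); with the
# annotation package loaded, the real key list would be used instead.
known_keys <- ids[1:5]

# IDs that are absent from the annotation database.
unmapped <- setdiff(ids, known_keys)
print(unmapped)

# Logical mask that can be used to filter the data frame before mapping.
mapped <- ids %in% known_keys
print(sum(mapped))
```

Filtering the data frame with the `mapped` mask before calling `mapIds()` avoids NA symbols in the first place, at the cost of dropping the unmapped rows.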
