How To Deal With Duplicate Row Names Error In R
9.7 years ago
Diana ▴ 900

Hi all,

I'm running into a very annoying error in R when assigning row names to my data matrix. I have some RNA-seq data that I would like to cluster in R. I'm using gene names as row names for my expression matrix, but R keeps reporting that there are duplicate names. Some un-annotated genes have been assigned IDs that start with numbers. I don't understand how to deal with this error. Is there a way to work around it? I can't change the gene names.

EDIT:

gene                  sample1     sample2     sample3
Mar-01                4.19504     3.9006      4.15683
Mar-02                3.0554      3.4261      3.76675
un_A_2                1.1515      1.2455      0.563484
un_A_3                98.2504     120.341     101.753
ENSGALG00000008227    39.6383     12.8651     38.2281
ENSGALG00000008242    5.71557     7.79314     9.40917
ENSGALG00000008277    24.6231     28.3207     24.9288
CNN3                  141.708     134.476     144.514
CNNM1                 0.840218    0.963683    0.619086
CNNM2                 16.0282     12.1301     12.4665

Many thanks.

9.7 years ago

One way of dealing with this in R is the function make.names with the option unique=TRUE; see ?make.names.

> nams = c("bl-a","bl-a","bl-a", "foo" )
> df = data.frame(matrix (1:4))
> df
  matrix.1.4.
1           1
2           2
3           3
4           4
> rownames(df) = nams
Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘bl-a’ 
> rownames(df) = make.names(nams, unique=TRUE)
> df
       matrix.1.4.
bl.a             1
bl.a.1           2
bl.a.2           3
foo              4
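
Before renaming anything, it can help to see which names are actually duplicated. The following is only a minimal sketch: the gene names are made up for illustration (they are not taken from the data in the question), and it uses base R only. It also contrasts make.names, which rewrites names that are not syntactically valid, with make.unique, which only appends a suffix to repeats.

```r
# Hypothetical gene names; "un_A_2" is repeated on purpose and one ID
# starts with a digit, similar to the situation described in the question.
genes <- c("CNN3", "un_A_2", "CNNM1", "un_A_2", "1_unannotated")

# Names that occur more than once
dup_names <- unique(genes[duplicated(genes)])

# Positions of every row that carries a duplicated name
dup_rows <- which(genes %in% dup_names)

# make.names() makes names unique AND syntactically valid,
# so a name starting with a digit gets an "X" prefix.
fixed <- make.names(genes, unique = TRUE)

# make.unique() only appends ".1", ".2", ... to repeats
# and leaves all other characters untouched.
kept <- make.unique(genes)

print(dup_names)
print(dup_rows)
print(fixed)
print(kept)
```

If the original spelling of the gene names matters for later lookups, make.unique may be the gentler choice, since it changes only the repeated entries.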

What would be a possible solution if I need to add up the values while the row name stays the same? For example, instead of

bl.a      1
bl.a.1    2
bl.a.2    3
foo       4

I need

bl.a      6   (the sum of the first 3 rows, which share a name)
foo       4


I am not sure if I understand correctly, but it is best to have unique row names. If you need the original info, you could make a data frame with an additional column containing the unmodified gene names.

It is possible to turn check.names off when constructing the data, but I wouldn't recommend it.
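
As a rough sketch of both ideas (collapsing rows that share a name, and keeping the unmodified names in their own column), something like the following might work. The toy data and the column names "gene" and "value" are assumptions made for illustration; rowsum, make.unique and data.frame are base R.

```r
# Toy data mirroring the small example in this thread (hypothetical values)
nams <- c("bl-a", "bl-a", "bl-a", "foo")
mat  <- matrix(1:4, ncol = 1, dimnames = list(NULL, "value"))

# Option 1: add up all rows that share a name.
# reorder = FALSE keeps the groups in order of first appearance.
summed <- rowsum(mat, group = nams, reorder = FALSE)
print(summed)

# Option 2: keep every row, give it a unique row name, and store the
# original (unmodified) name in an extra column called "gene".
df <- data.frame(gene = nams, value = 1:4, row.names = make.unique(nams))
print(df)
```

Whether summing is appropriate depends on what the duplicated rows represent; for expression data, taking the mean or keeping only one row per gene are other common choices.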


I know this is an old thread, but there doesn't seem to be a good answer. Looking at the example, the names are unique, yet the error still occurs. I had a similar issue, and the problem was a special character in one of the names. The special character was handled fine by many R functions, but not all. After a lot of troubleshooting, I replaced the special character, and this resolved the issue.


make.names should also deal with special characters just fine (by replacing them with "."). I assumed the example was just an excerpt, and therefore the duplicated names might have been elsewhere.

> x <- matrix(1:4)  # assumed setup; x is not defined in the original post
> rownames(x) <- make.names(c("a§", "a€", "aअअअ", "aअअअ"), unique = TRUE)
> x
       [,1]
a.        1
a..1      2
aअअअ      3
aअअअ.1    4

It even seems to work with extended Unicode, although those characters are not replaced with dots (presumably because they count as letters in the current locale).
