How To Deal With Duplicate Row Names Error In R
9.7 years ago
Diana ▴ 900

Hi all,

I'm facing a very annoying error in R while assigning row names to my data matrix. I have some RNA-seq data that I'm considering clustering in R. I'm using gene names as row names for my expression matrix, but R keeps reporting that there are duplicate names. Some unannotated genes have been assigned IDs that start with numbers. I don't understand how to deal with this error. Is there a way to work around it? I can't change the gene names.

EDIT:

gene                sample1   sample2   sample3
Mar-01              4.19504   3.9006    4.15683
Mar-02              3.0554    3.4261    3.76675
un_A_2              1.1515    1.2455    0.563484
un_A_3              98.2504   120.341   101.753
ENSGALG00000008227  39.6383   12.8651   38.2281
ENSGALG00000008242  5.71557   7.79314   9.40917
ENSGALG00000008277  24.6231   28.3207   24.9288
CNN3                141.708   134.476   144.514
CNNM1               0.840218  0.963683  0.619086
CNNM2               16.0282   12.1301   12.4665


Many thanks.

9.7 years ago

One way of dealing with this in R is the function make.names with the option unique=TRUE; see ?make.names.

> nams = c("bl-a", "bl-a", "bl-a", "foo")
> df = data.frame(matrix(1:4))
> df
  matrix.1.4.
1           1
2           2
3           3
4           4
> rownames(df) = nams
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
  duplicate 'row.names' are not allowed
non-unique value when setting 'row.names': ‘bl-a’
> rownames(df) = make.names(nams, unique=TRUE)
> df
       matrix.1.4.
bl.a             1
bl.a.1           2
bl.a.2           3
foo              4
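
Before renaming anything, it can help to see which names are actually duplicated. A minimal sketch, assuming the gene names sit in a character vector (here the hypothetical `nams` from the example above); `anyDuplicated`, `duplicated` and `table` are all base R:

```r
# Hypothetical gene names; "bl-a" appears three times.
nams <- c("bl-a", "bl-a", "bl-a", "foo")

# TRUE if at least one name occurs more than once
has_dups <- anyDuplicated(nams) > 0

# The distinct names that are repeated
dups <- unique(nams[duplicated(nams)])

# How often each repeated name occurs
dup_counts <- table(nams)
dup_counts[dup_counts > 1]
```

If `dups` is empty, the names are already unique and the error has another cause.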


What would be a possible solution if I need to add up the values and keep the row name the same? E.g., instead of

bl.a 1
bl.a.1 2
bl.a.2 3
foo 4

I need

bl.a 6 (adding the first 3 rows with the same name)
foo 4


I am not sure if I understand correctly, but it is best to have unique row names. If you need the original info, you could make a data frame with an additional column containing the unmodified gene names.

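It is possible to turn check.names off when constructing the data, but that only affects column names, and I wouldn't recommend it. A data frame never accepts duplicate row names; a plain matrix does.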
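
Both ideas from this exchange can be sketched in base R. This is only an illustration under assumptions: `nams`, `values` and the column name `sample1` are made up to match the toy example, and `rowsum` is one of several ways to collapse rows (`aggregate` would also work).

```r
# Hypothetical data matching the toy example in this thread
nams   <- c("bl-a", "bl-a", "bl-a", "foo")
values <- matrix(1:4, ncol = 1, dimnames = list(NULL, "sample1"))

# Option 1: sum all rows that share a name.
# reorder = FALSE keeps groups in order of first appearance.
summed <- rowsum(values, group = nams, reorder = FALSE)
summed

# Option 2: keep every row, make the row names unique,
# and store the unmodified gene names in their own column.
df <- data.frame(gene = nams, values,
                 row.names = make.names(nams, unique = TRUE))
df
```

Option 1 changes the data (one row per gene), so whether summing is appropriate depends on what the duplicated rows represent.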


I know this is an old thread, but there doesn't seem to be a good answer. Looking at the example, the names are unique and yet the error still occurs. I had a similar issue, and the problem was that I had a special character in the name. The special character was handled fine by many R functions but not all. After a lot of troubleshooting, I replaced the special character and this resolved the issue.
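
One way to hunt for such names is sketched below. It is not necessarily what was done in this case, and the names in `nams` are hypothetical. It flags every name that `make.names` would alter, plus names with stray whitespace. Row names are allowed to contain these characters, so a flagged name is only a candidate for closer inspection.

```r
# Hypothetical names: a hyphen, a leading digit, and a trailing space
nams <- c("CNN3", "Mar-01", "2_unannotated", "CNNM1 ")

# Names that make.names() would change are not syntactically valid R names
changed <- nams != make.names(nams)
nams[changed]

# Leading or trailing whitespace is easy to miss on screen
padded <- nams[nams != trimws(nams)]
padded
```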


make.names should also deal with special characters just fine (by replacing them with "."). I assumed the example was just an excerpt, and therefore the duplicated names might have been elsewhere.

> x <- matrix(1:4)
> rownames(x) <- make.names(c("a§", "a€", "aअअअ", "aअअअ"), unique = TRUE)
> x
       [,1]
a.        1
a..1      2
aअअअ      3
aअअअ.1    4


This even seems to work with extended Unicode: those characters are kept rather than replaced with dots, and the duplicate still gets a numeric suffix.