Question

How To Deal With Duplicate Row Names Error In R

11

12.4 years ago

Diana ▴ 930

Hi all,

I'm facing a very annoying error in R while assigning row names to my data matrix. I have some RNA-seq data that I'm considering clustering in R. I'm using gene names as row names for my expression matrix, but R keeps reporting that there are duplicate names. Some unannotated genes have been assigned IDs that start with numbers. I don't understand how to deal with this error. Is there a way to work around it? I can't change the gene names.

EDIT:

gene                  sample1     sample2     sample3
Mar-01                4.19504     3.9006      4.15683
Mar-02                3.0554      3.4261      3.76675
un_A_2                1.1515      1.2455      0.563484
un_A_3                98.2504     120.341     101.753
ENSGALG00000008227    39.6383     12.8651     38.2281
ENSGALG00000008242    5.71557     7.79314     9.40917
ENSGALG00000008277    24.6231     28.3207     24.9288
CNN3                  141.708     134.476     144.514
CNNM1                 0.840218    0.963683    0.619086
CNNM2                 16.0282     12.1301     12.4665

Many thanks.

r • 205k views

ADD COMMENT • link updated 2.2 years ago by zx8754 12k • written 12.4 years ago by Diana ▴ 930

8

The gene names "Mar-01" and "Mar-02" look like a copy-paste from Excel, which converts some gene symbols into dates. Watch out! See:
http://nsaunders.wordpress.com/2012/10/22/gene-name-errors-and-excel-lessons-not-learned/
http://www.biomedcentral.com/1471-2105/5/80
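
A quick way to check a gene list for this kind of damage is to flag names that look like dates. The following is only a sketch: the vector `gene_names` and the pattern are illustrative assumptions, not something from the original post, and the pattern only recognises forms such as "Mar-01" or "1-Sep".

```r
# Hypothetical gene-name vector; the first two entries mimic the table in the question.
gene_names <- c("Mar-01", "Mar-02", "un_A_2", "CNN3", "1-Sep")

# Build a pattern from R's built-in month abbreviations (month.abb).
months <- paste(month.abb, collapse = "|")
date_like <- paste0("^(", months, ")-[0-9]{1,2}$|^[0-9]{1,2}-(", months, ")$")

# TRUE for names that look like "Mar-01" or "1-Sep".
is_date_like <- grepl(date_like, gene_names)
suspicious <- gene_names[is_date_like]
print(suspicious)
```

Any name flagged this way should be checked against the original annotation, since the date form alone does not say which gene symbol it came from.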

ADD REPLY • link 12.4 years ago by zx8754 12k

2

2.2 years ago

George ▴ 20

I got the same error message when I tried to calculate a Spearman correlation coefficient.

> any(duplicated(colnames(data_corr)))
[1] FALSE
> any(duplicated(rownames(data_corr)))
[1] FALSE
> Corr <- corr.test(data_corr[,-2], use = "pairwise", method="spearman", adjust = "BH", alpha = 0.05)
Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘NA-NA’

The problem was the class of the data frame:

> class(data_corr)
[1] "cast_df"    "data.frame"

It was resolved by resetting the class:

class(data_corr) <- "data.frame"
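
The same check and reset can be sketched on a toy object. Note the assumptions: `data_corr` here is a small stand-in data frame with a "cast_df" class attached by hand, not the real object from this post, and no reshaping package is loaded.

```r
# Stand-in for the object in the post: a plain data frame with an extra
# "cast_df" class attached by hand (assumption, for illustration only).
data_corr <- data.frame(a = c(1, 2, 3), b = c(3, 1, 2))
class(data_corr) <- c("cast_df", "data.frame")

# Look for extra classes in addition to "data.frame".
extra_classes <- setdiff(class(data_corr), "data.frame")
print(extra_classes)

# Reset to a plain data frame, as done in the post.
if (length(extra_classes) > 0) {
  class(data_corr) <- "data.frame"
}
print(class(data_corr))
```

Checking `class()` first is cheap and shows whether a function might be dispatching on something other than a plain data frame.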

ADD COMMENT • link 2.2 years ago by George ▴ 20

0

This Stack Overflow answer gives more details about the "cast_df" class issue:

Converting matrix to dataframe : Works in one case, not another

ADD REPLY • link 2.2 years ago by zx8754 12k

score 40 · Accepted Answer · 2013-02-06

40

12.4 years ago

Michael 56k

One way of dealing with this in R is the function make.names with the option unique=TRUE; see ?make.names.

> nams = c("bl-a","bl-a","bl-a", "foo" )
> df = data.frame(matrix (1:4))
> df
  matrix.1.4.
1           1
2           2
3           3
4           4
> rownames(df) = nams
Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘bl-a’ 
> rownames(df) = make.names(nams, unique=TRUE)
> df
       matrix.1.4.
bl.a             1
bl.a.1           2
bl.a.2           3
foo              4
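
Before renaming anything, it can help to see which names are actually duplicated. This is a minimal sketch using the same `nams` vector as above; the variable names `dup_names`, `counts` and `dup_rows` are made up for illustration.

```r
nams <- c("bl-a", "bl-a", "bl-a", "foo")

# Names that occur more than once, each listed once.
dup_names <- unique(nams[duplicated(nams)])
print(dup_names)

# How often each duplicated name occurs.
counts <- table(nams)
print(counts[counts > 1])

# Positions of all rows carrying a duplicated name (first occurrence included).
dup_rows <- which(nams %in% dup_names)
print(dup_rows)
```

With real expression data, `nams` would be the gene-name column, and `dup_rows` tells you which rows to inspect before deciding whether to rename, merge or drop them.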

ADD COMMENT • link 12.4 years ago by Michael 56k

0

What would be a possible solution if I need to add up the values while the row name stays the same? E.g. instead of

bl.a      1
bl.a.1    2
bl.a.2    3
foo       4

I need (adding up the first 3 rows, which have the same name)

bl.a      6
foo       4

ADD REPLY • link 8.8 years ago by Shahzad ▴ 30

2

I am not sure if I understand correctly, but it is best to have unique row names. If you need the original info, you could make a data frame with an additional column containing the unmodified gene names.

It is possible to turn check.names off when constructing the data, but I wouldn't recommend it.
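
Both ideas can be sketched as follows. This is only an illustration under assumptions: the data are the toy values from this thread, the column name `sample1` is made up, and base R's `rowsum()` is used for the summing, which is one option among several.

```r
# Hypothetical data matching the toy example in this thread.
nams <- c("bl-a", "bl-a", "bl-a", "foo")
values <- matrix(1:4, ncol = 1, dimnames = list(NULL, "sample1"))

# Option 1: sum all rows that share a name; the result has one row per name.
# For "bl-a" this gives 1 + 2 + 3 = 6.
summed <- rowsum(values, group = nams)
print(summed)

# Option 2: keep every row, store the unmodified name in its own column
# and use unique row names.
df <- data.frame(gene = nams, sample1 = values[, "sample1"],
                 row.names = make.names(nams, unique = TRUE))
print(df)
```

Whether summing is appropriate depends on what the duplicated rows represent, so option 2 is the safer default when that is unclear.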

ADD REPLY • link 8.8 years ago by Michael 56k

0

I know this is an old thread, but there doesn't seem to be a good answer. Looking at the example, the names are unique and yet the error is still occurring. I had a similar issue, and the problem was that I had a special character in the name. The special character was handled fine by many R functions, but not all. After a lot of troubleshooting, I replaced the special character and this resolved the issue.

ADD REPLY • link 4.6 years ago by jrdnsnclr • 0

0

make.names should also deal with special characters just fine (by replacing them with "."). I assumed the example was just an excerpt, and therefore the duplicated names might have been elsewhere.

> x <- matrix(1:4)
> rownames(x) <- make.names(c("a§", "a€", "aअअअ", "aअअअ"), unique = TRUE)
> x
       [,1]
a.        1
a..1      2
aअअअ      3
aअअअ.1    4

This even seems to work with extended Unicode, although those characters are not replaced with dots.
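
If the names should stay as close to the originals as possible, base R's make.unique is an alternative worth knowing: it only appends a numeric suffix to repeated names and leaves characters such as "-" alone. A minimal sketch, reusing the toy names from earlier in this thread (the column name `value` is made up):

```r
nams <- c("bl-a", "bl-a", "bl-a", "foo")
df <- data.frame(value = 1:4)

# make.unique() only appends ".1", ".2", ... to repeated names;
# unlike make.names() it does not replace characters such as "-".
rownames(df) <- make.unique(nams)
print(rownames(df))
```

The first occurrence keeps its original name, so only the repeats are altered.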

ADD REPLY • link 3.9 years ago by Michael 56k