Question: How To Deal With Duplicate Row Names Error In R
Diana750 wrote (6.0 years ago):

Hi all,

I'm facing a very annoying error in R while assigning row names to my data matrix. I have some RNA-seq data that I'm planning to cluster in R. I'm using gene names as row names for my expression matrix, but R keeps reporting that there are duplicate names. Some unannotated genes have been assigned IDs that start with numbers. I don't understand how to deal with this error. Is there a way to work around it? I can't change the gene names.


gene                 sample1     sample2     sample3
Mar-01               4.19504     3.9006      4.15683
Mar-02               3.0554      3.4261      3.76675
un_A_2               1.1515      1.2455      0.563484
un_A_3               98.2504     120.341     101.753
ENSGALG00000008227   39.6383     12.8651     38.2281
ENSGALG00000008242   5.71557     7.79314     9.40917
ENSGALG00000008277   24.6231     28.3207     24.9288
CNN3                 141.708     134.476     144.514
CNNM1                0.840218    0.963683    0.619086
CNNM2                16.0282     12.1301     12.4665

Many thanks.

R • 84k views
modified 6.0 years ago by Michael Dondrup45k • written 6.0 years ago by Diana750
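Before renaming anything, it can help to find out which names are actually repeated. A minimal base R sketch; the vector `gene_names` is made up for illustration (the excerpt above happens to contain no duplicates) and stands in for the gene column of the expression table:

```r
# Made-up example names; in practice use the gene column of the table
gene_names <- c("Mar-01", "Mar-02", "un_A_2", "CNN3", "CNN3", "un_A_2", "CNNM1")

# duplicated() is TRUE for the 2nd, 3rd, ... occurrence of a value
dup_names <- unique(gene_names[duplicated(gene_names)])
print(dup_names)

# table() counts occurrences; keep only names seen more than once
counts <- table(gene_names)
print(counts[counts > 1])
```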

The gene names "Mar-01" and "Mar-02" look like they were copied from Excel, which silently converts some gene symbols into dates. Watch out!

written 6.0 years ago by zx87546.2k
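One way to act on this warning is to flag names that look like Excel-converted dates. A rough sketch, assuming the damaged names follow the "Mon-NN" form seen in the question; the regular expression and the example vector (including the made-up "Sep-07") are illustrative, not an exhaustive check:

```r
# Hypothetical gene names; "Sep-07" is made up for illustration
gene_names <- c("Mar-01", "Mar-02", "un_A_2", "CNN3", "Sep-07")

# Assumed pattern: month abbreviation (month.abb), a hyphen, two digits
month_pat <- paste0("^(", paste(month.abb, collapse = "|"), ")-[0-9]{2}$")
suspect <- gene_names[grepl(month_pat, gene_names)]
print(suspect)
```

Flagged names are best checked against the original annotation rather than renamed automatically.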

Hi Diana,

did you find a solution to your problem?

written 4.5 years ago by Tark50

Did you read the answer?

written 4.5 years ago by Michael Dondrup45k
Michael Dondrup45k (Bergen, Norway) wrote (6.0 years ago):

One way of dealing with this in R is the function make.names with the option unique=TRUE; see ?make.names.

> nams = c("bl-a","bl-a","bl-a", "foo" )
> df = data.frame(matrix (1:4))
> df
  matrix.1.4.
1           1
2           2
3           3
4           4
> rownames(df) = nams
Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘bl-a’ 
> rownames(df) = make.names(nams, unique=TRUE)
> df
       matrix.1.4.
bl.a             1
bl.a.1           2
bl.a.2           3
foo              4
written 6.0 years ago by Michael Dondrup45k
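A related base R function, make.unique, only appends suffixes to repeated values and leaves the other characters alone, whereas make.names also replaces characters such as "-" with "." and prefixes names that start with a digit with "X" (relevant for IDs that start with numbers). A small comparison sketch; the ID "1234_gene" is hypothetical:

```r
nams <- c("bl-a", "bl-a", "bl-a", "foo")

# make.names() makes syntactically valid names: "-" is replaced by "."
valid <- make.names(nams, unique = TRUE)
print(valid)

# A name starting with a digit gets an "X" prefix
print(make.names("1234_gene"))

# make.unique() only adds ".1", ".2", ... to repeats; "bl-a" keeps its hyphen
uniq <- make.unique(nams)
print(uniq)

df <- data.frame(value = 1:4)
rownames(df) <- uniq
print(df)
```

For gene IDs, make.unique is often the less invasive choice, because a name like "Mar-01" is not rewritten to "Mar.01".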

What would be a possible solution if I need to add up the values while the row name stays the same? For example, instead of

bl.a 1
bl.a.1 2
bl.a.2 3
foo 4

I need

bl.a 6 (the first 3 rows, which share the same name, added together)
foo 4

modified 2.3 years ago • written 2.3 years ago by Shahzad10
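If the goal is to combine rows that share a name rather than keep them apart, base R's rowsum() adds up all rows with the same group label. A minimal sketch using the toy data from the answer above:

```r
nams <- c("bl-a", "bl-a", "bl-a", "foo")
df <- data.frame(value = 1:4)

# rowsum() sums every row that has the same label in 'group';
# the result has one row per distinct label (sorted by default)
summed <- rowsum(df, group = nams)
print(summed)
```

The same call works on a numeric matrix with several sample columns. Whether summing is appropriate for expression values is a separate question; averaging, e.g. with aggregate(df, by = list(gene = nams), FUN = mean), is another option.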

I am not sure if I understand correctly, but it is best to have unique row names. If you need the original info, you could make a data frame with an additional column containing the unmodified gene names.

It is possible to turn check.names off when constructing the data, but I wouldn't recommend it.

written 2.3 years ago by Michael Dondrup45k
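The suggestion to keep the unmodified names in an extra column could look like this; a sketch that again uses the toy names from the answer, with make.unique chosen here (as one option) for the row names:

```r
nams <- c("bl-a", "bl-a", "bl-a", "foo")

# Keep the original gene names as an ordinary column ...
df <- data.frame(gene = nams, value = 1:4, stringsAsFactors = FALSE)
# ... and use de-duplicated labels only as row names
rownames(df) <- make.unique(nams)
print(df)

# Lookups by the original name still work through the column
bla_values <- df$value[df$gene == "bl-a"]
print(bla_values)
```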


Powered by Biostar version 2.3.0