Question: How To Deal With Duplicate Row Names Error In R
Diana (Germany) wrote, 4.6 years ago:

Hi all,

I'm facing a very annoying error in R while assigning row names to my data matrix. I have some RNA-seq data that I would like to cluster in R. I'm using gene names as row names for my expression matrix, but R keeps reporting that there are duplicate names. Some unannotated genes have been assigned IDs that start with numbers. How can I deal with this error? Is there a way to work around it? I can't change the gene names.

EDIT:

gene                  sample1     sample2     sample3
Mar-01                4.19504     3.9006      4.15683
Mar-02                3.0554      3.4261      3.76675
un_A_2                1.1515      1.2455      0.563484
un_A_3                98.2504     120.341     101.753
ENSGALG00000008227    39.6383     12.8651     38.2281
ENSGALG00000008242    5.71557     7.79314     9.40917
ENSGALG00000008277    24.6231     28.3207     24.9288
CNN3                  141.708     134.476     144.514
CNNM1                 0.840218    0.963683    0.619086
CNNM2                 16.0282     12.1301     12.4665

Many thanks.


The gene names "Mar-01" and "Mar-02" look like they were copied and pasted from Excel, which converts some gene symbols to dates. Watch out! See http://nsaunders.wordpress.com/2012/10/22/gene-name-errors-and-excel-lessons-not-learned/ and http://www.biomedcentral.com/1471-2105/5/80

written 4.6 years ago by zx8754

Hi Diana,

Did you find a solution to your problem?

written 3.1 years ago by Tark

Did you read the answer?

written 3.1 years ago by Michael Dondrup
Michael Dondrup (Bergen, Norway) wrote, 4.6 years ago:

One way of dealing with this in R is the function make.names with the option unique=TRUE; see ?make.names.

> nams = c("bl-a","bl-a","bl-a", "foo" )
> df = data.frame(matrix (1:4))
> df
  matrix.1.4.
1           1
2           2
3           3
4           4
> rownames(df) = nams
Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘bl-a’ 
> rownames(df) = make.names(nams, unique=TRUE)
> df
       matrix.1.4.
bl.a             1
bl.a.1           2
bl.a.2           3
foo              4
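
Note that make.names also rewrites characters that are not valid in R variable names (above, "bl-a" became "bl.a"). If the gene names should stay as close to the originals as possible, base R's make.unique may be an alternative, since it only appends a numeric suffix to repeated names. A minimal sketch, reusing the toy names from the answer above (the object names dup_names and unique_nams are made up for illustration):

```r
# Toy names from the answer above; "bl-a" occurs three times.
nams <- c("bl-a", "bl-a", "bl-a", "foo")

# Which names occur more than once?
dup_names <- unique(nams[duplicated(nams)])
print(dup_names)

# make.unique() only appends ".1", ".2", ... to repeated names and
# leaves other characters (such as "-") untouched.
unique_nams <- make.unique(nams)
print(unique_nams)

# The de-duplicated names are now accepted as row names.
df <- data.frame(value = 1:4)
rownames(df) <- unique_nams
print(df)
```

make.names would also prefix an "X" to IDs that start with a digit, which may matter for the numeric IDs mentioned in the question; make.unique does not do that.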

What would be a possible solution if I need to add up the values while the row name stays the same? For example, instead of

bl.a      1
bl.a.1    2
bl.a.2    3
foo       4

I need

bl.a      6    (the sum of the first 3 rows, which have the same name)
foo       4

modified 12 months ago • written 12 months ago by Shahzad

I am not sure if I understand correctly, but it is best to have unique row names. If you need the original info, you could make a data frame with an additional column containing the unmodified gene names.

It is possible to turn check.names off when constructing the data, but I wouldn't recommend it.

written 12 months ago by Michael Dondrup
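
The two ideas in this exchange can be sketched in base R, assuming the expression values sit in a numeric matrix and the gene names in a separate character vector (the object names expr, genes, annotated, and summed are made up for illustration). The first part keeps the unmodified names in a column next to unique row names; the second uses rowsum() to add up rows that share a name, which seems to be what the comment above asks for.

```r
# Toy data mirroring the example above: three rows share the name "bl-a".
genes <- c("bl-a", "bl-a", "bl-a", "foo")
expr  <- matrix(1:4, ncol = 1, dimnames = list(NULL, "sample1"))

# 1) Unique row names, with the original names kept in a column.
annotated <- data.frame(gene = genes, expr,
                        row.names = make.unique(genes),
                        stringsAsFactors = FALSE)
print(annotated)

# 2) Collapse rows that share a name by summing them per sample.
#    rowsum() returns one row per distinct group, sorted by group name.
summed <- rowsum(expr, group = genes)
print(summed)
```

Whether summing is the right way to combine duplicated genes depends on the data; for expression values a mean or the most variable row may be more appropriate, so treat this only as a starting point.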