How to overcome "duplicate row.names" error while importing a TPM/count/FPKM matrix/table in R?
6.1 years ago
John ▴ 270

Hi,

I used RSEM to produce a TPM matrix for 100 RNA-seq reads (row names = genes, columns = cell numbers). When I import it into R, I get the following error. Can anyone help me, please?

> all <-read.table(file="tpm_matrix.xls",header=T)
Error in read.table(file = "tpm_matrix.xls", header = T) : 
  duplicate 'row.names' are not allowed

I changed the format of row names from "gene10000_Ermap" to "Ermap".

R software error rna-seq RNA-Seq • 11k views

If your file is actually a plain text file rather than a real MS Excel xls, you can use cut -f 1 tpm_matrix.xls | sort | uniq -c | sort -k1,1nr to find the duplicated row names. If there are only a few, you can curate them manually; if there are many, you can use e.g. awk to append a consecutive number to each name.
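The same check can also be done from within R. The following is only a minimal sketch: the gene names are made-up stand-ins for the first column of the matrix, and `make.unique()` is used in place of awk to append a consecutive number to repeated names.

```r
# Hypothetical stand-in for the first column (gene names) of the matrix.
genes <- c("Ermap", "Actb", "Ermap", "Gapdh", "Actb", "Ermap")

# Count how often each name occurs, most frequent first
# (comparable to `sort | uniq -c | sort -k1,1nr`), and keep the repeated ones.
counts <- sort(table(genes), decreasing = TRUE)
dups <- counts[counts > 1]
print(dups)

# Append a consecutive number to repeated names so that every name is unique.
unique_genes <- make.unique(genes, sep = "_")
print(unique_genes)
```

Note that renaming keeps every row; whether duplicated genes should instead be merged or dropped depends on the analysis.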

This helped, thank you so much.

6.1 years ago
library(gdata)
data.in <- read.xls("tpm_matrix.xls")

read.table is using the first column as row names, and as you've observed, duplicate row names aren't allowed. You can get around this by importing the gene names as an ordinary column of the data frame rather than as row names.

Disclaimer: untested, and there is probably a Tidyverse way to do it now instead of openxlsx...
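If the file turns out to be tab-separated text despite its extension, a similar workaround is possible in base R. This is a sketch under assumptions: the small example file and its values are made up, and it assumes the header line has one field fewer than the data rows, which is the layout in which `read.table` takes the first column as row names. Passing `row.names = NULL` keeps the gene names as a regular column, after which they can be made unique.

```r
# Write a tiny tab-separated example (hypothetical values). The header has
# one field fewer than the data rows, so read.table would by default use
# the first column as row names.
path <- tempfile(fileext = ".txt")
writeLines(c("cell1\tcell2",
             "Ermap\t1.0\t2.0",
             "Actb\t3.0\t4.0",
             "Ermap\t5.0\t6.0"), path)

# The default call fails on the duplicated name; capture the error message.
err <- tryCatch(read.table(path, header = TRUE, sep = "\t"),
                error = function(e) conditionMessage(e))
print(err)

# row.names = NULL forces plain row numbering, so the gene names stay in
# the first column of the data frame.
tpm <- read.table(path, header = TRUE, sep = "\t", row.names = NULL)
names(tpm)[1] <- "gene"

# Make the names unique, use them as row names, and build a numeric matrix.
rownames(tpm) <- make.unique(as.character(tpm$gene), sep = "_")
tpm_mat <- as.matrix(tpm[, -1])
print(tpm_mat)
```

Keeping the original names in the `gene` column makes it easy to inspect or merge the duplicated entries later.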

Hi, thanks, but I get the following error:

data.in <- read.xlsx("tpm_matrix.xlsx")
Error in file(con, "r") : invalid 'description' argument
In addition: Warning message:
In unzip(xlsxFile, exdir = xmlDir) : error 1 in extracting from zip file

I've edited my OP - assumed it was an xlsx file, not xls... my bad.

tidyverse way is to use readxl. See here
