Question: How to overcome the "duplicate 'row.names'" error when importing a TPM/count/FPKM matrix into R?
sciencer wrote, 9 months ago:

Hi,

I used RSEM to produce a TPM matrix for 100 RNA-seq samples (rows are genes, columns are cells), but when I read it into R I get the following error. Can anyone help, please?

> all <-read.table(file="tpm_matrix.xls",header=T)
Error in read.table(file = "tpm_matrix.xls", header = T) : 
  duplicate 'row.names' are not allowed

I had changed the row-name format from "gene10000_Ermap" to just "Ermap".
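A likely reason for the duplicates is that renaming step: stripping the gene-ID prefix leaves only the gene symbol, and different gene IDs can share the same symbol. A minimal R sketch, assuming row names of the form "<gene id>_<symbol>"; the IDs below are made up for illustration:

```r
# Hypothetical RSEM-style row names of the form "<gene id>_<symbol>"
ids <- c("gene10000_Ermap", "gene10001_Ermap", "gene10002_Actb")

# Strip everything up to and including the first underscore
symbols <- sub("^[^_]*_", "", ids)
print(symbols)                       # -> "Ermap" "Ermap" "Actb"

# Two distinct gene IDs now collapse onto the same symbol
print(symbols[duplicated(symbols)])  # -> "Ermap"
```

Keeping the full ID (or making the symbols unique) as the row name avoids the clash.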


If your file is actually a plain-text file rather than a real MS Excel .xls file, you can use cut -f 1 tpm_matrix.xls | sort | uniq -c | sort -k1,1nr to find the duplicated row names. If there are only a few, you can curate them manually; if there are many, you can use e.g. awk to append a consecutive number to each duplicated name.

(comment by michael.ante, 9 months ago)
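The same check can also be done from inside R. A minimal sketch, assuming the gene names are in the first column; it uses base R's table() for the counting and swaps in make.unique() for the awk renaming (make.unique appends .1, .2, ... to repeats rather than a running number). The gene vector is a made-up stand-in for the real first column:

```r
# Made-up stand-in for the first column of tpm_matrix.xls. For the real file,
# something like this should work (untested against it):
#   genes <- as.character(read.delim("tpm_matrix.xls", row.names = NULL)[[1]])
genes <- c("Ermap", "Actb", "Ermap", "Gapdh", "Ermap", "Actb")

# Count how often each name occurs, most frequent first
# (similar to: cut -f 1 | sort | uniq -c | sort -k1,1nr)
counts <- sort(table(genes), decreasing = TRUE)
dups <- counts[counts > 1]
print(dups)

# Make names unique by suffixing repeats: "Ermap", "Ermap.1", "Ermap.2", ...
unique_genes <- make.unique(genes)
print(unique_genes)
```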

This helped, thank you so much!

(reply by sciencer, 9 months ago)
andrew.j.skelton73 (London) wrote, 9 months ago:
library(gdata)  # read.xls() needs a Perl installation to convert the sheet
data.in <- read.xls("tpm_matrix.xls")

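read.table returns a data frame, and because the header line of your file has one fewer field than the data lines, it uses the first column as the row names; as you've observed, duplicate row names aren't allowed in a data frame. You can get around this by importing the gene names as an ordinary column of the data frame instead of as row names.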

Disclaimer: untested, and there's probably a Tidyverse way to do this now instead of gdata/openxlsx...

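A minimal base-R sketch of the same idea without an Excel reader, assuming the file is really tab-delimited text whose header line has one fewer field than the data lines (the three-gene table is made up): the first call reproduces the error, and row.names = NULL makes read.table number the rows and keep the gene names as an ordinary first column (named "row.names").

```r
# Made-up, tab-delimited stand-in for tpm_matrix.xls: the header has one
# fewer field than the data rows, so read.table uses column 1 as row names
txt <- "cell1\tcell2
Ermap\t1.5\t2.0
Ermap\t0.3\t0.0
Actb\t100\t85"

# Reproduces the original error
err <- tryCatch(
  read.table(text = txt, header = TRUE, sep = "\t"),
  error = function(e) conditionMessage(e)
)
print(err)

# row.names = NULL forces plain row numbering; gene names stay in column 1
tpm <- read.table(text = txt, header = TRUE, sep = "\t",
                  row.names = NULL, stringsAsFactors = FALSE)
print(tpm)
```

Unlike a data frame, a plain matrix does accept duplicated row names (m <- as.matrix(tpm[-1]); rownames(m) <- tpm[[1]]), but indexing by name, e.g. m["Ermap", ], then returns only the first match, so make.unique() or the original gene IDs are usually safer row names.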

Hi, thanks, but I get the following error:

data.in <- read.xlsx("tpm_matrix.xlsx")
Error in file(con, "r") : invalid 'description' argument
In addition: Warning message:
In unzip(xlsxFile, exdir = xmlDir) : error 1 in extracting from zip file

(reply by sciencer, 9 months ago)

I've edited my original answer: I had assumed it was an .xlsx file, not .xls. My bad.

(comment by andrew.j.skelton73, 9 months ago)
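The unzip warning in the error reported above (it appears to come from openxlsx's read.xlsx) fits a file that is not a real .xlsx. An .xlsx file is a ZIP archive and starts with the bytes 50 4B 03 04 ("PK"), a legacy binary .xls starts with the OLE2 signature D0 CF 11 E0, and a tab-delimited text file saved with an .xls extension has neither; since read.table got as far as checking row names, plain text seems the likeliest case here. A minimal sketch that inspects the first four bytes; the helper name guess_format is hypothetical:

```r
# Hypothetical helper: guess a file's real format from its first four bytes,
# regardless of the extension
guess_format <- function(path) {
  magic <- readBin(path, what = "raw", n = 4)
  zip_sig  <- as.raw(c(0x50, 0x4B, 0x03, 0x04))  # ZIP container, i.e. .xlsx
  ole2_sig <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0))  # OLE2, i.e. legacy .xls
  if (identical(magic, zip_sig)) return("xlsx")
  if (identical(magic, ole2_sig)) return("xls")
  "other (possibly plain text)"
}

# Usage with a made-up tab-delimited file that merely has an .xls extension
tmp <- tempfile(fileext = ".xls")
writeLines(c("cell1\tcell2", "Ermap\t1.5\t2.0"), tmp)
print(guess_format(tmp))
```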

The tidyverse way is to use readxl (its read_excel() reads both .xls and .xlsx files). See here.

(comment by russhh, 9 months ago)