Question: How to overcome the "duplicate 'row.names' are not allowed" error when importing a TPM/count/FPKM matrix in R?
Jon • 160 wrote:
Hi,
I used RSEM to produce a TPM matrix for 100 RNA-seq samples (rows are genes, columns are cells). When I try to import it, I get the following error. Can anyone help me, please?
> all <-read.table(file="tpm_matrix.xls",header=T)
Error in read.table(file = "tpm_matrix.xls", header = T) :
duplicate 'row.names' are not allowed
I changed the format of row names from "gene10000_Ermap" to "Ermap".
modified 21 months ago by andrew.j.skelton73 ♦ 5.9k • written 21 months ago by Jon • 160
If your file is actually a plain tab-delimited text file rather than an MS Excel .xls file, you can use
cut -f 1 tpm_matrix.xls | sort | uniq -c | sort -k1,1nr
to find the duplicated row names. If there are only a few, you can curate them manually; if there are many, you can use e.g. awk to append a consecutive number to each duplicated name.

This helped, thank you so much!
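The awk approach mentioned in the answer could look roughly like the sketch below. It is a minimal illustration, not the answerer's own code, and it assumes a tab-delimited file with a single header line. The file names `tpm_matrix_example.txt` and `tpm_matrix_unique.txt`, the tiny example matrix, and the `_2`, `_3` suffix scheme are all hypothetical choices. The first occurrence of each name is kept as-is; later occurrences get their occurrence number appended.

```shell
#!/bin/sh
# Sketch: make first-column row names unique before loading the table in R.
# Assumptions (hypothetical): tab-delimited input, exactly one header line,
# and the example file names below.
set -eu

in_file="tpm_matrix_example.txt"   # hypothetical example input
out_file="tpm_matrix_unique.txt"   # hypothetical output file

# Build a tiny example matrix in which the symbol "Ermap" appears twice.
printf 'gene\tcell1\tcell2\n' >  "$in_file"
printf 'Ermap\t1.5\t2.0\n'    >> "$in_file"
printf 'Actb\t100\t95\n'      >> "$in_file"
printf 'Ermap\t0.3\t0.1\n'    >> "$in_file"

# Print the header unchanged. For each data row, count how often the name
# in column 1 has been seen so far and append "_N" to the 2nd, 3rd, ...
# occurrence. Assigning to $1 rebuilds the line using OFS (a tab).
awk 'BEGIN { FS = OFS = "\t" }
     NR == 1 { print; next }
     { seen[$1]++; if (seen[$1] > 1) $1 = $1 "_" seen[$1]; print }' \
    "$in_file" > "$out_file"

cat "$out_file"
```

The cleaned file could then be read with something like `read.table("tpm_matrix_unique.txt", header = TRUE, sep = "\t", row.names = 1)`. Appending a number keeps every row instead of silently dropping or merging duplicates; whether duplicated symbols should rather be summed or filtered is a separate analysis decision. One caveat of this scheme: a suffixed name could in principle clash with a name that already exists in the file (e.g. a real "Ermap_2").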