Error: duplicate row.names when getting gene symbols
4.8 years ago
Kai_Qi ▴ 130

Hi:

I am running an RNA-seq analysis and hit an error at the step that extracts features and maps symbols to genes. The code I am running is:

library(GenomicFeatures)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
TxDb <- TxDb.Mmusculus.UCSC.mm10.knownGene
library(ASpli)
features <- binGenome(TxDb)
symbols <- data.frame(row.names = genes(TxDb),
                      symbol = paste(genes(TxDb)))

The error is:

symbols <- data.frame(row.names = genes(TxDb),
+                       symbol = paste(genes(TxDb)))
Error in data.frame(row.names = genes(TxDb), symbol = paste(genes(TxDb))) : 
  duplicate row.names: chr4:61224307-61674094:-, chr17:34145416-34160229:+, chr9:44101770-44109187:+, chr1:133309823-133329722:+, chr1:88103257-88220002:+

What should I correct?

Cai

RNA-Seq • 1.6k views

Hi Guys:

Sorry for running into so many errors. When I ran the command "features <- binGenome(TxDb, geneSymbols = symbols)", I got this error:

Error in features < binGenome(TxDb, geneSymbols = symbols) : 
  comparison (3) is possible only for atomic and list types
In addition: Warning messages:
1: In normSplitFactor(f, x) :
  'NROW(x)' is not a multiple of split factor length
2: In normSplitFactor(f, x) :
  'NROW(x)' is not a multiple of split factor length

What does that mean?

Thank you,

Cai
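
A possible reading of the error above, offered as a guess rather than a diagnosis: the call echoed in the message is `features < binGenome(...)`, with `<` where `<-` was intended, so R seems to have evaluated a comparison rather than an assignment. The base-R sketch below triggers the same kind of error with environments standing in for the objects involved. The stand-ins and variable names are assumptions, and ASpli is not used.

```r
# Stand-ins for non-atomic objects (assumption: no ASpli objects are used here).
features <- new.env()
other <- new.env()

# "<" on its own is a comparison, and comparing two environments is an error.
msg <- tryCatch(features < other, error = function(e) conditionMessage(e))
print(msg)  # exact wording of the message differs between R versions

# "<-" is an assignment and works for any kind of object.
assigned <- other
print(identical(assigned, other))
```

If this is what happened, retyping the line with `<-` (no space between `<` and `-`) would be the first thing to try.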

4.8 years ago
shawn.w.foley ★ 1.3k

A data frame in R requires unique row.names. A simple workaround is to make a data frame with an additional column and no row names:

symbols <- data.frame(genes = genes(TxDb), symbol = paste(genes(TxDb)))
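
To show the difference without any Bioconductor packages, here is a minimal base-R sketch. The character vector is a hypothetical stand-in for the gene ranges (two of the strings are taken from the error message, and one is repeated on purpose), so it only illustrates the row-name rule and not what `genes(TxDb)` actually returns.

```r
# Toy stand-in for the gene ranges; the repeated entry is deliberate.
ranges <- c("chr4:61224307-61674094:-",
            "chr17:34145416-34160229:+",
            "chr4:61224307-61674094:-")

# Using the strings as row names fails because one of them is repeated.
res <- tryCatch(
  data.frame(row.names = ranges, symbol = paste(ranges)),
  error = function(e) conditionMessage(e)
)
print(res)

# Keeping them in an ordinary column works, duplicates and all.
symbols <- data.frame(genes = ranges, symbol = paste(ranges))
print(nrow(symbols))
```

The second call keeps all three rows because ordinary columns may contain repeated values, while row names may not.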

Thanks Shawn. I put the command in the script and it ran without an error. Thanks a lot for your help.


^ This is the way to go. Also, remember that row names are not an ideal feature of data frames; prefer columns whenever possible. Row names make sense in matrices, not in data frames.


While this is technically a solution, it is still a problem that there are duplicated gene names/ranges. Can you try updating the package to the newest version? On my machine, with a freshly downloaded copy of the package, the code runs without errors. The range for one of these genes is also different: your chr4:61224307-61674094:- is chr4:61224307-61228286:- in my results. Maybe you have a very old version?
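
To see which entries are duplicated before deciding how to handle them, a check along these lines may help. It is a base-R sketch on a hypothetical character vector standing in for the ranges. The values are illustrative, with one repeated on purpose, and are not real output from the package.

```r
# Hypothetical stand-in for the range strings; values are illustrative.
ranges <- c("chr4:61224307-61674094:-",
            "chr9:44101770-44109187:+",
            "chr4:61224307-61674094:-",
            "chr1:88103257-88220002:+")

# Index of the first repeated entry, or 0 if every entry is unique.
first_dup <- anyDuplicated(ranges)

# Every value that occurs more than once, listed once each.
dup_values <- unique(ranges[duplicated(ranges)])

# How many times each repeated value occurs.
dup_counts <- table(ranges)[dup_values]

print(first_dup)
print(dup_values)
print(dup_counts)
```

The installed version of the annotation package can be read with `packageVersion("TxDb.Mmusculus.UCSC.mm10.knownGene")` and compared against the current release.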
