Get gene symbols
4.8 years ago
Kai_Qi

Hi,

I am new to RNA-seq analysis. I am trying to use ASpli to analyze my RNA-seq data. I used TxDb.Mmusculus.UCSC.mm10.knownGene from Bioconductor to bin the genome. In the end I got a file with gene coordinates like chr1.start.end, but I also need the gene symbols. Does anyone know how I can get them?

Thanks, Cai

RNA-Seq

4.8 years ago

Running bedtools intersect on your binned BED file and a gene annotation file (presuming it is also in BED or GTF format) will work.

biomaRt is also a go-to for these sorts of things, but probably not necessary in this case.
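
If you would rather stay in R than export BED files, the overlap logic that bedtools intersect performs can be sketched in base R. This is a minimal sketch, not ASpli or bedtools code: it assumes bin IDs of the form chr.start.end (as described in the question) with no dots in chromosome names, 1-based closed coordinates, and a gene table with chr, start, end and symbol columns. parse_bins and annotate_bins are hypothetical helper names, and the toy coordinates are made up.

```r
# Hypothetical helpers for illustration; not part of ASpli or bedtools.
# Assumes bin IDs look like "chr.start.end" and chromosome names contain no dots.
parse_bins <- function(ids) {
  parts <- do.call(rbind, strsplit(ids, ".", fixed = TRUE))
  data.frame(id = ids, chr = parts[, 1],
             start = as.integer(parts[, 2]), end = as.integer(parts[, 3]),
             stringsAsFactors = FALSE)
}

# For each bin, collect the symbols of all genes on the same chromosome whose
# closed interval [start, end] overlaps the bin; NA when nothing overlaps.
annotate_bins <- function(bin_ids, genes) {
  bins <- parse_bins(bin_ids)
  symbols <- vapply(seq_len(nrow(bins)), function(i) {
    hit <- genes$chr == bins$chr[i] &
      genes$start <= bins$end[i] &
      genes$end >= bins$start[i]
    if (any(hit)) paste(unique(genes$symbol[hit]), collapse = ",") else NA_character_
  }, character(1))
  data.frame(bin = bins$id, symbol = symbols, stringsAsFactors = FALSE)
}

# Toy gene table with made-up coordinates (not real mm10 genes)
genes <- data.frame(chr = c("chr1", "chr1", "chr2"),
                    start = c(100L, 500L, 100L),
                    end = c(300L, 900L, 400L),
                    symbol = c("GeneA", "GeneB", "GeneC"),
                    stringsAsFactors = FALSE)
res <- annotate_bins(c("chr1.150.200", "chr1.250.600", "chr2.500.600"), genes)
print(res)
```

The loop compares every bin against every gene, which is fine for illustration but slow for a genome-wide bin set; for real data GenomicRanges::findOverlaps does the same job much more efficiently.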

Hi Jared, thanks for your reply. I use features <- binGenome(TxDb.Mmusculus.UCSC.mm10.knownGene) to bin the genome. Do you mean that I should write the features out to a file and run intersect on it?

The final output is a tab-delimited file.

Thanks,

Cai

Based on ASpli's docs, features should retain the gene information. Are the coordinates the only thing in features after running that?

After running it myself, it looks like you'll have to pull the symbols from the TxDb object yourself, then assign them to each gene when you run binGenome. From section 3.1.2 of the ASpli docs:

# extract features and map symbols to genes
symbols <- data.frame(row.names = genes(aTxDb), symbol = paste('This is symbol of gene:', genes(aTxDb)))
features <- binGenome(aTxDb, geneSymbols = symbols)

This will throw an error with your TxDb object since you can't have duplicate row names in a dataframe, but this should be a starting point for you.

Thanks Jared! I tried it and got an error, so I can't proceed. Do you have any suggestions for adding the symbols without errors?

Cai

library(GenomicFeatures)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
TxDb <- TxDb.Mmusculus.UCSC.mm10.knownGene
library(ASpli)
features <- binGenome(TxDb)
symbols <- data.frame(row.names = genes(TxDb),
                      symbol = paste(genes(TxDb)))

The error that occurs is:

symbols <- data.frame(row.names = genes(TxDb),
+                       symbol = paste(genes(TxDb)))
Error in data.frame(row.names = genes(TxDb), symbol = paste(genes(TxDb))) : 
  duplicate row.names: chr4:61224307-61674094:-, chr17:34145416-34160229:+, chr9:44101770-44109187:+, chr1:133309823-133329722:+, chr1:88103257-88220002:+

Yes, as I said, the duplicate row names cause that error. You need to find a way to make the row names unique to use the code as is. I would contact the authors on the Bioconductor support forum if you want a clean answer.
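
One way to get unique row names, as a sketch: the row names in the error above are "chr:start-end:strand" strings because the GRanges returned by genes(TxDb) is coerced to character, and some genes share identical coordinates. The gene IDs stored in names(genes(TxDb)) (Entrez IDs for this TxDb) are unique, and org.Mm.eg.db can map them to symbols. Assumptions: org.Mm.eg.db is installed, and binGenome expects a data.frame whose row names are the TxDb gene IDs and whose single column is symbol, as in the docs example above; I have not verified that against the ASpli source. make_symbol_table is a hypothetical helper, and the Bioconductor part only runs when the packages are available.

```r
# Hypothetical helper: build a geneSymbols-style table keyed by gene ID.
# IDs without a symbol fall back to the ID itself so no row is left empty.
make_symbol_table <- function(gene_ids, id2symbol) {
  sym <- unname(id2symbol[gene_ids])
  sym[is.na(sym)] <- gene_ids[is.na(sym)]
  data.frame(symbol = sym, row.names = gene_ids, stringsAsFactors = FALSE)
}

if (requireNamespace("TxDb.Mmusculus.UCSC.mm10.knownGene", quietly = TRUE) &&
    requireNamespace("org.Mm.eg.db", quietly = TRUE) &&
    requireNamespace("GenomicFeatures", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene::TxDb.Mmusculus.UCSC.mm10.knownGene
  # names() returns the unique gene IDs instead of coordinate strings
  gene_ids <- names(GenomicFeatures::genes(txdb))
  # Named character vector: Entrez ID -> symbol (NA when unmapped)
  id2symbol <- AnnotationDbi::mapIds(org.Mm.eg.db::org.Mm.eg.db, keys = gene_ids,
                                     column = "SYMBOL", keytype = "ENTREZID")
  symbols <- make_symbol_table(gene_ids, id2symbol)
  # Next step (slow on a whole genome); geneSymbols format assumed from the docs:
  # features <- ASpli::binGenome(txdb, geneSymbols = symbols)
}
```

Falling back to the gene ID for unmapped entries keeps one row per TxDb gene; dropping those rows instead is an equally reasonable choice if binGenome tolerates missing genes.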

It would be great if the authors can help me get through this. I will share my notes on each step along the way, and share the results if I get it working in the end.

I also tried this:

symbols <- data.frame(genes = genes(TxDb), symbol = paste(genes(TxDb)))

but in the next step I still got an error:

Error in features < binGenome(TxDb, geneSymbols = symbols) : 
  comparison (3) is possible only for atomic and list types
In addition: Warning messages:
1: In normSplitFactor(f, x) :
  'NROW(x)' is not a multiple of split factor length
2: In normSplitFactor(f, x) :
  'NROW(x)' is not a multiple of split factor length
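
Two separate things are visible in that attempt. First, the call shown in the error is features < binGenome(...), with < (comparison) instead of <- (assignment); "comparison (3)" is R's less-than comparison, so R is comparing features with the result rather than storing it. Second, passing a multi-column object as a named data.frame() argument does not create row names; it expands into prefixed columns. A small base-R illustration, where coords is only a toy stand-in for the GRanges returned by genes(TxDb):

```r
# `<-` assigns, `<` compares
x <- 1:3          # assignment: x now holds 1, 2, 3
cmp <- x < 2      # comparison: a logical vector, x itself is unchanged
print(cmp)

# A named multi-column argument becomes several "name.column" columns,
# and the row names stay the default 1, 2, ... rather than gene IDs.
coords <- data.frame(start = c(10L, 50L), end = c(20L, 60L))  # toy stand-in
bad <- data.frame(genes = coords, symbol = c("a", "b"))
print(names(bad))
print(rownames(bad))
```

The intended line is features <- binGenome(TxDb, geneSymbols = symbols), with symbols built so that its row names are gene IDs.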

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

OK. I just started using Biostars, and I will follow your advice. Have a nice day!
