4.8 years ago
Kai_Qi
Hi:
I am new to RNA-seq analysis. I am trying to use ASpli to analyze my RNA-seq data. I use TxDb.Mmusculus.UCSC.mm10.knownGene from Bioconductor to bin the genome. In the end I get a file with gene coordinates like chr1.start.end, but I also need the gene symbols. Does anyone know how I can get them?
Thanks, Cai
Hi Jared: Thanks for your reply. I use features <- binGenome(TxDb.Mmusculus.UCSC.mm10.knownGene) to bin the genome. Do you mean that I should write the features out and use an intersect to do it?
The final output is a tab-delimited file.
Thanks,
Cai
Based on ASpli's docs, that should have gene information retained. Are the coordinates the only thing in features after running that?
After running it myself, it looks like you'll have to pull the symbols from the TxDb object yourself, then assign them to each gene when you run binGenome. See section 3.1.2 of the ASpli docs. The code there will throw an error with your TxDb object, since you can't have duplicate row names in a data frame, but it should be a starting point for you.
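One way to build such a symbol table is sketched below. It is only a sketch under assumptions: the org.Mm.eg.db annotation package is not mentioned in this thread, and the geneSymbols argument of binGenome is taken from the ASpli docs referenced above rather than confirmed here. The gene IDs in this TxDb are Entrez IDs, and keys() returns each ID once, so the row names should not collide.

```r
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)   # assumption: annotation package not named in the thread
library(ASpli)

txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene

# Gene IDs stored in this TxDb (Entrez IDs); each ID appears once.
gene_ids <- keys(txdb, keytype = "GENEID")

# Look up one symbol per Entrez ID; IDs without a symbol come back as NA.
gene_symbols <- mapIds(org.Mm.eg.db,
                       keys = gene_ids,
                       column = "SYMBOL",
                       keytype = "ENTREZID",
                       multiVals = "first")

# Table keyed by gene ID, with the symbol as its only column.
symbols <- data.frame(row.names = gene_ids,
                      symbol = unname(gene_symbols),
                      stringsAsFactors = FALSE)

# Assumption: binGenome() accepts the table through geneSymbols.
features <- binGenome(txdb, geneSymbols = symbols)
```

If binGenome still complains, inspecting head(symbols) and anyDuplicated(rownames(symbols)) would show whether the table itself is the problem.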
Thanks Jared! I tried it and got an error, so I can't proceed. Do you have any suggestions for adding the symbols and continuing without errors?
Cai
Yes, as I said, this will cause that error due to the duplicate row names. You need to figure out a way to make the row names unique in order to use the code as is. I would contact the authors on the Bioconductor forums if you want a clean answer.
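The duplicate row name problem can be reproduced and worked around in base R. The sketch below uses made-up gene IDs and symbols (hypothetical stand-ins, not taken from the TxDb) and shows two options: renaming duplicates with make.unique, or keeping only the first row per ID. Renamed IDs would no longer match the original gene names, so dropping duplicates may be the safer choice if the row names have to match.

```r
# Hypothetical IDs and symbols; "geneA" appears twice on purpose.
ids  <- c("geneA", "geneB", "geneA")
syms <- c("SymA", "SymB", "SymA")

# data.frame() rejects duplicate row names; capture the error message.
res <- tryCatch(
  data.frame(row.names = ids, symbol = syms),
  error = function(e) conditionMessage(e)
)

# Option 1: make the row names unique by appending a numeric suffix.
unique_ids <- make.unique(ids)
symbols_renamed <- data.frame(row.names = unique_ids, symbol = syms)

# Option 2: keep only the first occurrence of each ID.
keep <- !duplicated(ids)
symbols_dedup <- data.frame(row.names = ids[keep], symbol = syms[keep])
```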
It would be great if the authors can help get this working. I will keep notes on each step along the way and share them if I get results in the end.
I also tried this:
symbols <- data.frame(genes = genes(TxDb), symbol = paste(genes(TxDb)))
And in the next step I still got an error:
Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time. Thank you!
OK. I just started using Biostars, and I will follow your advice. Have a nice day!