Question

Get Gene symbols

0

4.8 years ago

Kai_Qi ▴ 130

Hi:

I am new to RNA-seq analysis. I am trying to use ASpli to analyze my RNA-seq data. I use TxDb.Mmusculus.UCSC.mm10.knownGene from Bioconductor to bin the genome. In the end I get a file with gene coordinates like chr1.start.end, but I also need the gene symbols. Does anyone know how I can get them?

Thanks, Cai

RNA-Seq • 2.2k views

ADD COMMENT • link updated 4.8 years ago by jared.andrews07 ★ 16k • written 4.8 years ago by Kai_Qi ▴ 130

GenoMax · Answer 1 · 2019-07-16

0

4.8 years ago

jared.andrews07 ★ 16k

BEDtools intersect with your binned BED files and the gene list (presuming it's also in BED or GTF format) will work.

biomaRt is also a go-to for these sorts of things, but probably not necessary in this case.

ADD COMMENT • link 4.8 years ago by jared.andrews07 ★ 16k

0

Hi Jared: Thanks for your reply. I use features <- binGenome(TxDb.Mmusculus.UCSC.mm10.knownGene) to bin the genome. Do you mean that I should write the features out and use intersect to do it?

The final output is a tab file.

Thanks,

Cai

ADD REPLY • link 4.8 years ago by Kai_Qi ▴ 130

0

Based on ASpli's docs, that should have gene information retained. Are the coordinates the only thing in features after running that?

ADD REPLY • link 4.8 years ago by jared.andrews07 ★ 16k

1

After running it myself, it looks like you'll have to pull the symbols from the TxDb object yourself, then assign them to each gene when you run binGenome. From section 3.1.2 of the ASpli docs:

# extract features and map symbols to genes
symbols <- data.frame(row.names = genes( aTxDb), symbol = paste( 'This is symbol of gene:', genes(aTxDb)))
features <- binGenome( aTxDb, geneSymbols = symbols )

This will throw an error with your TxDb object since you can't have duplicate row names in a dataframe, but this should be a starting point for you.

ADD REPLY • link 4.8 years ago by jared.andrews07 ★ 16k

0

Thanks Jared! I tried it and got an error, so I can't proceed. Do you have any suggestions for adding the symbols without hitting this error?

Cai

ADD REPLY • link 4.8 years ago by Kai_Qi ▴ 130

0

library(GenomicFeatures)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
TxDb <- TxDb.Mmusculus.UCSC.mm10.knownGene
library(ASpli)
features <- binGenome(TxDb)
symbols <- data.frame(row.names = genes(TxDb),
                      symbol = paste(genes(TxDb)))

The resulting error is:

symbols <- data.frame(row.names = genes(TxDb),
+                       symbol = paste(genes(TxDb)))
Error in data.frame(row.names = genes(TxDb), symbol = paste(genes(TxDb))) : 
  duplicate row.names: chr4:61224307-61674094:-, chr17:34145416-34160229:+, chr9:44101770-44109187:+, chr1:133309823-133329722:+, chr1:88103257-88220002:+

ADD REPLY • link updated 4.8 years ago by GenoMax 141k • written 4.8 years ago by Kai_Qi ▴ 130

0

Yes, as I said, this will cause that error due to the duplicate row names. You need to figure out a way to make the row names unique in order to use the code as is. I would contact the authors on the bioconductor forums if you want a clean answer.
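One possible way to get unique row names, sketched under assumptions: the duplicates listed in the error above are coordinate strings, so genes(TxDb) was coerced to "chr:start-end:strand" text, and some genes share identical coordinates. Gene IDs taken from names(genes(TxDb)) should not repeat. The block below runs on toy values only, and make_symbol_table is a hypothetical helper, not part of ASpli. The commented lines at the end assume that org.Mm.eg.db is installed and that the gene IDs in this TxDb are Entrez IDs. They are not run here.

```r
# Toy stand-ins (hypothetical values) for what the real objects would hold.
# Coordinate strings repeat when two genes share identical ranges;
# gene IDs do not.
coords   <- c("chr1:100-200:+", "chr1:100-200:+", "chr2:50-80:-")
gene_ids <- c("11111", "22222", "33333")
id_to_symbol <- c("11111" = "GeneA", "33333" = "GeneC")  # "22222" has no symbol

# Coordinates as row names fail, as in the thread.
coord_attempt <- tryCatch(
  data.frame(row.names = coords, symbol = coords),
  error = function(e) conditionMessage(e)
)

# Hypothetical helper: one row per gene ID, falling back to the ID itself
# when no symbol is known for it.
make_symbol_table <- function(gene_ids, id_to_symbol) {
  sym <- unname(id_to_symbol[gene_ids])
  missing <- is.na(sym)
  sym[missing] <- gene_ids[missing]
  data.frame(row.names = gene_ids, symbol = sym, stringsAsFactors = FALSE)
}

symbols <- make_symbol_table(gene_ids, id_to_symbol)
print(coord_attempt)
print(symbols)

# With the real packages, the inputs might be obtained like this (not run here):
# gene_ids     <- names(genes(TxDb))
# id_to_symbol <- AnnotationDbi::mapIds(org.Mm.eg.db::org.Mm.eg.db, keys = gene_ids,
#                                       column = "SYMBOL", keytype = "ENTREZID")
# features     <- binGenome(TxDb, geneSymbols = make_symbol_table(gene_ids, id_to_symbol))
```

Falling back to the gene ID keeps every gene in the table even when no symbol is found. Whether binGenome needs every gene to be present in geneSymbols is not confirmed here.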

ADD REPLY • link 4.8 years ago by jared.andrews07 ★ 16k

0

It would be great if the authors could help with this. I will share my notes on each intermediate step, and the final results as well if I get it working.

ADD REPLY • link 4.8 years ago by Kai_Qi ▴ 130

0

I also tried this:

symbols <- data.frame(genes = genes(TxDb), symbol = paste(genes(TxDb)))

But the next step still gave an error:

Error in features < binGenome(TxDb, geneSymbols = symbols) : 
  comparison (3) is possible only for atomic and list types
In addition: Warning messages:
1: In normSplitFactor(f, x) :
  'NROW(x)' is not a multiple of split factor length
2: In normSplitFactor(f, x) :
  'NROW(x)' is not a multiple of split factor length
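
The first line of this error shows features < binGenome(...), which suggests the assignment arrow was typed as < instead of <-. R then attempts a comparison instead of an assignment. Below is a minimal base-R sketch of that behaviour. It uses a function as a stand-in for a non-atomic object, which is an assumption for illustration only and not what binGenome actually returns.

```r
# Stand-in for a call that returns a non-atomic object; a function is used
# only because base R cannot compare functions with `<` either.
make_result <- function() function() NULL

# Assignment with `<-` works and stores the result.
features <- make_result()

# With `<` instead of `<-`, R tries to compare the existing `features`
# with the new result and stops with a comparison error.
typo <- tryCatch(
  features < make_result(),
  error = function(e) "comparison error"
)

print(typo)
```

Fixing the arrow only removes the comparison error. The symbols table built with genes = genes(TxDb) may still not be in the form binGenome expects.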

ADD REPLY • link updated 4.8 years ago by GenoMax 141k • written 4.8 years ago by Kai_Qi ▴ 130

0

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

ADD REPLY • link 4.8 years ago by GenoMax 141k

0

OK. I just started using Biostars, and I will follow your advice. Have a nice day!

ADD REPLY • link 4.8 years ago by Kai_Qi ▴ 130