Question: UCSC mouse gene annotation mm9 shows speculative genes
4.8 years ago, tonja.r wrote:

I extracted exon information from TxDb.Mmusculus.UCSC.mm9.knownGene, annotated the exons with their gene SYMBOL, and took a look at what is happening in the package.

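library(TxDb.Mmusculus.UCSC.mm9.knownGene)
library(org.Mm.eg.db)  # assumption: the annotation object missing from the original select() call

mm9 = TxDb.Mmusculus.UCSC.mm9.knownGene
exon = exons(mm9)
exon_ranges = ranges(exon)
# Map every exon to its Entrez gene ID and UCSC transcript name
gene_id_exons = select(mm9, keys=as.character(exon$exon_id), columns=c("GENEID","TXNAME"), keytype="EXONID")
colnames(gene_id_exons) = c("EXONID","ENTREZID","TXNAME")
# Look up the gene symbol for each Entrez ID (exons without a gene ID are skipped here)
symbol <- select(org.Mm.eg.db, keys=as.character(na.omit(unique(gene_id_exons$ENTREZID))), keytype="ENTREZID", columns="SYMBOL")
gene_id_exons = merge(gene_id_exons, symbol, all.x=TRUE)
# Combine coordinates with the gene/transcript annotation, one row per exon/transcript pair
exon_info = data.frame(START=start(exon_ranges), END=end(exon_ranges), CHR=as.character(seqnames(exon)), STRAND=as.character(strand(exon)), EXONID=exon$exon_id)
exon_info = merge(exon_info, gene_id_exons, all.x=TRUE)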

It shows me 7 exons belonging to three different transcripts of one gene, 497097:

> subset(exon_info, ENTREZID == 497097)
      EXONID   START     END  CHR STRAND ENTREZID     TXNAME SYMBOL
14642   7584 3195985 3197398 chr1      -   497097 uc007aet.1   Xkr4
14643   7585 3203520 3205713 chr1      -   497097 uc007aet.1   Xkr4
14644   7586 3204563 3207049 chr1      -   497097 uc007aeu.1   Xkr4
14645   7587 3411783 3411982 chr1      -   497097 uc007aeu.1   Xkr4
14646   7588 3638392 3640590 chr1      -   497097 uc007aev.1   Xkr4
14647   7589 3648928 3648985 chr1      -   497097 uc007aev.1   Xkr4
14648   7590 3660633 3661579 chr1      -   497097 uc007aeu.1   Xkr4

There are 3 transcripts of gene Xkr4: uc007aet.1, uc007aeu.1 and uc007aev.1.
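The transcript count can be checked directly from the rows printed above. This is only a minimal base-R sketch: the seven rows are retyped by hand into a data frame called `xkr4` (a name made up for this illustration), so it does not touch the TxDb package at all.

```r
# Rows copied by hand from the subset() output above (hypothetical object name).
xkr4 <- data.frame(
  EXONID = 7584:7590,
  START  = c(3195985, 3203520, 3204563, 3411783, 3638392, 3648928, 3660633),
  END    = c(3197398, 3205713, 3207049, 3411982, 3640590, 3648985, 3661579),
  TXNAME = c("uc007aet.1", "uc007aet.1", "uc007aeu.1", "uc007aeu.1",
             "uc007aev.1", "uc007aev.1", "uc007aeu.1"),
  stringsAsFactors = FALSE
)

# Number of exons annotated to each UCSC transcript
exons_per_tx <- table(xkr4$TXNAME)
print(exons_per_tx)

# Number of distinct transcripts attached to the gene
n_tx <- length(unique(xkr4$TXNAME))
print(n_tx)
```

On the full `exon_info` table the same `table()` call on a per-gene subset would presumably give the equivalent breakdown for any other gene.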

The Genome Browser gives me the following information: Mouse Gene mKIAA1889 (uc007aet.1), Mouse Gene Xkr4 (uc007aeu.1), Mouse Gene AK149000 (uc007aev.1). However, RefSeq says all 3 are Xkr4, and Xkr4 in fact has only one transcript with 3 exons.

How can I get rid of those speculative genes and exons? They seem to affect my downstream analysis when I use reads mapped to mm9. Or is there a RefSeq annotation for exons, genes and promoters?

  1. What version of R are you using? The TxDb packages get updated on occasion, as do the UCSC knownGene and kgAlias tracks that are used to make them. If you're using an older version, then perhaps this has already been remedied.
  2. If you're using the most recent version, have you tried contacting the package maintainer?
written 4.8 years ago by Devon Ryan
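For anyone wanting to answer the version question, a rough sketch of how the relevant versions might be printed is shown here. It assumes the packages named in the question; the guard with `requireNamespace()` is only there so the snippet still runs on a machine where the Bioconductor packages are not installed, and the `metadata()` call is expected to list the UCSC table and creation details recorded inside the TxDb object.

```r
# R version string of the running session
cat(R.version.string, "\n")

# Installed versions of the packages involved (NA if a package is missing)
pkgs <- c("GenomicFeatures", "TxDb.Mmusculus.UCSC.mm9.knownGene")
versions <- vapply(pkgs, function(p) {
  if (requireNamespace(p, quietly = TRUE)) {
    as.character(packageVersion(p))
  } else {
    NA_character_
  }
}, character(1))
print(versions)

# If the TxDb package is available, show the provenance stored in the object
if (!is.na(versions[["TxDb.Mmusculus.UCSC.mm9.knownGene"]])) {
  library(TxDb.Mmusculus.UCSC.mm9.knownGene)
  print(metadata(TxDb.Mmusculus.UCSC.mm9.knownGene))
}
```

`sessionInfo()` gives the same information for every attached package in one call and is usually what package maintainers ask for in a bug report.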