I extracted exon information from TxDb.Mmusculus.UCSC.mm9.knownGene, annotated the exons with gene symbols, and looked at what the package contains:
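```r
library(TxDb.Mmusculus.UCSC.mm9.knownGene)
library(org.Mm.eg.db)
mm9 <- TxDb.Mmusculus.UCSC.mm9.knownGene
exon <- exons(mm9)  # GRanges, one entry per distinct exon
# map every exon to its Entrez gene ID and UCSC transcript name
gene_id_exons <- select(mm9, keys = as.character(exon$exon_id),
                        columns = c("GENEID", "TXNAME"), keytype = "EXONID")
colnames(gene_id_exons) <- c("EXONID", "ENTREZID", "TXNAME")
# add gene symbols (NA gene IDs are dropped before the lookup)
symbol <- select(org.Mm.eg.db, keys = as.character(na.omit(unique(gene_id_exons$ENTREZID))),
                 keytype = "ENTREZID", columns = "SYMBOL")
gene_id_exons <- merge(gene_id_exons, symbol, all.x = TRUE)
# exon coordinates as a plain data.frame, then join the annotation by EXONID
exon_info <- data.frame(START = start(exon), END = end(exon),
                        CHR = as.character(seqnames(exon)),
                        STRAND = as.character(strand(exon)), EXONID = exon$exon_id)
exon_info <- merge(exon_info, gene_id_exons, all.x = TRUE)
```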
This shows 7 exons belonging to three different transcripts of one gene, Entrez ID 497097:
```
> subset(exon_info, ENTREZID == 497097)
      EXONID   START     END  CHR STRAND ENTREZID     TXNAME SYMBOL
14642   7584 3195985 3197398 chr1      -   497097 uc007aet.1   Xkr4
14643   7585 3203520 3205713 chr1      -   497097 uc007aet.1   Xkr4
14644   7586 3204563 3207049 chr1      -   497097 uc007aeu.1   Xkr4
14645   7587 3411783 3411982 chr1      -   497097 uc007aeu.1   Xkr4
14646   7588 3638392 3640590 chr1      -   497097 uc007aev.1   Xkr4
14647   7589 3648928 3648985 chr1      -   497097 uc007aev.1   Xkr4
14648   7590 3660633 3661579 chr1      -   497097 uc007aeu.1   Xkr4
```
There are 3 transcripts of gene Xkr4: uc007aet.1, uc007aeu.1 and uc007aev.1.
The UCSC Genome Browser gives the following information: Mouse Gene mKIAA1889 (uc007aet.1), Mouse Gene Xkr4 (uc007aeu.1), Mouse Gene AK149000 (uc007aev.1). However, according to RefSeq all three are Xkr4, and Xkr4 in fact has only one transcript with 3 exons.
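To see how the exons are distributed among the three transcripts, the rows printed above can be tabulated per transcript. This is a minimal base-R sketch: the seven rows are re-typed into a small data.frame (only EXONID, START, END and TXNAME) so it runs without Bioconductor; on the real object the same calls should work on `subset(exon_info, ENTREZID == 497097)`. The names `xkr4`, `exons_per_tx` and `tx_span` are illustrative only.

```r
# The seven Xkr4 rows from the output above, re-typed by hand
xkr4 <- data.frame(
  EXONID = 7584:7590,
  START  = c(3195985, 3203520, 3204563, 3411783, 3638392, 3648928, 3660633),
  END    = c(3197398, 3205713, 3207049, 3411982, 3640590, 3648985, 3661579),
  TXNAME = c("uc007aet.1", "uc007aet.1", "uc007aeu.1", "uc007aeu.1",
             "uc007aev.1", "uc007aev.1", "uc007aeu.1"),
  stringsAsFactors = FALSE
)

# number of exons assigned to each transcript
exons_per_tx <- table(xkr4$TXNAME)
print(exons_per_tx)

# genomic span of each transcript: leftmost exon start to rightmost exon end
tx_start <- aggregate(START ~ TXNAME, data = xkr4, FUN = min)
tx_end   <- aggregate(END ~ TXNAME, data = xkr4, FUN = max)
tx_span  <- merge(tx_start, tx_end, by = "TXNAME")
tx_span$N_EXONS <- as.vector(exons_per_tx[tx_span$TXNAME])
print(tx_span)
```

In this listing uc007aeu.1 is the transcript with three exons and spans about 457 kb (3204563 to 3661579), and uc007aev.1 lies entirely inside that span.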
How can I get rid of these speculative genes and exons? They seem to affect my downstream analysis of reads mapped to mm9. Alternatively, is there a RefSeq-based annotation of exons, genes and promoters?
- What version of R are you using? The TxDb packages get updated occasionally, as do the UCSC knownGene and kgAlias tracks used to build them. If you're using an older version, this may already have been remedied.
- If you're using the most recent version, have you tried contacting the package maintainer?