Bioconductor promoter annotation for mm9
7.2 years ago
tonja.r ▴ 600

I want to extract the genomic locations of all promoters for mm9, together with the corresponding transcript ID, gene ID, and symbol. However, I have found that there are duplicate ranges, and sometimes two promoters correspond to one gene.

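library(TxDb.Mmusculus.UCSC.mm9.knownGene)  # loads GenomicFeatures as a dependency
mm9 <- TxDb.Mmusculus.UCSC.mm9.knownGene
# one promoter window per transcript
promoter <- promoters(mm9)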
GRanges object with 6 ranges and 2 metadata columns:
      seqnames             ranges strand |     tx_id     tx_name
         <Rle>          <IRanges>  <Rle> | <integer> <character>
  [1]     chr1 [4795974, 4798173]      + |         1  uc007afg.1
  [2]     chr1 [4845775, 4847974]      + |         3  uc007afi.2
  [3]     chr1 [4846409, 4848608]      + |         5  uc011whu.1


So, here are the duplicate ranges. What is the reason for these duplicate ranges/promoters?

Then I need the corresponding gene ID/symbol for each promoter:

promoter <- unique(promoter)
gene_id_promoter <- select(mm9, keys = as.character(promoter$tx_id), columns = c("TXNAME", "GENEID"), keytype = "TXID")
  TXID GENEID     TXNAME
1    1  18777 uc007afg.1
2    3  21399 uc007afi.2
3    5  21399 uc011whu.1
4    6 108664 uc007afm.1
5    8  18387 uc007afo.1
6   10  18387 uc007afq.1


Different transcripts of a gene share the same gene ID. But how can one gene have two promoters? Here, two promoters (uc007afi.2, uc011whu.1) correspond to one gene ID (21399) and to two different transcripts of that gene. So I took another look at my ranges.

  [2]     chr1 [4845775, 4847974]      + |         3  uc007afi.2
  [3]     chr1 [4846409, 4848608]      + |         5  uc011whu.1


The range for uc007afi.2 overlaps the range for uc011whu.1. How can this be explained? I have two promoters corresponding to one gene and two transcripts, but one overlaps the other. Is the reason that the definition of a promoter region is not exact? Which region should I take as the promoter region for gene 21399?

Actually, many genes do have alternative promoters. Your gene also seems to have one.
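
To make this concrete, here is a minimal base R sketch that uses only the three ranges printed in the question. It assumes the windows came from promoters() with its default settings (upstream = 2000, downstream = 200), so on the + strand each window starts 2000 bp before the transcription start site (TSS). The object names (prom, gene_span) are made up for illustration, and taking the outer span per gene is just one possible choice, not a recommended definition of a promoter.

```r
# Promoter windows as printed in the question (all on chr1, + strand).
prom <- data.frame(
  tx_name = c("uc007afg.1", "uc007afi.2", "uc011whu.1"),
  gene_id = c("18777", "21399", "21399"),
  start   = c(4795974, 4845775, 4846409),
  end     = c(4798173, 4847974, 4848608),
  stringsAsFactors = FALSE
)

# Assumption: promoters() defaults, upstream = 2000 and downstream = 200.
# On the + strand the window is [TSS - 2000, TSS + 199].
upstream <- 2000
prom$tss   <- prom$start + upstream
prom$width <- prom$end - prom$start + 1

# Distance between the two TSSs annotated for gene 21399.
tss_gap <- diff(prom$tss[prom$gene_id == "21399"])

# One simple per-gene summary: the span covering all windows of a gene.
# This equals a true merge only when the windows overlap, as they do here.
span_start <- tapply(prom$start, prom$gene_id, min)
span_end   <- tapply(prom$end, prom$gene_id, max)
gene_span <- data.frame(
  gene_id = names(span_start),
  start   = as.vector(span_start),
  end     = as.vector(span_end),
  stringsAsFactors = FALSE
)

print(prom)
print(tss_gap)
print(gene_span)
```

With these numbers the two TSSs of gene 21399 are 634 bp apart, which is why the two 2200 bp windows overlap without being identical. Whether you keep both windows as alternative promoters or collapse them into one region depends on your analysis. If you stay with GRanges objects, reduce() from GenomicRanges merges overlapping ranges in a similar way.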