Mouse promoter regions
4 months ago
el24 ▴ 20

Hi all,

I have a quick question, but I'm new to this and not sure how to solve it. I need promoter regions for mouse data, but I haven't found a reliable file to download yet. I'd appreciate any help. This is a website I found, but I am not sure what to download. What I want is a promoter file like this:

chr   start    end   gene    strand
.
.


Thanks!

promoter mouse chr • 340 views
2
4 months ago

Here's an R solution.

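library("AnnotationHub")
library("ensembldb")

# Ensembl release to use; [[1]] takes the first record matching all three terms
release <- 101
anno <- query(AnnotationHub(), pattern=c("Mus musculus", "EnsDb", release))[[1]]

# Promoters: 1000 bp upstream to 100 bp downstream of each gene's start (strand-aware)
proms <- promoters(genes(anno), upstream=1000, downstream=100)
# Clip ranges that run past sequence ends and keep only the gene_id column
proms <- trim(proms)[, "gene_id"]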

> proms
GRanges object with 56305 ranges and 1 metadata column:
                     seqnames            ranges strand |            gene_id
                        <Rle>         <IRanges>  <Rle> |        <character>
  ENSMUSG00000102693        1   3072253-3073352      + | ENSMUSG00000102693
  ENSMUSG00000064842        1   3101016-3102115      + | ENSMUSG00000064842
  ENSMUSG00000051951        1   3671399-3672498      - | ENSMUSG00000051951
  ENSMUSG00000102851        1   3251757-3252856      + | ENSMUSG00000102851
  ENSMUSG00000103377        1   3368450-3369549      - | ENSMUSG00000103377
                 ...      ...               ...    ... .                ...
  ENSMUSG00000095366        Y 90755368-90756467      - | ENSMUSG00000095366
  ENSMUSG00000095134        Y 90752057-90753156      + | ENSMUSG00000095134
  ENSMUSG00000096768        Y 90783738-90784837      + | ENSMUSG00000096768
  ENSMUSG00000099871        Y 90836413-90837512      + | ENSMUSG00000099871
  ENSMUSG00000096850        Y 90839078-90840177      - | ENSMUSG00000096850
  -------
  seqinfo: 118 sequences from GRCm38 genome


If you want to export it as a bed file you can do rtracklayer::export(proms, "mouse_promoters.bed", "bed").
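
If the five-column layout from the question (chr, start, end, gene, strand) is preferred over BED, something like the following might work. It is only a sketch in base R on a small stand-in data frame with two rows copied from the output above; with the real object, as.data.frame(proms) should provide the same columns (seqnames, start, end, strand, gene_id). The output file name and the "chr" prefix are assumptions for illustration.

```r
# Stand-in for as.data.frame(proms); rows copied from the output above
df <- data.frame(
  seqnames = c("1", "1"),
  start    = c(3072253, 3671399),
  end      = c(3073352, 3672498),
  strand   = c("+", "-"),
  gene_id  = c("ENSMUSG00000102693", "ENSMUSG00000051951"),
  stringsAsFactors = FALSE
)

# Reorder and rename into: chr, start, end, gene, strand
out <- data.frame(
  chr    = paste0("chr", df$seqnames),  # Ensembl uses "1"; add "chr" only if your other files do
  start  = df$start,
  end    = df$end,
  gene   = df$gene_id,
  strand = df$strand,
  stringsAsFactors = FALSE
)

# Coordinates stay 1-based here, whereas BED export shifts start by -1
out_file <- file.path(tempdir(), "mouse_promoters.tsv")  # hypothetical file name
write.table(out, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Keeping the coordinates 1-based matches what R prints for proms; a BED file written by rtracklayer uses 0-based starts, so the two files will differ by one in the start column.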

0

Thank you very much, this is very helpful! Quick question: when I try release <- 101, I get the following error:

  Error in .Hub_get1(x[i], force = force, verbose = verbose) : no records found for the given index


Therefore, I tried release <- 100, and I could get the promoters successfully. Those releases shouldn't be much different right?

Another question, are parameters upstream=1000, downstream=100 your recommended values? Could you please tell me what each of them means? I guess here it means returning genes within 1000bp upstream and 100bp downstream, but I'm not sure. I appreciate it if you can tell me.

Thanks!

1

Therefore, I tried release <- 100, and I could get the promoters successfully. Those releases shouldn't be much different right?

It might be safer to check what the query returns if you run into that problem, since it might be matching more than one release. Run query(AnnotationHub(), pattern=c("Mus musculus", "EnsDb", release)) and see exactly how many hits are returned. If there is more than one, use the ID of the correct hit (it usually looks like AH83247) to pull it directly.

ah <- AnnotationHub()
anno <- ah[["AH83247"]]


Another question, are parameters upstream=1000, downstream=100 your recommended values? Could you please tell me what each of them means? I guess here it means returning genes within 1000bp upstream and 100bp downstream, but I'm not sure. I appreciate it if you can tell me.

The promoter regions are being defined relative to the TSS of the gene (or transcript). Those parameters mean the range 1000 bases upstream of the TSS to 100 bases downstream of the TSS. That's a fairly conservative range for mice. Your final range can depend somewhat on what you are actually looking for in those regions, but you could expand it to something like 2500 bases upstream to 250 downstream for example.
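
The strand-aware arithmetic described above can be sketched in base R. This is only an illustration, not the GenomicRanges code itself: promoter_window is a hypothetical helper, and the two TSS positions are inferred from the promoter ranges printed earlier in the thread rather than looked up independently.

```r
# Hypothetical helper: promoter window around a TSS (1-based, inclusive coordinates).
# On the "+" strand upstream means smaller coordinates; on "-" it means larger ones.
promoter_window <- function(tss, strand, upstream = 1000, downstream = 100) {
  plus  <- strand == "+"
  start <- ifelse(plus, tss - upstream, tss - downstream + 1)
  end   <- ifelse(plus, tss + downstream - 1, tss + upstream)
  data.frame(start = start, end = end, strand = strand,
             width = end - start + 1, stringsAsFactors = FALSE)
}

# TSS values inferred from the ranges shown earlier (an assumption)
res <- promoter_window(tss = c(3073253, 3671498), strand = c("+", "-"))
print(res)
```

With the default values each window is 1100 bases wide, and the two rows reproduce the ranges shown for ENSMUSG00000102693 (+) and ENSMUSG00000051951 (-) in the output above.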

0

Thank you for the clear explanation!

Here is what I get when running the code:

release <- 101
query(AnnotationHub(), pattern=c("Mus musculus", "EnsDb", release))

snapshotDate(): 2019-10-29
AnnotationHub with 0 records
# snapshotDate(): 2019-10-29


Then, when I try:

ah <- AnnotationHub()
anno <- ah[["AH83247"]]


I get this error:

Error: Public


I'd appreciate it if you could let me know how to solve this. Thanks!

1

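Try updating the AnnotationHub package and see if that helps. The snapshot date in your output (2019-10-29) is older than Ensembl release 101, so that record may simply not exist in the snapshot your installation is using.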

0

I have tried it, but it didn't help. Thanks for helping, anyway!