Want to download the proximal promoter region of my favorite gene from every species in Ensembl
3.6 years ago
friist ▴ 20

Are there any ready-to-use scripts that would let me download a ~2000 bp region immediately upstream of the 5′ end of any given gene using BioMart? I would like to automate the process so that I can target the proximal promoter of my favorite gene in all the species in Ensembl. I suppose the place to begin is to install BioMart on my computer?

Any words of advice?

Cheers

TEF

ensembl Biomart sequence automation • 1.3k views
3.6 years ago
Emily 23k

I would probably use the REST API rather than BioMart. There's an online course with Jupyter notebooks to get you started with REST in Python, Perl or R.

Starting with your favourite gene, use the homology endpoint to get all the orthologues. Then take the Ensembl ID of each orthologue and use the lookup endpoint to get its coordinates and strand. With a little arithmetic you can work out the coordinates of the upstream region and pass them to the sequence region endpoint. Alternatively, you could pass each Ensembl ID straight to the sequence ID endpoint and request genomic sequence with expand_5prime, but that would also return the genomic sequence of the gene itself.
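To make the arithmetic concrete, here is a minimal Python sketch of that workflow using only the standard library. Treat it as a starting point under a few assumptions rather than a tested pipeline: the function names are my own, the gene start/end from the lookup endpoint is used as a stand-in for the transcription start site, and the homology path (which has changed between REST releases) and its condensed response layout should be checked against the current REST documentation.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def upstream_region(seq_region, start, end, strand, length=2000):
    """Build a 'name:start..end:strand' string for the region upstream of a gene.

    On the forward strand the promoter lies before the gene start; on the
    reverse strand it lies after the gene end. Passing the gene's strand
    through means the sequence comes back in the gene's orientation.
    The result is not clamped to the end of the chromosome or scaffold.
    """
    if strand == 1:
        up_start, up_end = max(1, start - length), start - 1
    else:
        up_start, up_end = end + 1, end + length
    return f"{seq_region}:{up_start}..{up_end}:{strand}"


def get_json(path):
    """GET a REST path and decode the JSON response."""
    req = urllib.request.Request(
        SERVER + path, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def orthologue_ids(species, gene_id):
    """Return Ensembl gene IDs of orthologues (assumed condensed layout)."""
    data = get_json(
        f"/homology/id/{species}/{gene_id}?type=orthologues;format=condensed"
    )
    return [h["id"] for h in data["data"][0]["homologies"]]


def promoter_for_gene(gene_id, length=2000):
    """Look up a gene's coordinates, then fetch the upstream sequence."""
    gene = get_json(f"/lookup/id/{gene_id}")
    region = upstream_region(
        gene["seq_region_name"], gene["start"], gene["end"], gene["strand"], length
    )
    seq = get_json(f"/sequence/region/{gene['species']}/{region}")
    return gene["species"], seq["seq"]
```

You would then loop over `orthologue_ids(...)` and call `promoter_for_gene` on each ID, ideally with a short pause between requests to stay within the REST rate limits. Genes that sit near the end of a scaffold may yield fewer than 2000 bp or an error, so it is worth handling those cases in the loop.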


I would like to automate the process such that I can target the proximal promoter of my favorite gene in all the species in Ensembl

That gets part of the way to what the OP is asking for.

TEF: BioMart works on one species at a time (AFAIK). You will need the coordinates up front to loop through more than one species.
