How can I retrieve genomic sequences from Ensembl in a window both upstream and downstream of the TSS?
2.9 years ago

Is there a way to use BioMart on the ensembl.org website to retrieve genomic sequences in a window (e.g. 500 bp) both upstream and downstream of the annotated TSS for selected genes? So far, I can only retrieve genomic sequences either upstream or downstream, but not both.

sequence Genome ensembl motif • 1.0k views
2.9 years ago
boczniak767 ▴ 850

Do you really need BioMart? If not, you can download a GFF file with the gene annotation, use awk to extract only the genes into a BED file in which the TSS coordinate serves as both start and end (2nd and 3rd columns), and then use bedtools slop to expand each entry to the requested range. The TSS-to-BED step should be done for each strand separately, because the TSS is the gene start on the + strand and the gene end on the - strand.

As you can see, it requires some coding. Maybe somebody else can provide a simpler solution.
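
For anyone who prefers Python to awk, here is a minimal sketch of the TSS-to-BED step, including the padding. It is only an illustration under some assumptions: the function name `tss_window` is made up, the input is standard 9-column GFF3 with 1-based inclusive coordinates, and only features whose type is exactly `gene` are kept (other gene-like feature types in the file would need to be added to the filter). Unlike a genome-file-aware tool, it clamps only at position 0 and does not know chromosome lengths.

```python
def tss_window(gff_line, flank=500):
    """Return a BED-style tuple covering +/- flank bp around a gene's TSS.

    Hypothetical helper: assumes 9-column GFF3 (1-based, inclusive) and
    returns 0-based, half-open BED coordinates. Returns None for comments
    and for features that are not of type "gene".
    """
    if gff_line.startswith("#"):
        return None
    fields = gff_line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[2] != "gene":
        return None

    chrom, strand = fields[0], fields[6]
    start, end = int(fields[3]), int(fields[4])

    # TSS is the gene start on the + strand and the gene end on the - strand.
    tss = start if strand == "+" else end

    # 1-based position p corresponds to the BED interval [p - 1, p).
    bed_start = max(0, tss - 1 - flank)  # clamp at the chromosome start
    bed_end = tss + flank                # not clamped: chromosome length unknown
    return chrom, bed_start, bed_end, fields[8], ".", strand


def gff_to_tss_bed(gff_lines, flank=500):
    """Yield tab-separated BED lines for every gene feature in gff_lines."""
    for line in gff_lines:
        window = tss_window(line, flank)
        if window is not None:
            yield "\t".join(str(value) for value in window)
```

The resulting BED lines can be written to a file and passed to whatever tool you use to pull sequence from the genome FASTA. Keeping the strand in column 6 lets a strand-aware extractor return the reverse complement for genes on the - strand.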

2.9 years ago
Emily 23k

You could use the Ensembl REST API. Use one of the Lookup endpoints to get the TSS of the gene (check the strand: if the gene is on the + strand, the TSS is the start; if it's on the - strand, the TSS is the end). Then pass that position to the sequence region endpoint and use the expand options to get the flanking sequence on both sides. You can copy most of the code from exercise 5.2 of this online course but just switch to the sequence endpoint.
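
A rough Python sketch of that two-step approach might look like the following. It is not the code from the course exercise. The helper names are made up, and it assumes the `lookup/id` response carries `species`, `seq_region_name`, `start`, `end` and `strand` (1 or -1), and that `sequence/region` accepts a `chrom:start..end:strand` region together with `expand_5prime` and `expand_3prime`. Note that the gene-level start or end is the most 5' coordinate across all transcripts, which may differ from the TSS of the transcript you care about.

```python
import json
import urllib.parse
import urllib.request

SERVER = "https://rest.ensembl.org"


def tss_from_lookup(record):
    """Return (chrom, tss, strand) from a lookup/id JSON record (assumed fields)."""
    strand = record["strand"]
    # + strand: TSS is the start; - strand: TSS is the end.
    tss = record["start"] if strand == 1 else record["end"]
    return record["seq_region_name"], tss, strand


def tss_region_url(record, flank=500):
    """Build a sequence/region URL for the TSS base plus flank bp on each side."""
    chrom, tss, strand = tss_from_lookup(record)
    region = f"{chrom}:{tss}..{tss}:{strand}"
    query = urllib.parse.urlencode({"expand_5prime": flank, "expand_3prime": flank})
    return f"{SERVER}/sequence/region/{record['species']}/{region}?{query}"


def _get(url, content_type):
    """GET a URL from the REST server and return the decoded response body."""
    request = urllib.request.Request(url, headers={"Content-Type": content_type})
    with urllib.request.urlopen(request) as response:
        return response.read().decode()


def fetch_tss_window(gene_id, flank=500):
    """Look up a stable gene ID, then fetch the sequence around its TSS."""
    record = json.loads(_get(f"{SERVER}/lookup/id/{gene_id}", "application/json"))
    return _get(tss_region_url(record, flank), "text/plain")
```

Calling `fetch_tss_window` with a stable gene ID and `flank=500` should return the TSS base plus 500 bp on each side. Because the strand is part of the region string, genes on the - strand should come back already reverse-complemented. For many genes, add a short pause between requests to stay within the server's rate limits.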
