Retrieving sequences from Ensembl Archive
3.3 years ago
biostR

Hi,

I am looking for a way to retrieve DNA sequences from the Ensembl May 2017 archive based on genomic coordinates. I thought the biomaRt package would be useful for getting DNA sequences; however, it did not work. Apparently, a sequence type (`seqType` or `type`) is required to obtain a sequence with the `getSequence` function.

For example:

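library(biomaRt)

# Connect to the Ensembl May 2017 archive
ensembl <- useMart(host = "may2017.archive.ensembl.org",
                   biomart = "ENSEMBL_MART_ENSEMBL",
                   dataset = "hsapiens_gene_ensembl")

# Try to fetch the sequence of a genomic region by coordinates
seq <- biomaRt::getSequence(chromosome = "X", start = 100639991, end = 100644991,
                            mart = ensembl)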


This gives the following error:

Error in biomaRt::getSequence(chromosome = "X", start = 100639991, end = 100644991,  :
Please specify the type of sequence that needs to be retrieved when using biomaRt in web service mode.  Choose either gene_exon, transcript_exon,transcript_exon_intron, gene_exon_intron, cdna, coding,coding_transcript_flank,coding_gene_flank,transcript_flank,gene_flank,peptide, 3utr or 5utr


Is there a good way to get the DNA sequences for a large list of genomic coordinates?

Thank you very much.

Did you check the documentation? Sequence type genomic is one of the allowed options.

I have biomaRt v2.32.1 installed, which does not accept "genomic" as the seqType. If I try it, I get the following:

Error in biomaRt::getSequence(chromosome = "X", start = 100639991, end = 100644991,  :
Please specify the type of sequence that needs to be retrieved when using biomaRt in web service mode.
Choose either gene_exon, transcript_exon,transcript_exon_intron,
gene_exon_intron, cdna, coding,coding_transcript_flank,coding_gene_flank,transcript_flank,
gene_flank,peptide, 3utr or 5utr

Tagging: Mike Smith to see if he can help.

Thanks Emily, this was useful.
How can I retrieve archived sequences from older rat assemblies?

The code below works; however, it retrieves sequences from rno6 (the latest rat genome), whereas I need rno4, which is in the Ensembl archive at http://may2012.archive.ensembl.org. If I change the server address, I get an error. Any suggestions?

import sys

import requests

server = "http://may2017.rest.ensembl.org"
# Region syntax: chromosome:start..end:strand
ext = "/sequence/region/rat/1:34592855..34676565:1?"

r = requests.get(server + ext, headers={"Content-Type": "application/json"})

# Raise an HTTPError on a failed request, then stop
if not r.ok:
    r.raise_for_status()
    sys.exit()

decoded = r.json()
print(repr(decoded))
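To confirm which assembly a given REST server actually serves (the rno6 vs. rno4 question above), one option is to query the `/info/assembly/:species` endpoint before fetching sequences. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the `assembly_name` and `default_coord_system_version` fields are assumed from the current REST documentation and may differ on older archive servers.

```python
import json
import urllib.request


def assembly_info_url(server, species):
    """Build the URL of the /info/assembly/:species endpoint (hypothetical helper)."""
    return f"{server.rstrip('/')}/info/assembly/{species}"


def summarize_assembly(info):
    """Pick the assembly fields out of a parsed /info/assembly response.

    Field names are an assumption; missing fields come back as None.
    """
    return info.get("assembly_name"), info.get("default_coord_system_version")


def fetch_assembly(server, species):
    """GET the assembly description for `species` from `server` (needs network)."""
    req = urllib.request.Request(
        assembly_info_url(server, species),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return summarize_assembly(json.load(resp))
```

For example, `fetch_assembly("http://may2017.rest.ensembl.org", "rat")` should report the assembly that the `sequence/region` calls on that server will use.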

Unfortunately, we don't have REST archives that old. I also checked our remapping tools, and we don't have a mapping between RGSC3.4 and Rnor_6.0.

3.3 years ago

BioMart is gene-centric; it cannot retrieve sequences of arbitrary genomic regions. The easiest way to get what you need is to use the archived REST API with the POST sequence/region endpoint. This lets you retrieve multiple sequences in a single request, and you can call it from any language.
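A minimal Python sketch of that approach, using the standard library (`urllib`) instead of the `requests` package shown earlier so it has no third-party dependency. It assumes the `may2017.rest.ensembl.org` server mentioned in this thread, a JSON request body of the form `{"regions": [...]}`, and that each returned record carries `query` and `seq` fields. The batch size of 50 is an assumption, so check the endpoint documentation for the actual per-request limit. All helper names are hypothetical.

```python
import json
import urllib.request

# Archive server taken from this thread; availability varies by release.
SERVER = "http://may2017.rest.ensembl.org"
# Assumed per-request cap on regions; check the endpoint docs for the real limit.
MAX_REGIONS_PER_POST = 50


def region_string(chrom, start, end, strand=1):
    """Format coordinates in REST region syntax, e.g. X:100..200:1."""
    return f"{chrom}:{start}..{end}:{strand}"


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def fetch_region_sequences(coords, species="human", server=SERVER,
                           batch_size=MAX_REGIONS_PER_POST):
    """POST batches of regions to /sequence/region/:species (needs network).

    `coords` is a list of (chrom, start, end) or (chrom, start, end, strand)
    tuples. Returns a dict mapping each region string to its sequence,
    assuming every response record has `query` and `seq` fields.
    """
    regions = [region_string(*c) for c in coords]
    url = f"{server}/sequence/region/{species}"
    results = {}
    for batch in chunks(regions, batch_size):
        body = json.dumps({"regions": batch}).encode("utf-8")
        req = urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json",
                     "Accept": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            for record in json.load(resp):
                results[record["query"]] = record["seq"]
    return results
```

For example, `fetch_region_sequences([("X", 100639991, 100644991)])` would request the region from the original question. Splitting a long coordinate list into batches keeps each POST within whatever limit the server enforces, while still needing far fewer requests than one GET per region.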