Question: Retrieving sequences from Ensembl Archive
biostR wrote (14 months ago):

Hi,

I am looking for a way to retrieve DNA sequences from the Ensembl May 2017 archive based on genomic coordinates. I thought the biomaRt package would be useful for this, but it did not work. Apparently the getSequence function requires a sequence type (seqType, type) before it will return a sequence.

For example:

  ensembl <- useMart(host = "may2017.archive.ensembl.org",
                     biomart = "ENSEMBL_MART_ENSEMBL",
                     dataset = "hsapiens_gene_ensembl")
  seq <- biomaRt::getSequence(chromosome = "X", start = 100639991, end = 100644991, mart = ensembl)

This gives the following error:

Error in biomaRt::getSequence(chromosome = "X", start = 100639991, end = 100644991,  : 
  Please specify the type of sequence that needs to be retrieved when using biomaRt in web service mode.  Choose either gene_exon, transcript_exon,transcript_exon_intron, gene_exon_intron, cdna, coding,coding_transcript_flank,coding_gene_flank,transcript_flank,gene_flank,peptide, 3utr or 5utr

Is there a good way to get the DNA sequences for a large list of genomic coordinates?

Thank you very much.

modified 11 days ago by becton10 • written 14 months ago by biostR0

Did you check the documentation? Sequence type genomic is one of the allowed options.

modified 14 months ago • written 14 months ago by genomax71k

I have biomaRt v2.32.1 installed, which does not accept "genomic" as the seqType. If I try it, I get the following:

Error in biomaRt::getSequence(chromosome = "X", start = 100639991, end = 100644991,  : 
Please specify the type of sequence that needs to be retrieved when using biomaRt in web service mode. 
Choose either gene_exon, transcript_exon,transcript_exon_intron,
gene_exon_intron, cdna, coding,coding_transcript_flank,coding_gene_flank,transcript_flank,
gene_flank,peptide, 3utr or 5utr
modified 14 months ago • written 14 months ago by biostR0

Tagging: Mike Smith to see if he can help.

written 14 months ago by genomax71k

Thanks Emily, this was useful.
How can I retrieve archived sequences from older rat assemblies?

The code below works, but it retrieves sequences from rno6 (the latest rat genome). What I need is rno4, which is in the Ensembl archive at http://may2012.archive.ensembl.org. If I change the server address, I get an error. Any suggestions?

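import sys

import requests

# REST server and region endpoint (rat chromosome 1, forward strand).
server = "http://may2017.rest.ensembl.org"
ext = "/sequence/region/rat/1:34592855..34676565:1?"

r = requests.get(server + ext, headers={"Content-Type": "application/json"})

# Stop on any HTTP error; raise_for_status() raises for 4xx/5xx responses.
if not r.ok:
    r.raise_for_status()
    sys.exit()

decoded = r.json()
print(repr(decoded))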
written 11 days ago by becton10

Unfortunately, we don't have REST archives that old. I also checked our remapping tools, and we don't have a mapping between RGSC3.4 and Rnor_6.0.

modified 10 days ago • written 10 days ago by Emily_Ensembl19k

Emily_Ensembl (EMBL-EBI) wrote (14 months ago):

BioMart is gene-centric; it cannot retrieve sequences for arbitrary genomic regions. The easiest way to get what you need is to use the REST API archive with the POST sequence/region endpoint. This lets you retrieve multiple sequences per request, and you can script around it in any language.
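
For a large list of coordinates, the POST approach could be sketched in Python roughly as follows. This is an illustration, not an official client. It assumes the POST sequence/region/:species endpoint takes a JSON body of the form {"regions": [...]}, returns a JSON list with one record per region, and caps each request at 50 regions; check the documentation of the server you use. The helper names (format_region, chunk, fetch_sequences) are made up for this example, and the default server is the current REST server, so pass the archive REST host for the release you need.

```python
import json
import urllib.request


def format_region(chromosome, start, end, strand=1):
    """Build a region string such as 'X:100639991..100644991:1'."""
    return f"{chromosome}:{start}..{end}:{strand}"


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch_sequences(regions, species="human",
                    server="https://rest.ensembl.org", batch_size=50):
    """POST batches of region strings to sequence/region/:species.

    `server` and `batch_size` are assumptions; adjust them for the
    archive host and the POST size limit of the server you query.
    Returns the decoded JSON records from all batches in one list.
    """
    url = f"{server}/sequence/region/{species}"
    headers = {"Content-Type": "application/json",
               "Accept": "application/json"}
    results = []
    for batch in chunk(regions, batch_size):
        body = json.dumps({"regions": batch}).encode("utf-8")
        request = urllib.request.Request(url, data=body,
                                         headers=headers, method="POST")
        # urlopen raises urllib.error.HTTPError on a 4xx/5xx response.
        with urllib.request.urlopen(request) as response:
            results.extend(json.load(response))
    return results
```

To use it, build the region strings first, for example regions = [format_region("X", 100639991, 100644991)], then call fetch_sequences(regions, server=...) with the REST host of the archive you want. Batching keeps each request under the assumed size limit, so the same call works for one region or for thousands.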

written 14 months ago by Emily_Ensembl19k