Retrieving sequences from Ensembl Archive
3.3 years ago
biostR • 0

Hi,

I am looking for a way to retrieve DNA sequences from the Ensembl May 2017 archive based on genomic coordinates. I thought the biomaRt package would be useful for this, but it did not work: apparently a sequence type (seqType, type) is required to obtain a sequence with the getSequence function.

For example:

    ensembl <- useMart(host = "may2017.archive.ensembl.org",
                       biomart = "ENSEMBL_MART_ENSEMBL",
                       dataset = "hsapiens_gene_ensembl")
    seq <- biomaRt::getSequence(chromosome = "X", start = 100639991, end = 100644991, mart = ensembl)

This gives the following error:

Error in biomaRt::getSequence(chromosome = "X", start = 100639991, end = 100644991,  : 
  Please specify the type of sequence that needs to be retrieved when using biomaRt in web service mode.  Choose either gene_exon, transcript_exon,transcript_exon_intron, gene_exon_intron, cdna, coding,coding_transcript_flank,coding_gene_flank,transcript_flank,gene_flank,peptide, 3utr or 5utr

Is there a good way to get the DNA sequences for a large list of genomic coordinates?

Thank you very much.

DNA sequence Ensembl Ensembl archive • 1.3k views

Did you check the documentation? Sequence type genomic is one of the allowed options.


I have biomaRt v2.32.1 installed, which does not accept "genomic" as the seqType. If I try it, I get the following:

Error in biomaRt::getSequence(chromosome = "X", start = 100639991, end = 100644991,  : 
Please specify the type of sequence that needs to be retrieved when using biomaRt in web service mode. 
Choose either gene_exon, transcript_exon,transcript_exon_intron,
gene_exon_intron, cdna, coding,coding_transcript_flank,coding_gene_flank,transcript_flank,
gene_flank,peptide, 3utr or 5utr

Tagging: Mike Smith to see if he can help.


Thanks Emily, this was useful. How can I retrieve archive sequences from older rat assemblies?

The code below works, but it retrieves sequences from rno6 (the latest rat genome). What I need is rno4, which is in the Ensembl archive at http://may2012.archive.ensembl.org. If I change the server address, it gives me an error. Any suggestions?

    import requests, sys

    server = "http://may2017.rest.ensembl.org"
    ext = "/sequence/region/rat/1:34592855..34676565:1?"

    r = requests.get(server+ext, headers={ "Content-Type" : "application/json"})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    decoded = r.json()
    print(repr(decoded))

Unfortunately, we don't have REST archives that old. I also checked our remapping tools, and we don't have a mapping between RGSC3.4 and Rnor_6.0.

3.3 years ago

BioMart is gene-centric, so it cannot retrieve sequences for arbitrary genomic regions. The easiest way to get what you need is to use the REST API archive with the POST sequence/region endpoint. This lets you retrieve multiple sequences per request, and you can script around it in any language.
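
For a large coordinate list, a rough Python sketch of the batched POST approach might look like the following. The helper names (format_region, chunked, build_payload, fetch_sequences) are made up for illustration. The host is the May 2017 REST archive address used elsewhere in this thread, the batch size of 50 is an assumption that should be checked against the endpoint documentation, and the response is assumed to be a JSON list with one entry per region. It uses the standard library's urllib rather than requests only to stay dependency-free.

```python
import json
import urllib.request

# Archive host as used elsewhere in this thread; swap in the release you need.
SERVER = "http://may2017.rest.ensembl.org"
# Assumed per-request limit; check the endpoint documentation for the real value.
MAX_REGIONS_PER_POST = 50


def format_region(chrom, start, end, strand=1):
    """Build a region string in the chrom:start..end:strand form used by the REST API."""
    return f"{chrom}:{start}..{end}:{strand}"


def chunked(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def build_payload(regions):
    """Encode a list of region strings as the JSON body for POST sequence/region."""
    return json.dumps({"regions": regions}).encode("utf-8")


def fetch_sequences(coords, species="human", server=SERVER):
    """POST the coordinates in batches and collect the decoded JSON results.

    `coords` is an iterable of (chrom, start, end) or (chrom, start, end, strand).
    """
    regions = [format_region(*c) for c in coords]
    results = []
    for batch in chunked(regions, MAX_REGIONS_PER_POST):
        request = urllib.request.Request(
            f"{server}/sequence/region/{species}",
            data=build_payload(batch),
            headers={"Content-Type": "application/json",
                     "Accept": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            # Assumes the server returns a JSON list, one object per region.
            results.extend(json.load(response))
    return results
```

Calling fetch_sequences([("X", 100639991, 100644991)]) would request the region from the original question. Batching keeps each request body small, and the same loop works for any species name the archive server recognises.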
