Question: Using ape package in R to pull genbank sequences
gravatar for krc3004
2.7 years ago by
krc300420 wrote:

Hi all- I am trying use the ape package in R to pull protein sequences from GenBank. The sequence I'd like to retrieve is the following:

To do this I used the following code:

kl_seq = read.GenBank("AAG26087.1", as.character = TRUE)

And the sequence I obtained was:

  [1] "m" "s" "r" "s" "s" "k" "r" "n" "r" "d" "g" "r" "g" "w" "v" "n" "g" "r" "k" "k" "d" "r" "d" "r" "k" "k" "k" "k" "k" "k" "d" "n" "w" "g" "n" "k" "g" "g" "k" "k" "d"
 [42] "k" "d" "g" "g" "a" "a" "k" "r" "a" "r" "t" "d" "m" "d" "s" "g" "g" "k" "r" "r" "g" "g" "s" "d" "k" "r" "d" "h" "r" "r" "r" "k" "a" "n" "k" "r" "k" "a" "a" "g" "g"
 [83] "k" "h" "s" "k" "k" "r" "t" "d" "r" "r" "r" "r" "t" "a" "g" "s" "v" "g" "g" "v" "n" "g" "g" "s" "r" "g" "a" "g" "g" "g" "v" "n" "m" "s" "v" "s" "s" "r" "t" "g" "g"
[124] "d" "v" "r" "g" "n" "g" "w" "d" "a" "d" "s" "s" "c" "r"

[1] "Hepatitis_delta_virus"

However, this sequence clearly does not match the sequence of length 214 aa in the above link. I'm wondering if there's something I'm missing here- perhaps there is another argument, function, or package I should use to pull the correct sequence? Is there any way to bulk download from GenBank directly, given a list of accession IDs? Any tips would be much appreciated. Thanks!

ape genbank R ncbi • 1.9k views
ADD COMMENTlink modified 2.7 years ago by Kevin Blighe69k • written 2.7 years ago by krc300420

Use NCBI's unix utilities.

efetch -db nuccore -id "AAG26087.1" -format fasta > AAG26087.fa

Replace ID's in a loop if you have multiple ID's to download

ADD REPLYlink written 2.7 years ago by GenoMax94k

genomax, thanks very much for your help- I will try this. Is there any way to use efetch to get the nucleotide sequence as well?

ADD REPLYlink written 2.7 years ago by krc300420

Just use the right accession number and it will return nucleotide sequence.

ADD REPLYlink written 2.7 years ago by GenoMax94k
gravatar for Kevin Blighe
2.7 years ago by
Kevin Blighe69k
Republic of Ireland
Kevin Blighe69k wrote:

That's weird. I looked at the function souce code and it executes only 2 possible functions:

URL <- paste0("", 
            paste(access.nb[a:b], collapse = ","), "&rettype=fasta&retmode=text")

URL <- paste("", 
                paste(access.nb[a:b], collapse = ","), "&rettype=gb&retmode=text")

I'm not sure about the paste(access.nb[a:b], collapse = ",") part; however, if you just go to these URLs in your web browser, you then see the expected sequence. For example:

This produces:

>AAG26087.1 large delta antigen [Hepatitis delta virus]

The other command:

This just return the webpage (in plain text) at


There are ways of automating this in Python but, if you want to stay within R, you can use the httr package. Here is a very messy example, but I'll leave the tidying up to you:

response <- GET("")
content(r, "text")

No encoding supplied: defaulting to UTF-8.

You can either mess around with that messy format including the end-line characters, or you can 'cheat' your way and get this into a data-frame as follows:

write(content(r, "text"), "test.fasta")
4                                                                   CRPQ

What you want to do will depend on your end gal. You could easily convert this into a function that accepts a GenBank accession ID and then loop it.


ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by Kevin Blighe69k

Thank you Kevin! This solution works on my end but it is indeed messy, as you suggest...not sure why the ape package is returning the incorrect sequence. A further question is how to obtain the nucleotide sequence for these accessions but perhaps I can work that out separately...thanks again!

ADD REPLYlink written 2.7 years ago by krc300420

Yes, as to what is happening in the ape function should probably be directed to the developers. It's an otherwise extremely good package. Best of luck.

ADD REPLYlink written 2.7 years ago by Kevin Blighe69k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1065 users visited in the last hour