Question: Obtaining sequences from BioProject IDs with Biopython gives an unknown sequence
Asked 4.1 years ago by Prasad (United States):

Hi All,

I have a list of BioProject IDs and would like to get the corresponding sequences. These are the steps I am following:

1. Using the BioProject ID, I get the GI ID with elink:

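    from Bio import Entrez
    Entrez.email = "your.name@example.com"  # NCBI asks for a contact address; use your own
    handle = Entrez.elink(dbfrom="bioproject", db="nuccore", id=bioproject_id, linkname="bioproject_nuccore_wgsmaster")
    record = Entrez.read(handle)  # parse the XML reply into nested lists/dicts
    GI_ID = record[0]["LinkSetDb"][0]["Link"][0]["Id"]  # LinkSetDb and Link are lists; this takes the first linked UID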

2. Then I try to get the sequence for GI_ID with efetch and the SeqIO module in Biopython:

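    from Bio import SeqIO
    handle = Entrez.efetch(db="nucleotide", id=GI_ID, rettype="gb", retmode="text")
    record = SeqIO.read(handle, "genbank")  # expects exactly one GenBank record in the handle
    handle.close()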
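The indexing in step 1 keeps only the first linked UID and raises an error when a project has no link of the requested type. A minimal sketch that collects every linked UID instead, assuming the nested layout that Entrez.read() returns for an elink reply (a list of LinkSet dicts, each with a "LinkSetDb" list whose entries hold a "Link" list of {"Id": ...} dicts). The function name linked_ids is hypothetical, not part of Biopython:

```python
def linked_ids(elink_result):
    """Collect every linked UID from a parsed Entrez.elink result.

    Assumes the structure produced by Entrez.read() for elink:
    [ {"LinkSetDb": [ {"Link": [ {"Id": "..."}, ... ]}, ... ]}, ... ]
    Missing or empty levels simply contribute no IDs.
    """
    ids = []
    for linkset in elink_result:
        # LinkSetDb is an empty list when no link of the requested type exists
        for linkset_db in linkset.get("LinkSetDb", []):
            for link in linkset_db.get("Link", []):
                ids.append(link["Id"])
    return ids
```

Usage would be something like ids = linked_ids(Entrez.read(handle)), after which the IDs can be passed to efetch as a comma-separated string, e.g. id=",".join(ids).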

But when I print the record, the sequence is reported as unknown.
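One thing worth checking: the linkname used in step 1, bioproject_nuccore_wgsmaster, points at WGS master records. As far as I know, a WGS master record in GenBank does not carry the sequence itself; it has a WGS line naming the range of contig accessions and no ORIGIN section, which would explain an unknown sequence. Treat that as an assumption to verify against your own records. The helpers below are a hypothetical plain-text heuristic (is_wgs_master and wgs_contig_range are illustrative names, not Biopython functions) that can be run on the text from handle.read() before parsing:

```python
def is_wgs_master(genbank_text: str) -> bool:
    """Heuristic: True if a GenBank flat-file record looks like a WGS master.

    Assumption: WGS master records have a 'WGS' line listing contig
    accession ranges and no ORIGIN block, i.e. no sequence to parse.
    """
    lines = genbank_text.splitlines()
    has_wgs_line = any(line.startswith("WGS ") for line in lines)
    has_origin = any(line.startswith("ORIGIN") for line in lines)
    return has_wgs_line and not has_origin


def wgs_contig_range(genbank_text: str):
    """Return (first, last) contig accessions from the WGS line, or None."""
    for line in genbank_text.splitlines():
        if line.startswith("WGS "):
            value = line[len("WGS"):].strip()
            first, _, last = value.partition("-")
            # a single accession with no range counts as both ends
            return first, (last or first)
    return None
```

If is_wgs_master() returns True, the sequence would be in the individual contig records named on the WGS line rather than in the master record. If it returns False, the same text can still be parsed with SeqIO.read(io.StringIO(text), "genbank").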

Can anyone advise whether this is the right way to do it, or is there a better way to obtain the related sequences from BioProject IDs?

Thanks in advance !

Tags: biopython, elink, eutilities, efetch
Answer by Kirill (4.1 years ago):

I can help with the SeqIO part, assuming that your "handle" contains GenBank records:

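    from Bio import SeqIO

    # handle is the text handle returned by Entrez.efetch(..., rettype="gb", retmode="text");
    # it is already file-like, so pass it straight to SeqIO.parse (no open() needed)
    for record in SeqIO.parse(handle, "genbank"):
        print(record.id, record.seq)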


For more options do this:

    print(dir(record))


This lists the attributes and methods of the record object (for example record.id, record.description, record.features and record.annotations), which you can use to pull other information out of the file.


Hi, thanks for replying. I tried printing record.seq, but the output looks odd: it is a long run of 'N' characters.

(Prasad, 4.1 years ago)

It is very common to have a run of 'N' characters at the start of a sequence. A chromosome may begin with hundreds or even thousands of Ns. Scroll further down your sequence.

(Kirill, 4.1 years ago)
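To tell a leading run of Ns (as suggested in the reply above) apart from a sequence that is N from end to end, a small standard-library helper can summarise the N content of a sequence string. describe_n_content is a hypothetical name used only for this sketch:

```python
def describe_n_content(seq: str) -> dict:
    """Summarise N characters in a nucleotide string.

    Returns the length of the leading N run, the total N count,
    and whether the (non-empty) sequence consists only of Ns.
    """
    upper = seq.upper()
    leading = len(upper) - len(upper.lstrip("N"))
    total = upper.count("N")
    return {
        "leading_n": leading,
        "total_n": total,
        "all_n": len(upper) > 0 and total == len(upper),
    }
```

It could be called as describe_n_content(str(record.seq)). If all_n comes back True, the Ns are not just a leading gap. Note that on recent Biopython releases, converting a record with no defined sequence to a string may raise an error instead of returning Ns.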