How to retrieve whole genome sequences by GenBank IDs
7.1 years ago
Charles Yin ▴ 180

I have found Biostars very helpful! The following question has been puzzling me lately. I can see the genome features and sequence in NCBI GenBank (link) for accession ID DS264095, but Entrez does not retrieve the sequence. The code below prints a sequence consisting entirely of Ns, although its length matches the size of the genome (1030563 bp). I also retrieved the corresponding gbk file, and it contains only the features and the CONTIG line, without the actual genome sequence. Do you have any suggestions? Thank you!

from Bio import SeqIO
from Bio import Entrez

#https://www.ncbi.nlm.nih.gov/nuccore/147747968?report=genbank
handle = Entrez.efetch(db='nuccore', rettype='gb', id='DS264095',retmode='text')
for seqRecord in SeqIO.parse(handle, 'genbank'):
    seq=seqRecord.seq
    print('seq:',seq[0:100])
    print('len:',len(seq))
#The outputs:
seq:
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

len: 1030563

The retrieved gbk file is shown below; its content differs from what is displayed on the NCBI site.

LOCUS       DS264095             1030563 bp    DNA     linear   CON 18-MAY-2007
DEFINITION  Burkholderia mallei FMH scf_1099471655815 genomic scaffold, whole
            genome shotgun sequence.
ACCESSION   DS264095 AAIQ02000000
VERSION     DS264095.1
DBLINK      BioProject: PRJNA13987
            BioSample: SAMN02435848
KEYWORDS    WGS.
SOURCE      Burkholderia mallei FMH
  ORGANISM  Burkholderia mallei FMH
            Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales;
            Burkholderiaceae; Burkholderia; pseudomallei group.
REFERENCE   1  (bases 1 to 1030563)
  AUTHORS   DeShazer,D., Woods,D.E. and Nierman,W.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (06-MAR-2007) The Institute for Genomic Research, 9712
            Medical Center Drive, Rockville, MD 20850, USA
FEATURES             Location/Qualifiers
     source          1..1030563
                     /organism="Burkholderia mallei FMH"
                     /mol_type="genomic DNA"
                     /strain="FMH"
                     /db_xref="taxon:334802"
CONTIG      join(AAIQ02000135.1:1..8136,gap(457),AAIQ02000043.1:1..44311,
            gap(1031),AAIQ02000166.1:1..3120,gap(1729),AAIQ02000194.1:1..761,
            gap(36),AAIQ02000039.1:1..45955,gap(118),AAIQ02000068.1:1..28166,
            gap(192),AAIQ02000195.1:1..749,gap(685),AAIQ02000204.1:1..289,
            gap(154),AAIQ02000163.1:1..3538,gap(375),AAIQ02000123.1:1..10313,
            gap(588),AAIQ02000142.1:1..7218,gap(466),AAIQ02000021.1:1..68386,
            gap(239),AAIQ02000069.1:1..27890,gap(395),AAIQ02000099.1:1..17802,
            gap(481),AAIQ02000038.1:1..45969,gap(717),AAIQ02000152.1:1..6039,
            gap(100),AAIQ02000162.1:1..3813,gap(349),AAIQ02000130.1:1..9302,
            gap(951),AAIQ02000104.1:1..15966,gap(744),AAIQ02000082.1:1..23397,
            gap(2853),AAIQ02000178.1:1..2005,gap(36),AAIQ02000189.1:1..1160,
            gap(36),AAIQ02000120.1:1..10635,gap(36),AAIQ02000184.1:1..1724,
            gap(489),AAIQ02000121.1:1..10540,gap(720),AAIQ02000055.1:1..34907,
            gap(378),AAIQ02000117.1:1..11883,gap(254),AAIQ02000033.1:1..54313,
            gap(288),AAIQ02000137.1:1..7858,gap(863),AAIQ02000115.1:1..12452,
            gap(592),AAIQ02000009.1:1..106604,gap(722),AAIQ02000149.1:1..6242,
            gap(593),AAIQ02000186.1:1..1381,gap(36),AAIQ02000169.1:1..2881,
            gap(468),AAIQ02000148.1:1..6247,gap(437),AAIQ02000164.1:1..3492,
            gap(464),AAIQ02000126.1:1..10017,gap(636),AAIQ02000141.1:1..7280,
            gap(731),AAIQ02000174.1:1..2399,gap(36),AAIQ02000173.1:1..2519,
            gap(246),AAIQ02000013.1:1..98333,gap(237),AAIQ02000168.1:1..2885,
            gap(278),AAIQ02000106.1:1..15267,gap(583),AAIQ02000177.1:1..2034,
            gap(495),AAIQ02000183.1:1..1748,gap(804),AAIQ02000046.1:1..41423,
            gap(357),AAIQ02000167.1:1..3108,gap(36),AAIQ02000171.1:1..2650,
            gap(36),AAIQ02000087.1:1..22096,gap(728),AAIQ02000199.1:1..476,
            gap(199),AAIQ02000180.1:1..1969,gap(36),AAIQ02000205.1:1..262,
            gap(262),AAIQ02000129.1:1..9762,gap(590),AAIQ02000160.1:1..4201,
            gap(473),AAIQ02000150.1:1..6226,gap(1027),AAIQ02000176.1:1..2096,
            gap(279),AAIQ02000032.1:1..57068,gap(491),AAIQ02000094.1:1..18896,
            gap(669),AAIQ02000058.1:1..33027,gap(36),AAIQ02000201.1:1..386,
            gap(625),AAIQ02000125.1:1..10029)
//

Please format your post appropriately in the future.


Sure, I will make sure future posts are well formatted. Thanks, Ram.

7.1 years ago
GenoMax 141k

esearch -db nuccore -query "DS264095" | efetch -format fasta returns the correct sequence.

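Perhaps try gbwithparts instead of just gb as the rettype. Plain gb returns only the scaffold (CON) record you posted above, which lists the component contigs in its CONTIG line but carries no sequence of its own.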
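To illustrate why the scaffold record reports a length without any bases, the CONTIG line can be read as a list of assembly instructions: spans of component contigs interleaved with gaps. The following is a minimal sketch that handles only the simple `accession:start..end` and `gap(n)` forms seen in this record; the names `CONTIG_EXCERPT`, `PART_RE` and `parse_contig` are made up for this example, and other forms that can occur in CONTIG lines (such as `complement(...)`) are not handled.

```python
import re

# Excerpt of the CONTIG line from the DS264095 record (first three contigs only).
CONTIG_EXCERPT = (
    "join(AAIQ02000135.1:1..8136,gap(457),AAIQ02000043.1:1..44311,"
    "gap(1031),AAIQ02000166.1:1..3120)"
)

# Matches either "ACCESSION.VERSION:start..end" or "gap(length)".
PART_RE = re.compile(
    r"(?P<acc>[A-Z0-9_]+\.\d+):(?P<start>\d+)\.\.(?P<end>\d+)"
    r"|gap\((?P<gap>\d+)\)"
)


def parse_contig(contig):
    """Return a list of (name, length) tuples; gaps are named 'gap'."""
    parts = []
    for match in PART_RE.finditer(contig):
        if match.group("gap") is not None:
            parts.append(("gap", int(match.group("gap"))))
        else:
            # Coordinates are 1-based and inclusive.
            length = int(match.group("end")) - int(match.group("start")) + 1
            parts.append((match.group("acc"), length))
    return parts


parts = parse_contig(CONTIG_EXCERPT)
total = sum(length for _, length in parts)
for name, length in parts:
    print(name, length)
print("total length of excerpt:", total)
```

Summing the lengths of every part in the full CONTIG line should give the scaffold length reported in the LOCUS line, which is how the record can state a size while the bases themselves live in the component contig records.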


Yes, it works when 'gbwithparts' is used as the value for rettype. The following code returns the full genome sequence.

handle = Entrez.efetch(db='nuccore', rettype='gbwithparts', id='DS264095',retmode='text')
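As a quick sanity check after switching to gbwithparts, one could measure how much of the returned sequence is N. This is only a sketch: `n_fraction` and `fetch_scaffold` are hypothetical helper names, the email argument is whatever address you want to identify yourself to NCBI with, and `fetch_scaffold` needs Biopython and network access. Since the scaffold contains gap(...) entries, the expanded sequence presumably still has some N runs, so a small non-zero fraction would be expected rather than exactly zero.

```python
def n_fraction(seq_text):
    """Return the fraction of positions that are N (case-insensitive)."""
    if not seq_text:
        return 0.0
    return seq_text.upper().count("N") / len(seq_text)


def fetch_scaffold(accession, email):
    """Fetch one nuccore record with its sequence expanded (needs network)."""
    # Imported here so n_fraction() stays usable without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks Entrez users to identify themselves
    handle = Entrez.efetch(db="nuccore", id=accession,
                           rettype="gbwithparts", retmode="text")
    try:
        # SeqIO.read expects exactly one record in the handle.
        return SeqIO.read(handle, "genbank")
    finally:
        handle.close()


# Offline demonstration of the check itself.
print(n_fraction("NNNN"))  # an all-N placeholder sequence
print(n_fraction("ACGN"))  # mostly resolved sequence with one N
```

For the record in this thread, that would mean calling `fetch_scaffold("DS264095", email=...)` and passing `str(record.seq)` to `n_fraction`; a value close to 1 would suggest the sequence was not expanded.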
