Question: How to retrieve whole genome sequences by GenBank IDs
2.0 years ago, genebow (USA/Chicago) wrote:

I found that Biostars is very helpful! The following question has been puzzling me. I can see the genome features and sequence in NCBI GenBank (link) for accession DS264095, but Entrez does not retrieve the sequence. The code below prints only Ns for the sequence, although the reported length matches the size of the genome (1030563 bp). I also retrieved the corresponding gbk file, and it contains only the features and the CONTIG line, without the actual genome sequence. Do you have any suggestions? Thank you!

from Bio import SeqIO
from Bio import Entrez

#https://www.ncbi.nlm.nih.gov/nuccore/147747968?report=genbank
handle = Entrez.efetch(db='nuccore', rettype='gb', id='DS264095',retmode='text')
for seqRecord in SeqIO.parse(handle, 'genbank'):
    seq=seqRecord.seq
    print('seq:',seq[0:100])
    print('len:',len(seq))
#The outputs:
seq:
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

len: 1030563

The retrieved gbk file is shown below; its content differs from what the NCBI site displays.

LOCUS       DS264095             1030563 bp    DNA     linear   CON 18-MAY-2007
DEFINITION  Burkholderia mallei FMH scf_1099471655815 genomic scaffold, whole
            genome shotgun sequence.
ACCESSION   DS264095 AAIQ02000000
VERSION     DS264095.1
DBLINK      BioProject: PRJNA13987
            BioSample: SAMN02435848
KEYWORDS    WGS.
SOURCE      Burkholderia mallei FMH
  ORGANISM  Burkholderia mallei FMH
            Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales;
            Burkholderiaceae; Burkholderia; pseudomallei group.
REFERENCE   1  (bases 1 to 1030563)
  AUTHORS   DeShazer,D., Woods,D.E. and Nierman,W.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (06-MAR-2007) The Institute for Genomic Research, 9712
            Medical Center Drive, Rockville, MD 20850, USA
FEATURES             Location/Qualifiers
     source          1..1030563
                     /organism="Burkholderia mallei FMH"
                     /mol_type="genomic DNA"
                     /strain="FMH"
                     /db_xref="taxon:334802"
CONTIG      join(AAIQ02000135.1:1..8136,gap(457),AAIQ02000043.1:1..44311,
            gap(1031),AAIQ02000166.1:1..3120,gap(1729),AAIQ02000194.1:1..761,
            gap(36),AAIQ02000039.1:1..45955,gap(118),AAIQ02000068.1:1..28166,
            gap(192),AAIQ02000195.1:1..749,gap(685),AAIQ02000204.1:1..289,
            gap(154),AAIQ02000163.1:1..3538,gap(375),AAIQ02000123.1:1..10313,
            gap(588),AAIQ02000142.1:1..7218,gap(466),AAIQ02000021.1:1..68386,
            gap(239),AAIQ02000069.1:1..27890,gap(395),AAIQ02000099.1:1..17802,
            gap(481),AAIQ02000038.1:1..45969,gap(717),AAIQ02000152.1:1..6039,
            gap(100),AAIQ02000162.1:1..3813,gap(349),AAIQ02000130.1:1..9302,
            gap(951),AAIQ02000104.1:1..15966,gap(744),AAIQ02000082.1:1..23397,
            gap(2853),AAIQ02000178.1:1..2005,gap(36),AAIQ02000189.1:1..1160,
            gap(36),AAIQ02000120.1:1..10635,gap(36),AAIQ02000184.1:1..1724,
            gap(489),AAIQ02000121.1:1..10540,gap(720),AAIQ02000055.1:1..34907,
            gap(378),AAIQ02000117.1:1..11883,gap(254),AAIQ02000033.1:1..54313,
            gap(288),AAIQ02000137.1:1..7858,gap(863),AAIQ02000115.1:1..12452,
            gap(592),AAIQ02000009.1:1..106604,gap(722),AAIQ02000149.1:1..6242,
            gap(593),AAIQ02000186.1:1..1381,gap(36),AAIQ02000169.1:1..2881,
            gap(468),AAIQ02000148.1:1..6247,gap(437),AAIQ02000164.1:1..3492,
            gap(464),AAIQ02000126.1:1..10017,gap(636),AAIQ02000141.1:1..7280,
            gap(731),AAIQ02000174.1:1..2399,gap(36),AAIQ02000173.1:1..2519,
            gap(246),AAIQ02000013.1:1..98333,gap(237),AAIQ02000168.1:1..2885,
            gap(278),AAIQ02000106.1:1..15267,gap(583),AAIQ02000177.1:1..2034,
            gap(495),AAIQ02000183.1:1..1748,gap(804),AAIQ02000046.1:1..41423,
            gap(357),AAIQ02000167.1:1..3108,gap(36),AAIQ02000171.1:1..2650,
            gap(36),AAIQ02000087.1:1..22096,gap(728),AAIQ02000199.1:1..476,
            gap(199),AAIQ02000180.1:1..1969,gap(36),AAIQ02000205.1:1..262,
            gap(262),AAIQ02000129.1:1..9762,gap(590),AAIQ02000160.1:1..4201,
            gap(473),AAIQ02000150.1:1..6226,gap(1027),AAIQ02000176.1:1..2096,
            gap(279),AAIQ02000032.1:1..57068,gap(491),AAIQ02000094.1:1..18896,
            gap(669),AAIQ02000058.1:1..33027,gap(36),AAIQ02000201.1:1..386,
            gap(625),AAIQ02000125.1:1..10029)
//

Please format your post appropriately in the future.

Comment written 2.0 years ago by RamRS

Sure, I will make sure the future posts are well formatted. Thanks Ram.

Reply written 2.0 years ago by genebow

2.0 years ago, genomax (United States) wrote:

With EDirect, esearch -db nuccore -query "DS264095" | efetch -format fasta returns the correct sequence.

Perhaps try gbwithparts instead of just gb (which gets the scaffold record you posted above).

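For context, the LOCUS line of the record in the question carries the CON division, and its CONTIG line describes the scaffold as a join of component records and gaps rather than holding bases itself. The sketch below, using only the standard library, sums the component and gap lengths from such a join(...) string. The function name `contig_span_lengths` is made up for illustration, and the regular expressions assume the simple `ACCESSION.VERSION:START..END` and `gap(LENGTH)` forms seen in this record; other gap notations would need extra handling.

```python
import re


def contig_span_lengths(contig_text):
    """Return (component_bases, gap_bases) for a GenBank CONTIG join(...) string."""
    # Components look like ACCESSION.VERSION:START..END
    components = re.findall(r"[\w.]+:(\d+)\.\.(\d+)", contig_text)
    # Gaps of known size look like gap(LENGTH)
    gaps = re.findall(r"gap\((\d+)\)", contig_text)
    component_bases = sum(int(end) - int(start) + 1 for start, end in components)
    gap_bases = sum(int(length) for length in gaps)
    return component_bases, gap_bases


# First few entries of the CONTIG line from the record in the question.
excerpt = (
    "join(AAIQ02000135.1:1..8136,gap(457),AAIQ02000043.1:1..44311,"
    "gap(1031),AAIQ02000166.1:1..3120)"
)
component_bases, gap_bases = contig_span_lengths(excerpt)
print(component_bases, gap_bases, component_bases + gap_bases)
# prints: 55567 1488 57055
```

Running the same function on the whole CONTIG line should give a total that can be compared with the length on the LOCUS line.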

Yes, it works when 'gbwithparts' is used as the value for rettype. The following code returns the full genome sequence.

handle = Entrez.efetch(db='nuccore', rettype='gbwithparts', id='DS264095',retmode='text')

Reply written 2.0 years ago by genebow
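Putting the thread together, a minimal sketch of the working approach might look like the following. The function names `fetch_full_record` and `is_placeholder` and the e-mail address are placeholders chosen for illustration, not part of the original posts. The check works on plain text because the way a sequence-less record is represented may differ between Biopython versions.

```python
def fetch_full_record(accession, email):
    """Fetch a GenBank record with its component sequences expanded."""
    # Imported here so the helper below stays usable without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks E-utilities users to identify themselves
    handle = Entrez.efetch(db="nuccore", id=accession,
                           rettype="gbwithparts", retmode="text")
    try:
        # A single accession is requested, so exactly one record is expected.
        return SeqIO.read(handle, "genbank")
    finally:
        handle.close()


def is_placeholder(sequence_text):
    """Return True if the text is empty or consists only of N characters."""
    return set(sequence_text.upper()) <= {"N"}


# Example call (needs network access and Biopython; address is a placeholder):
# record = fetch_full_record("DS264095", "you@example.org")
# print(is_placeholder(str(record.seq[:100])), len(record.seq))
```

If `is_placeholder` still returns True for the first bases, the request probably went out with rettype 'gb' rather than 'gbwithparts'.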