parsing genbank file
5.2 years ago

Hi

I am trying to parse a GenBank file. I am using Python 2.7 and Biopython 1.73.

Below is the first entry in my file. The information I would like to save to a new file is the accession, the organism, the kpc gene, and its translation.

I would like to save the same info from all the records in my file.

Thanks to all in advance who might be able to help.

LOCUS       MH558576               11275 bp    DNA     linear   BCT 03-SEP-2018
DEFINITION  Klebsiella pneumoniae strain KP21-KPC plasmid, partial sequence.
ACCESSION   MH558576
VERSION     MH558576.1
KEYWORDS    .
SOURCE      Klebsiella pneumoniae
  ORGANISM  Klebsiella pneumoniae
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacterales;
            Enterobacteriaceae; Klebsiella.
REFERENCE   1  (bases 1 to 11275)
  AUTHORS   Wang,P., Hu,Y., Yi,G., Shen,X., Wang,Z., Ma,R., Shan,B. and Wang,Y.
  TITLE     Clone dissemination of blaKPC-2 and blaNDM-1 co-producing clinical
            isolates of Klebsiella pneumoniae in a Chinese teaching hospital
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 11275)
  AUTHORS   Wang,P., Hu,Y., Yi,G., Shen,X., Wang,Z., Ma,R., Shan,B. and Wang,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JUL-2018) Department of Key Laboratory, The 2nd
            Affiliated Hospital of Kunming Medical University, 374 Dian Mian
            Road, Kunming, Yunnan 650101, China
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..11275
                     /organism="Klebsiella pneumoniae"
                     /mol_type="genomic DNA"
                     /strain="KP21-KPC"
                     /isolation_source="urine"
                     /db_xref="taxon:573"
                     /plasmid="unnamed"
                     /country="China: Kunming,YN"
                     /collection_date="2010"
     CDS             43..759
                     /codon_start=1
                     /transl_table=11
                     /product="IS6-like element IS26 family transposase"
                     /protein_id="AXS01185.1"
                     /translation="MELHMNPFKGRHFQRDIILWAVRWYCKYGISYRELQEMLAERGV
                     NVDHSTIYRWVQRYAPEMEKRLRWYWRNPSDLCPWHMDETYVKVNGRWAYLYRAVDSR
                     GRTVDFYLSSRRNSKAAYRFLGKILNNVKKWQIPRFINTDKAPAYGRALALLKREGRC
                     PSDVEHRQIKYRNNVIECDHGKLKRIIGATLGFKSMKTAYATIKGIEVMRALRKGQAS
                     AFYYGDPLGEMRLVSRVFEM"
     gene            862..1272
                     /gene="tnpR"
                     /note="truncated TnpR resolvase"
     CDS             1395..2375
                     /codon_start=1
                     /transl_table=11
                     /product="IS481-like element ISKpn27 family transposase"
                     /protein_id="AXS01186.1"
                     /translation="MTQALHSQARTTHLIREEIRNSTLPQAELARMYNVTRQTIRKWQ
                     NRESPEDKSHAPNKMYTTLTPEQELIVVELRKTLLLPTDDLLAVTREFINPAVSRAGL
                     GRCLRRHGVSDLRNLVEQEGTAPATKKTFKDYEPGFVHIDIKYLPQMPDETARRYLFV
                     AIDRATRWVFIELYADQTDGSSGDFLNKVQQACPVKIVKLLTDNGSQFTDRFTAGGKK
                     KEPSGTHVFDRLCKQLGIEHRLIPPRHPQTNGMVERFNGRISDIVNQTRFGSAAELES
                     TLRNYVKIYNHSIPQRALQHKTPVQALKEWHEKRPELFRKRVYNQPGLDI"
     gene            2651..3532
                     /gene="kpc"
                     /note="carbapenem-hydrolyzing class A beta-lactamase
                     KPC-2"
     CDS             2651..3532
                     /gene="kpc"
                     /codon_start=1
                     /transl_table=11
                     /product="carbapenem-hydrolyzing class A beta-lactamase
                     KPC-2"
                     /protein_id="AXS01187.1"
                     /translation="MSLYRRLVLLSCLSWPLAGFSATALTNLVAEPFAKLEQDFGGSI
                     GVYAMDTGSGATVSYRAEERFPLCSSFKGFLAAAVLARSQQQAGLLDTPIRYGKNALV
                     PWSPISEKYLTTGMTVAELSAAAVQYSDNAAANLLLKELGGPAGLTAFMRSIGDTTFR
                     LDRWELELNSAIPGDARDTSSPRAVTESLQKLTLGSALAAPQRQQFVDWLKGNTTGNH
                     RIRAAVPADWAVGDKTGTCGVYGTANDYAVVWPTGRAPIVLAVYTRAPNKDDKHSEAV
                     IAAAARLALEGLGVNGQ"
     gene            complement(4767..5063)
                     /gene="korC"
                     /note="transcriptional repressor protein KorC"
     CDS             complement(4767..5063)
                     /gene="korC"
                     /codon_start=1
                     /transl_table=11
                     /product="transcriptional repressor protein KorC"
                     /protein_id="AXS01188.1"
                     /translation="MIRPETLRPFAEDWQAPTADEIKEVLELIRQRKGLSKPLSGVDV
                     ADLVGLPGERGSGKGTRTFRRWVSKTNPSPIAYGAWSILAHLAGFGAIWDADRD"
     gene            complement(5392..5817)
                     /gene="klca"
                     /note="antirestriction protein"
     CDS             complement(5392..5817)
                     /gene="klca"
                     /codon_start=1
                     /transl_table=11
                     /product="antirestriction protein"
                     /protein_id="AXS01189.1"
                     /translation="MMQTELNPLICSLVATPRRMAAMPRYVGRFYVVFESMLYQQMKG
                     LCREYRGAYWLMWELSNGGFYMAPGRRDEMLNIEAMNYFSGQMSADAAGITACLYLYS
                     HLSFHTEGADQERFSRLYHSLRDWACEHDEKEAILAAID"
     CDS             complement(5928..6206)
                     /codon_start=1
                     /transl_table=11
                     /product="hypothetical protein"
                     /protein_id="AXS01190.1"
                     /translation="MIHTANRTFHQLYREWIRERREHMHNVLTWERDRYGARLVGLFY
                     RYCKVANPFPRCTLNTRINYRAHAVNLPDWPARSLELNKMWLSWREKK"
     CDS             7749..8309
                     /codon_start=1
                     /transl_table=11
                     /product="TnpR resolvase"
                     /protein_id="AXS01191.1"
                     /translation="MQGHRIGYVRVSSFDQNPERQLEQTQVSKVFTDKASGKDTQRPQ
                     LEALLSFVREGDTVVVHSMDRLARNLDDLRRLVQKLTQRGVRIEFLKEGLVFTGEDSP
                     MANLMLSVMGAFAEFERALIRERQREGIALAKQRGAYRGRKKALSDEQAATLRQRATA
                     GEPKAQLAREFNISRETLYQYLRTDD"
     CDS             8313..>11275
                     /codon_start=1
                     /transl_table=11
                     /product="Tn3-like element TnAs1 family transposase"
                     /protein_id="AXS01192.1"
                     /translation="MPRRLILSATERDTLLALPESQDDLIRYYTFNDSDLSLIRQRRG
                     DANRLGFAVQLCLLRYPGYALGTDSELPEPVILWVAKQVQAEPASWAKYGERDVTRRE
                     HAQELRTYLQLAPFGLSDFRALVRELTELAQQTDKGLLLAGQALESLRQKRRILPALS
                     VIDRACSEAIARANRRVYRALVEPLTDSHRAKLDELLKLKAGSSITWLTWLRQAPLKP
                     NSRHMLEHIERLKTFQLVDLPEGLGRHIHQNRLLKLAREGGQMTPKDLGKFEPQRRYA
                     TLAAVVLESTATVIDELVDLHDRILVKLFSGAKHKHQQQFQKQGKAINDKVRLYSRIG
                     QALLEAKESGSDPYAAIEAVIPWDEFTESVSEAELLARPEGFDHLHLVGENFATLRRY
                     TPALLEVLELRAAPAAQGVLAAVQTLREMNADNLRKVPADAPTAFIKPRWKPLVITPE
                     GLDRKFYEICALSELKNALRSGDIWVKGSRQFRDFDDYLLPAEKFAALKREQALPLAI
                     NPNSDQYLEERLQLLDEQLATVTRLAKDNELPDAILTESGLKITPLDAAVPDRAQALI
                     DQTSQLLPRIKITELLMDVDDWTGFSRHFTHLKDGAEAKDRTLLLSAILGDAINLGLT
                     KMAESSPGLTYAKLSWLQAWHIRDETYSAALAELVNHQYRHAFAAHWGDGTTSSSDGQ
                     RFRAGGRGESTGHVNPKYGSEPGRLFYTHISDQYAPFSTRVVNVGVRDSTYVLDGLLY
                     HESDLRIEEHYTDTAGFTDHVFALMHLLGFRFAPRIRDLGETKLYVPQGVQAYPTLRP
                     LIGGTLNIKHVRAHWDDILRLASSIKQGTVTASLMLRKLGSYPRQNGLAVALRELGRI
                     ERTLFILDWLQSVELRRRVHAGLNKGEARNSLARAVFFNRLGEIRDRSFEQQRYRASG
                     LNLVTAAIVLWNTVYLERATQGLVEAGKPVDGELLQFLSPLGWEHINLTGDYVWRQSR
                     RLEDGKFRPLRMPGKP"
ORIGIN      
        1 gcaaatagtc ggtggtgata aacttatcat ccccttttgc tgatggagct gcacatgaac
       61 ccattcaaag gccggcattt tcagcgtgac atcattctgt gggccgtacg ctggtactgc
      121 aaatacggca tcagttaccg tgagctgcag gagatgctgg ctgaacgcgg agtgaatgtc
      181 gatcactcca cgatttaccg ctgggttcag cgttatgcgc ctgaaatgga aaaacggctg
      241 cgctggtact ggcgtaaccc ttccgatctt tgcccgtggc acatggatga aacctacgtg
      301 aaggtcaatg gccgctgggc gtatctgtac cgggccgtcg acagccgggg ccgcactgtc
      361 gatttttatc tctcctcccg tcgtaacagc aaagctgcat accggtttct gggtaaaatc
5.2 years ago
GenoMax 142k

Here is the relevant section from the Biopython tutorial about parsing GenBank records.


Thanks. With the tutorial and some other info I found online, I was able to put together the following:

from Bio import SeqIO
record = SeqIO.read("sequencetest.gb", "genbank")
my_gene = record.features[5]  # the kpc CDS, located by counting features by hand
print(record.description)
print(my_gene.qualifiers["gene"])
print(my_gene.qualifiers["product"])
print(my_gene.qualifiers["translation"])

The result:

Klebsiella pneumoniae strain KP21-KPC plasmid, partial sequence

['kpc']

['carbapenem-hydrolyzing class A beta-lactamase KPC-2']

['MSLYRRLVLLSCLSWPLAGFSATALTNLVAEPFAKLEQ....

This is exactly the information I want (except for the brackets and quotation marks around everything). However, I had to know in advance that the feature I wanted was features[5]. Given multiple records in a GenBank file, how can I get the feature I want by using a qualifier such as the gene name?
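One way to approach this, as a minimal sketch rather than a definitive answer: loop over every record with SeqIO.parse (SeqIO.read only accepts a file holding exactly one record), then loop over each record's features and keep the CDS features whose "gene" qualifier matches. The helper names find_gene_cds and write_gene_table, the tab-separated output layout, and the exact, case-sensitive match on the gene name are all assumptions made for illustration. Qualifier values are lists, which is where the brackets and quotes in the printed output come from; taking element [0] gives the plain string.

```python
def find_gene_cds(records, gene_name):
    """Yield (accession, organism, gene, translation) for each matching CDS.

    `records` is any iterable of SeqRecord-like objects, for example the
    iterator returned by SeqIO.parse(path, "genbank").
    """
    for record in records:
        organism = record.annotations.get("organism", "")
        for feature in record.features:
            # Skip "gene", "source", etc.; only CDS features carry /translation.
            if feature.type != "CDS":
                continue
            # Qualifier values are lists of strings; /gene may be absent.
            genes = feature.qualifiers.get("gene", [])
            if gene_name in genes:
                translation = feature.qualifiers.get("translation", [""])[0]
                # record.id is taken from the VERSION line of the record.
                yield record.id, organism, gene_name, translation


def write_gene_table(genbank_path, out_path, gene_name="kpc"):
    """Write one tab-separated line per matching CDS in a multi-record file."""
    # Imported here so find_gene_cds can be used without Biopython installed.
    from Bio import SeqIO

    with open(out_path, "w") as out:
        records = SeqIO.parse(genbank_path, "genbank")
        for row in find_gene_cds(records, gene_name):
            out.write("\t".join(row) + "\n")
```

Calling write_gene_table("sequencetest.gb", "kpc_table.tsv") would then write one line per kpc CDS found (the output file name is hypothetical). If other records annotate the gene differently, for example "KPC" or "blaKPC-2", the exact match above will miss them, so the comparison may need to be loosened for a real data set.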
