Question: How can I obtain specimen_voucher information for each accession number in a Python dictionary prepared with Biopython?
sw.knudsen (European Union) wrote, 15 months ago:

With this piece of Python code, which uses the Biopython module:

from Bio import Entrez
Entrez.email = "email@any_email.com"

organisms=["Diaphus anderseni"]
genes = ["H3"]
specs = {}
acclist = {}
for org in organisms:
    for gene in genes:
        query= org+"[organism] AND "+gene+"[gene]"
        res = Entrez.esearch(db="nucleotide", term=query, retmax=10000)
        rec = Entrez.read(res)
        res = Entrez.efetch(db="nucleotide", id=rec["IdList"],  retmode = "xml")
        for record in Entrez.read(res):
            speciesName = record["GBSeq_organism"]
            accn = record["GBSeq_accession-version"]
            if accn in acclist:
                acclist[accn].append(speciesName)
            else:
                acclist[accn] = [speciesName]

I get a dictionary with two entries like this:

{'KJ555688.1': ['Diaphus anderseni'], 'KJ555689.1': ['Diaphus anderseni']}

But I would also like to prepare a dictionary that holds the 'specimen_voucher' information instead, so that it looks like this:

{'KJ555688.1': ['SIO:10-169'], 'KJ555689.1': ['SIO:10-170']}

I prepared this last dictionary manually, by looking up the complete GenBank records for KJ555688 and KJ555689. I would like to do the same in Python, so that I can apply it on a larger scale to hundreds of accession numbers. Any advice on this would be greatly appreciated. Thanks in advance for your time and help.

SMK (Ghent, Belgium) wrote, 26 days ago:

Hi sw.knudsen,

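For these two records, specimen_voucher is at record["GBSeq_feature-table"][0]["GBFeature_quals"][3]["GBQualifier_value"]. The indices 0 and 3 are positional (first feature, fourth qualifier), so they fit these records but may not fit others where qualifiers are ordered differently or missing: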

$ cat test.py
from Bio import Entrez
Entrez.email = "email@any_email.com"  # NCBI asks for a contact address

organisms = ["Diaphus anderseni"]
genes = ["H3"]
acclist = {}  # accession.version -> list of specimen_voucher values
for org in organisms:
    for gene in genes:
        query = org + "[organism] AND " + gene + "[gene]"
        # Search for matching nucleotide records, then fetch them as GenBank XML
        res = Entrez.esearch(db="nucleotide", term=query, retmax=10000)
        rec = Entrez.read(res)
        res = Entrez.efetch(db="nucleotide", id=rec["IdList"], retmode="xml")
        for record in Entrez.read(res):
            accn = record["GBSeq_accession-version"]
            # First feature, fourth qualifier: specimen_voucher for these records
            specimenVoucher = record["GBSeq_feature-table"][0]["GBFeature_quals"][3]["GBQualifier_value"]
            if accn in acclist:
                acclist[accn].append(specimenVoucher)
            else:
                acclist[accn] = [specimenVoucher]

print(acclist)

$ python test.py
{'KJ555689.1': ['SIO:10-170'], 'KJ555688.1': ['SIO:10-169']}
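
Since the question mentions scaling this up to hundreds of accessions, a lookup by qualifier name may be safer than the fixed positions [0] and [3]. The following is only a sketch: the helper names get_source_qualifier and vouchers_by_accession are hypothetical, and the mock records are hand-built stand-ins with made-up accessions and values, not real GenBank output. It assumes the parsed records expose GBFeature_key and GBQualifier_name next to the keys already used above.

```python
def get_source_qualifier(record, name):
    """Collect the values of qualifier `name` from the record's "source" feature(s).

    Hypothetical helper: `record` is expected to look like one parsed
    GenBank XML record (nested dicts and lists).
    """
    values = []
    for feature in record.get("GBSeq_feature-table", []):
        if feature.get("GBFeature_key") != "source":
            continue
        for qual in feature.get("GBFeature_quals", []):
            # Some qualifiers carry no value, so check before reading it
            if qual.get("GBQualifier_name") == name and "GBQualifier_value" in qual:
                values.append(qual["GBQualifier_value"])
    return values


def vouchers_by_accession(records):
    """Map accession.version -> list of specimen_voucher values (possibly empty)."""
    result = {}
    for record in records:
        accn = record["GBSeq_accession-version"]
        found = get_source_qualifier(record, "specimen_voucher")
        result.setdefault(accn, []).extend(found)
    return result


# Hand-built mock records (made-up accessions and values) for illustration only
mock_records = [
    {
        "GBSeq_accession-version": "XX000001.1",
        "GBSeq_feature-table": [
            {
                "GBFeature_key": "source",
                "GBFeature_quals": [
                    {"GBQualifier_name": "specimen_voucher", "GBQualifier_value": "VOUCHER:1"},
                    {"GBQualifier_name": "organism", "GBQualifier_value": "Example species"},
                ],
            },
        ],
    },
    {
        # A record whose source feature has no specimen_voucher qualifier
        "GBSeq_accession-version": "XX000002.1",
        "GBSeq_feature-table": [
            {
                "GBFeature_key": "source",
                "GBFeature_quals": [
                    {"GBQualifier_name": "organism", "GBQualifier_value": "Example species"},
                ],
            },
        ],
    },
]

print(vouchers_by_accession(mock_records))
```

With real data, the list returned by Entrez.read(res) in test.py could probably be passed in as records, since the parsed elements behave like dicts and lists. Records without a voucher end up with an empty list instead of raising an IndexError or silently picking up a different qualifier.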