Howto transfer all 'product' qualifiers from a genbank file to another one
1
0
Entering edit mode
11 months ago
a99942 • 0

Hi,

I have 2 genbank files of almost identical genome assemblies with the same number of predicted genes with similar AA sequences, one genome per file:

file1: with assembly gaps but manually curated annotation, i.e. custom 'product' qualifiers

 CDS             complement(551913..552146)
/locus_tag="KLOVNAE_00475"
/inference="ab initio prediction:Prodigal:002006"
/codon_start=1
/product="Ubiquitin-like homologous superfamily
domain-containing protein"
...


file2: a correct assembly with automated prokka annotation only, that has some other qualifiers I would like to keep in the final genbank file, i. e. 'gene', 'inference' ...

 CDS             complement(551913..552146)
/gene="vUbi_1"
/locus_tag="KLOVNAE_00475"
/inference="ab initio prediction:Prodigal:002006"
/inference="similar to AA sequence:UniProtKB:P16709"
/codon_start=1
/product="viral Ubiquitin"
...


Therefore I would like to replace the 'product' qualifier of all CDS features in file2 with the respective ones from the manually curated file 1.

How could I achieve that eg. in Biopython? Hints/Help would be highly appreciated!

thx atb flo

genbank biopython • 350 views
0
Entering edit mode

do both files use the same geneIDs ? (== can you link on gene name or such?)

0
Entering edit mode

all genes of both files have the same locus_tags: KLOVNAE_00001 ... 01000

I have included examples of the CDS features...

2
Entering edit mode
11 months ago
Joe 19k

This can be achieved with biopython I think, though may not be super straightforward. I don't have time at the moment to throw together actual code, but off the top of my head this is the approach you could use (assume the features correspond very well (identical order, locus tags etc).

from Bio import SeqIO

source_gb = SeqIO.read("genbank1.gbk", "genbank")   # Note: 'read' assumes each file isn't a multi-genbank else this gets more complicated

#iterate the list of features in each genbank:
for source_feat, dest_feat in zip((s_feature for s_feature in source_gb.features), (d_feature for d_feature in dest.features)):
#sub the relevant qualified from one record in to the other. This is where I forget the exact syntax, but something like:
dest_feat.qualifiers['product'] = source_feat.qualifiers['product']


You might want to add extra checks such as source_feat.qualifiers['locus_tag'] == dest_feat.qualifiers['locus_tag'] or something.

0
Entering edit mode

it worked after checking for the presence of 'product'! since there were 2 repeat regions annotated a crispr regions without the qualifier 'product'... many thx!

from Bio import SeqIO

#iterate the list of features in each genbank:
for source_feat, dest_feat in zip((s_feature for s_feature in source_gb.features), (d_feature for d_feature in dest_gb.features)):
if (source_feat.qualifiers.get("product")):
dest_feat.qualifiers["product"] = source_feat.qualifiers["product"]

SeqIO.write(dest_gb, "dest.gbf", "genbank")

0
Entering edit mode

moved to an answer so you can go ahead and accept if you like.