Entering edit mode
4.1 years ago
Molinia
▴
10
Hello everyone ! I have a problem with my script to convert genbank file to rnt and ptt file I get inspired by an other post Question : Problem combining ptt, rnt files and I've made some adjustements.
The problem is that when I launch the script for example on NC_008253 it writes that there is no length, no genes and so on.... whereas when I'm printing the value for example the gene 'f.qualifiers['gene'][0]' some are printed and then a KeyError is raised... That's why I tried something on my script to bypass the error
Here it is :
#!/usr/bin/python3
import re
from Bio import SeqIO
annotation_file = "NC_004431.gb" # .gbk file
rnt_file = "NC_004431.rnt" # .rnt file
ptt_file = "NC_004431.ptt" # .ptt file
r = SeqIO.parse(annotation_file, "gb")
fasta_file = "NC_004431.fna"
records = []
with open(annotation_file) as inpf:
for rec in SeqIO.parse(inpf, 'genbank'):
records.append(rec)
SeqIO.write(records, fasta_file, 'fasta')
# RNT FILE DEF
for record in r:
fasta_file = open(fasta_file, "a")
SeqIO.write(record, fasta_file, "fasta")
fasta_file.close()
record.features = [f for f in record.features if f.type == "rRNA" or f.type == "tRNA"]
fout = open(rnt_file, "a")
fout.write("{0} - 1..{1}\n".format(record.description, len(record)))
fout.write("{0} RNAs\n".format(len(record.features)))
fout.write("Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n")
strand = {1: '+', -1: '-'}
for f in record.features:
try:
#print(f.qualifiers['gene'])
fout.write( "{0}\n".format("\t".join([str(abs(f.location.start + 1)) + ".." + str(f.location.end),strrand[f.strand], str(abs(f.location.start - f.location.end)), str(record.annotations['gi']), f.qualifiers['gene'][0], f.qualifiers["locus_tag"][0], '-', '-', f.qualifiers["product"][0]])))
except KeyError:
fout.write("{0}\n".format("\t".join([str(abs(f.location.start + 1)) + ".." + str(f.location.end), strand[f.strand], str(abs(f.location.start - f.location.end)), '-', '-', f.qualifiers["locus_tag"][0], '-', '-', f.qualifiers["product"][0]])))
fout.close()
# PTT FILE
r = SeqIO.parse(annotation_file, "gb")
for record in r:
record.features = [f for f in record.features if f.type == "CDS"]
fout = open(ptt_file, "a")
fout.write("{0} - 1..{1}\n".format(record.description, len(record)))
fout.write("{0} proteins\n".format(len(record.features)))
fout.write("Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n")
for f in record.features:
strand = str(f.strand)
try:
product = f.qualifiers["product"][0]
PID = re.sub('[GI:]', "", str(f.qualifiers['db_xref'][0]))
translation = f.qualifiers['translation'][0]
for element in translation:
fout.write("{0}\n".format("\t".join([str(abs(f.location.start + 1)) + ".." + str(f.location.end), strand, str(len(element)), PID, f.qualifiers['gene'][0], f.qualifiers["locus_tag"][0], "-", "-", product])))
except KeyError:
PID = '-'
gene = '-'
product = '-'
fout.write("{0}\n".format("\t".join([str(abs(f.location.start + 1)) + ".." + str(f.location.end), strand,"-", PID, gene, f.qualifiers["locus_tag"][0], "-", "-", product])))
fout.close()
I've made an error somewhere but I can't find
Thanks in advance !!
This may not be what you are looking for since it doesn't address the problem with your script, but there are BioPython and BioPerl solutions available.
https://github.com/HenryWiersma/gbkToPttAndRnt
https://github.com/sgivan/gb2ptt
https://github.com/ajvilleg/gbk2ptt
Thanks for the answer but I have already checked this script
The problem is that it only write the 'except KeyERROR' part