GFF Biopython Parse Issue
12.1 years ago
Zach Powers ▴ 340

Hello Biostar,

I am writing a script for a very simple annotation pipeline and have run into trouble with GFF parsing.

I am using Brad Chapman's gff_to_genbank.py script to convert the GFF output from Prodigal into GenBank format. The script works when I call it directly, but I run into trouble when I use the same functions inside my own script. Here is my script, which is identical to Brad's except for the parts shown here:

def main(input_file):
    base, ext        = os.path.splitext(input_file)
    run_prodigal(input_file)

def run_prodigal(fasta_in):
    """
    Writes out Protein Fasta and GBFiles from Prodigal 
    """
    base, ext        = os.path.splitext(fasta_in)
    gff_out          = "{}.gff".format(base)
    proteinfasta_out = "{}_proteins.fasta".format(base)
    gb_out  = "{}.gb".format(base)
    command =  "prodigal -i {} -p m -a {} -o {} -f gff".format(fasta_in, proteinfasta_out, gff_out)
    subprocess.call(command.split())
    #print "Finding Genes for {}".format(base)
    #print "Writing GB File, GFF, and Protein fasta for {}".format(base)
    fasta_input = SeqIO.to_dict(SeqIO.parse(fasta_in, "fasta", generic_dna))
    gff_iter = GFF.parse(gff_out, fasta_in)
    #print fasta_input
    #print gff_iter
    SeqIO.write(_check_gff(_fix_ncbi_id(gff_iter)), gb_out, "genbank")

If I call this I get the following error message:

#call the script
python run_prodigal.py contigs.fasta

#and results in this error message
   for rec in fasta_iter:
  File "build/bdist.macosx-10.6-x86_64/egg/BCBio/GFF/GFFParser.py", line 709, in parse
  File "build/bdist.macosx-10.6-x86_64/egg/BCBio/GFF/GFFParser.py", line 304, in parse_in_parts
  File "build/bdist.macosx-10.6-x86_64/egg/BCBio/GFF/GFFParser.py", line 344, in _results_to_features
  File "build/bdist.macosx-10.6-x86_64/egg/BCBio/GFF/GFFParser.py", line 400, in _add_parent_child_features
  File "build/bdist.macosx-10.6-x86_64/egg/BCBio/GFF/GFFParser.py", line 510, in _add_toplevel_feature
  File "build/bdist.macosx-10.6-x86_64/egg/BCBio/GFF/GFFParser.py", line 479, in _get_rec
TypeError: string indices must be integers, not str

#however, calling gff_to_genbank.py directly on the prodigal-generated gff and the original fasta works fine
python gff_to_genbank.py contigs.gff contigs.fasta

Although I can work around it, I am not sure why I am getting the error message and would appreciate your thoughts on why this is happening.

thanks, zach cp

biopython gff • 5.2k views
12.1 years ago

You have a small typo here:

fasta_input = SeqIO.to_dict(SeqIO.parse(fasta_in, "fasta", generic_dna))
gff_iter = GFF.parse(gff_out, fasta_in)

The second line should be:

gff_iter = GFF.parse(gff_out, fasta_input)

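You are passing the FASTA file name (a string) to GFF.parse instead of the dictionary of sequence records it is expecting. The parser then tries to look up a sequence ID in that string, which is why the traceback ends in "TypeError: string indices must be integers, not str". Hope this gets things working for you.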
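
To make the error message less mysterious, here is a minimal sketch in plain Python. It assumes the parser looks records up with something like base[rec_id], which the _get_rec frame in the traceback suggests. The get_rec function and the placeholder sequence are hypothetical stand-ins, not the actual BCBio code.

```python
# Hypothetical stand-in for the parser's record lookup; not the real BCBio code.
def get_rec(base, rec_id):
    """Look up a sequence record by its ID in the base container."""
    return base[rec_id]


# What the corrected call passes: a dict keyed by sequence ID.
# (A plain string stands in for the SeqRecord that SeqIO.to_dict would hold.)
fasta_input = {"contig_1": "ATGC"}

# What the original call passed: the FASTA file name itself.
fasta_in = "contigs.fasta"

# Dict lookup by sequence ID succeeds.
ok = get_rec(fasta_input, "contig_1")

# Indexing a str with another str raises TypeError.
try:
    get_rec(fasta_in, "contig_1")
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__

print(ok, error_type)
```

The same lookup works on the dictionary and fails on the file name, so swapping fasta_in for fasta_input in the GFF.parse call is the only change needed.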


thanks brad, you are the man.
