Question: blast2gff IndexError: list index out of range
5 months ago by
Begonia_pavonina20 wrote:

Hello everyone,

I have an issue with the mgkit tool blast2gff. I am trying to produce a GFF file from a BLAST result (generated with -outfmt 6). Here is the command, followed by the error message:

blast2gff blastdb output_blast.out output_blast.gff

INFO - mgkit.workflow.blast2gff: Writing to file (output_blast.gff)
INFO - mgkit.io.blast: Reading BLAST results from file (output_blast.out)
Traceback (most recent call last):
  File "/cluster/home/usr/.conda/envs/mgkit/bin/blast2gff", line 11, in <module>
    sys.exit(main())
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/mgkit/workflow/blast2gff.py", line 207, in convert_from_blastdb
    for annotation in iterator:
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/mgkit/io/blast.py", line 200, in parse_uniprot_blast
    value_funcs=value_funcs):
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/mgkit/io/blast.py", line 136, in parse_blast_tab
    for index, func in zip(ret_col, value_funcs)
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/mgkit/io/blast.py", line 136, in <genexpr>
    for index, func in zip(ret_col, value_funcs)
  File "/cluster/home/usr/.conda/envs/mgkit/lib/python3.7/site-packages/mgkit/workflow/blast2gff.py", line 192, in name_func
    return x.split(header_sep)[gene_index]
IndexError: list index out of range

Has anyone encountered the same issue?

blastn blast blast2gff gff python • 281 views
modified 10 days ago by onkarnath890 • written 5 months ago by Begonia_pavonina20

Thanks for your answer, zorbax! I am using output format 6, and the default output looks like this:

ACmerged_contig_1002    Scaffolds_2465_pilon    91.255  263 21  2   551 811 43629   43367   1.05e-96    357

Following your advice, I have tried the two following formats:

ACmerged_contig_1002|Scaffolds_2465_pilon   91.255  263 21  2   551 811 43629   43367   1.05e-96    357

ACmerged_contig_1002|Scaffolds_2465_pilon|91.255|263|21|2|551|811|43629|43367|1.05e-96|357

Both give the same error message. Is this the format you were showing me, or did I miss something?

modified 5 months ago • written 5 months ago by Begonia_pavonina20

The issue is likely with the Scaffolds_2465_pilon field. As shown in the answer below, NCBI's FASTA headers have identifiers separated by a | character, e.g. sp|O14830|PPE2_HUMAN. @zorbax is referring only to the second field (the hit) in the BLAST output.

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

modified 5 months ago • written 5 months ago by GenoMax95k
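
For readers who want to see why the hit field matters, here is a minimal sketch that mirrors the failing expression from the traceback, x.split(header_sep)[gene_index]. The "|" separator is the default mentioned in this thread; gene_index=1 is an assumption made for illustration, and the two helper functions are written for this example rather than copied from mgkit.

```python
# Mirrors the expression shown in the traceback (blast2gff.py, name_func).
# header_sep="|" is the default named in this thread; gene_index=1 is assumed.
def name_func(x, header_sep="|", gene_index=1):
    return x.split(header_sep)[gene_index]


def try_name(x):
    """Return the extracted gene id, or the error text if extraction fails."""
    try:
        return name_func(x)
    except IndexError as exc:
        return "IndexError: {}".format(exc)


# A pipe-separated header splits into three parts, so index 1 exists.
uniprot_result = try_name("sp|O14830|PPE2_HUMAN")
# A header without "|" splits into a single part, so index 1 is out of range.
scaffold_result = try_name("Scaffolds_2465_pilon")

print(uniprot_result)
print(scaffold_result)
```

Under these assumptions, any hit name without a | yields a one-element list, which is consistent with the error reported in the question.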

It does work! Thanks, genomax. Apologies for the incorrect use of the Answer button.

written 5 months ago by Begonia_pavonina20

Great! You can accept @zorbax's answer (green check mark) to provide closure to the thread.

written 5 months ago by GenoMax95k

Note: there is actually an option to set a different separator for the header. https://mgkit.readthedocs.io/en/0.4.1/scripts/blast2gff.html#cmdoption-blast2gff-blastdb-s

But trying this with a tab does not work, for some reason...

blast2gff blastdb --header-sep '\t' output_blast.out output_blast.gff
written 5 months ago by Begonia_pavonina20
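
A possible explanation for the tab attempt, sketched in Python. It assumes (based on the traceback, not on a reading of the mgkit source) that the line is first split into columns and the header separator is then applied to the hit column alone. In that case the hit column can no longer contain a tab, so splitting it on a tab still gives a single part. Separately, in a POSIX shell, '\t' inside single quotes is passed as two characters (a backslash and a t), not as a tab; whether blast2gff unescapes it is not something this thread confirms.

```python
# One row from the thread, with real tab characters between columns.
line = "ACmerged_contig_1002\tScaffolds_2465_pilon\t91.255\t263"

# Assumption: columns are separated first, then the hit column is split again.
hit = line.split("\t")[1]

# Splitting the hit on a real tab: no tab is left inside it, so one part only.
tab_parts = hit.split("\t")

# Splitting on the two-character string a shell passes for '\t' (backslash + t).
literal_parts = hit.split(r"\t")

print(tab_parts)
print(literal_parts)
print(len(r"\t"))
```

Either way the result is a one-element list, so asking for the second element would fail in the same way as with the default separator.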

I am having the same issue. I even removed the lines with '|' in the names, but the error is still the same.

A few lines from my BLAST file:

P05522.1        tig00054011_1   55.195  154     30      2       26      141     4514526 4514984 5.85e-41        161
P05522.1        tig00000001_1   100.000 111     0       0       384     494     3401148 3400816 8.57e-78        204
P05522.1        tig00000001_1   100.000 56      0       0       330     385     3401396 3401229 8.57e-78        110

Please help!!

written 10 days ago by onkarnath890
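
Going by the earlier replies, the lines that trigger the error would be the ones whose hit name lacks a |, not the ones that contain it. The sketch below is a small, hypothetical checker (find_bad_hits is not part of mgkit) that lists such lines. It assumes the hit is the second column, that the separator is | and that the gene index is 1. It splits on any whitespace because samples pasted into a forum often lose their tabs.

```python
def find_bad_hits(lines, header_sep="|", gene_index=1, hit_column=1):
    """Return (line_number, hit) pairs whose hit field has too few parts."""
    bad = []
    for number, line in enumerate(lines, start=1):
        columns = line.split()  # any whitespace, see the note above
        if len(columns) <= hit_column:
            continue  # blank or malformed line, not the case discussed here
        hit = columns[hit_column]
        # The hit must split into at least gene_index + 1 parts.
        if len(hit.split(header_sep)) <= gene_index:
            bad.append((number, hit))
    return bad


sample = [
    "P05522.1 tig00054011_1 55.195 154 30 2 26 141 4514526 4514984 5.85e-41 161",
    "c0_seq1 sp|O14830|PPE2_HUMAN 68.421 76 24 0 229 2 437 512 1.86e-34 126",
]
bad_hits = find_bad_hits(sample)
print(bad_hits)
```

To check a real file, an open file handle can be passed instead of the sample list, for example find_bad_hits(open("output_blast.out")).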
5 months ago by
zorbax230
Mexico
zorbax230 wrote:

Check the headers of your sequences (subject or query); the default separator should be |. I tried a BLAST format 6 output with the following layout and it works:

c0_seq1  sp|O14830|PPE2_HUMAN 68.421 76 24 0 229 2 437 512 1.86e-34 126
written 5 months ago by zorbax230
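
One way to get hit names into a pipe-separated layout like the one above is to rewrite the second column of the BLAST table before running blast2gff. The following is only a sketch: wrap_hits and the "db" prefix are hypothetical placeholders, the thread does not say how the original poster changed the names, and the result has not been checked against mgkit. It wraps each hit as prefix|hit|hit so that index 1 of the split header is the original name.

```python
def wrap_hits(lines, prefix="db", header_sep="|", hit_column=1):
    """Yield tab-separated lines whose hit column is wrapped as prefix|hit|hit."""
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        # Only touch rows that have a hit column without the separator.
        if len(columns) > hit_column and header_sep not in columns[hit_column]:
            hit = columns[hit_column]
            columns[hit_column] = header_sep.join([prefix, hit, hit])
        yield "\t".join(columns)


rows = [
    "ACmerged_contig_1002\tScaffolds_2465_pilon\t91.255\n",
    "c0_seq1\tsp|O14830|PPE2_HUMAN\t68.421\n",
]
wrapped = list(wrap_hits(rows))
for row in wrapped:
    print(row)
```

Hits that already contain the separator are left unchanged, so the function can be run on mixed files. The rewritten lines would then be saved to a new file and passed to blast2gff in place of the original.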