Question: AlignIO Gives 'AssertionError' When Reading EMBOSS Alignment Files
Asked 4 months ago by a1ultima (London):

I have been stuck on this problem for three days. I have searched everywhere, posted on StackOverflow, and am still waiting for EMBL to respond to my emails.

After aligning sequences with the needle() program in EMBOSSwin (pairwise global alignment), I get alignment files in "pair" format with a .needle file extension. I want to use Biopython to read these alignments for later analysis.

Following the instructions on the Biopython wiki, I use AlignIO.read(open('alignment.needle'), 'emboss'), but I keep getting an AssertionError.

My code:

>>> from Bio import AlignIO
>>> alignment = AlignIO.read(open("data/all/out/pair1_alignment.needle"), "emboss")

My error:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "C:\Python27\lib\Bio\AlignIO\__init__.py", line 423, in read
    first = next(iterator)
  File "C:\Python27\lib\Bio\AlignIO\__init__.py", line 370, in parse
    for a in i:
  File "C:\Python27\lib\Bio\AlignIO\EmbossIO.py", line 150, in __next__
    assert seq.replace("-", "") != ""
AssertionError

Example Alignment File:

Download the alignment file here


Versions:

  • Windows 7
  • Python version 2.7.3
  • Biopython version 1.63
  • EMBOSS version 2.10.0-0.8

Clues:

I suspect this may be related to a warning message that the EMBOSS needle() program kept printing while I was making the alignments:

Warning: Sequence character string not found in ajSeqCvtKS

Thank you for reading.

Andy

(question last modified 4 months ago by Peter)

You could link to the (unanswered) question on StackOverflow.

(reply by Peter, 4 months ago)

Thank you @Peter and @Whetting. Peter, I think you are right; here is the StackOverflow link: http://bit.ly/IgqclE. I have learned that the format is apparently "srspair", but I have tried every alignment format that needle() can output: pair, srspair, markx0, markx1, markx2, markx3, markx10, and score.

(reply by a1ultima, 4 months ago)

Does the needle output look correct? If it does, try alignment = AlignIO.read(open("data/all/out/pair1_alignment.fasta"), "fasta")

(reply by Whetting, 4 months ago)

Thank you for the suggestion; it actually led to a new error: "No records found in handle". I will add this to the main question in case others can debug the new error.

(reply by a1ultima, 4 months ago)

The sample output shown is NOT in the FASTA format, so don't tell AlignIO that it is.

(reply by Peter, 4 months ago)
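Peter's point is that the format name given to AlignIO must match what is actually in the file. As a rough illustration, the hypothetical helper `guess_alignment_format` below (not part of Biopython) peeks at the first non-blank line: FASTA records start with ">", while EMBOSS pair/srspair output opens with a "#" comment header that contains a "# Program:" line. It is a sketch under those assumptions, not a general format detector.

```python
def guess_alignment_format(text):
    """Rough, hypothetical sniffing heuristic (not a Biopython function).

    Returns "fasta" if the first non-blank line starts with ">",
    "emboss" if the file opens with a '#' header containing "# Program:"
    (as EMBOSS pair/srspair output does), and None otherwise.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(">"):
            return "fasta"
        if stripped.startswith("#"):
            # Other formats (e.g. Stockholm) also start with '#', so look
            # for the EMBOSS-style program line before guessing "emboss".
            return "emboss" if "# Program:" in text else None
        return None  # first real line matches neither format
    return None  # empty or whitespace-only input
```

If it returns "emboss", that string could be passed as the format argument of AlignIO.read; a None result means the file should be inspected by hand rather than forced through the FASTA parser.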
Answered 4 months ago by Peter (Scotland, UK):

This looks like a bug in the Biopython EMBOSS alignment parser. Which versions of Biopython and EMBOSS are you using? Could you share the example output file, e.g. as a gist on GitHub or via pastebin, so that we can try to fix it? You could also report this via the Biopython mailing list, or via the issue tracker on the Biopython GitHub page or the (old) Redmine bug tracker, both linked from http://biopython.org


Update: This appears to be down to a subtle change in the EMBOSS output. You have an extremely old version, EMBOSS version 2.10.0 (February 2005), and your output file has lines like this:

gag             1288 --------------------------------------------------   1287

Using a newer version of EMBOSS (e.g. 6.3.0), gives lines like this:

gag             1287 --------------------------------------------------   1287

The Biopython parser expects the latter for alignment blocks that contain no letters (e.g. when one sequence is much longer than the other): in such blocks the start and end coordinates should agree. Please update your copy of EMBOSS, and the parser should then be happy. The current EMBOSS release is version 6.5.0.
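The coordinate rule described above can be sketched as follows. This is a hypothetical, simplified re-creation, not Biopython's EmbossIO code: the helper name `check_block_line` is made up, it assumes the plain "name start sequence end" layout of the two sample lines (no spaces inside the name or sequence), and it only mirrors the single assertion visible in the traceback, `assert seq.replace("-", "") != ""`, which fires when start and end differ.

```python
def check_block_line(line):
    """Split an EMBOSS pair/srspair sequence line and check its coordinates.

    Hypothetical helper; assumes the layout 'name start sequence end'.
    Raises ValueError where Biopython's parser would hit its AssertionError.
    """
    name, start, seq, end = line.split()
    start, end = int(start), int(end)
    has_letters = seq.replace("-", "") != ""
    if start != end and not has_letters:
        # Old EMBOSS (2.10.0) writes a gap-only block as e.g. 1288 ... 1287,
        # so start != end even though the block contains no letters.
        raise ValueError(
            f"gap-only block with start {start} != end {end}: {line!r}"
        )
    return name, start, seq, end


# Rebuilt from the two sample lines quoted above (50 gap characters each).
old_style = "gag             1288 " + "-" * 50 + "   1287"  # EMBOSS 2.10.0
new_style = "gag             1287 " + "-" * 50 + "   1287"  # EMBOSS 6.3.0

print("accepted:", check_block_line(new_style)[:2])
try:
    check_block_line(old_style)
except ValueError as err:
    print("rejected:", err)
```

The real parser performs more checks than this; the sketch only isolates why the old-style gap-only line trips the assertion while the newer one passes.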

Edit: I reposted this answer on your duplicate question on StackOverflow: http://stackoverflow.com/questions/20159230/alignio-gives-assertionerror-when-reading-emboss-alignment-files/


Again, thank you @Peter. Here is what you asked for:

Output file: https://www.dropbox.com/s/clxmrsr750xern3/pair1.needle

(note: I have since changed the file extension from .fasta to .needle)

Biopython version: 1.63 (for python2.7, win32)

page: http://biopython.org/wiki/Download#1.63_.28beta.29

download: http://biopython.org/DIST/biopython-1.63b.win32-py2.7.exe

EMBOSS version: 2.10.0-0.8

page: http://www.interactive-biosoftware.com/software/resources/embosswin

download: http://www.interactive-biosoftware.com/embosswin/embosswin-0.8-setup.exe

StackOverflow Post here

(reply by a1ultima, 4 months ago)

Wow, that's a really old version of EMBOSS (February 2005). Here's the EMBOSS 6.5.0 release: ftp://emboss.open-bio.org/pub/EMBOSS/old/6.5.0/windows/

(reply by Peter, 4 months ago)

That fixed it! As I posted on SO, thank you so much. This did not just save me here; now I am primed to look for version-related causes of errors.

(reply by a1ultima, 4 months ago)

Great - thanks for confirming this solved it, and you're right - checking versions is a good habit to get into :)

(reply by Peter, 4 months ago)
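In the spirit of checking versions first, here is a minimal sketch of collecting the Python and Biopython versions to paste into a question or bug report. The function name `report_versions` is hypothetical; it relies only on `sys.version` and Biopython's `Bio.__version__` attribute, and it does not try to detect the EMBOSS version, which still has to be checked separately.

```python
import sys


def report_versions():
    """Collect interpreter and Biopython versions (hypothetical helper)."""
    versions = {"python": sys.version.split()[0]}
    try:
        import Bio  # Biopython exposes its version as Bio.__version__

        versions["biopython"] = Bio.__version__
    except ImportError:
        versions["biopython"] = None  # Biopython is not installed
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```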