Question

Fasta Output Python Parsing

0

13.2 years ago

Richard ▴ 600

Hi all, I was about to write this myself, but I thought I would check with the experts first.

Does anyone know of a Python parser for FASTA output? That is, when I use the FASTA35.exe aligner, there are a number of output format options (via -m). Is there existing code to slurp any of those output formats, so that I don't have to write my own?

fasta python parsing • 3.0k views

updated 13.2 years ago by jingtao09 ▴ 110 • written 13.2 years ago by Richard ▴ 600

score 3 · Answer 1 · 2012-05-07

13.2 years ago

Neilfws 49k

The Biopython Bio.AlignIO module will read FASTA output generated using the -m 10 option (the format name to pass to AlignIO is "fasta-m10").
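If only the per-hit scores are needed, they can also be pulled straight out of the -m 10 text. The following is a minimal sketch, not Biopython's parser: it assumes that each hit starts with a `>>` line, that its scores follow as `; key: value` lines, and that the per-sequence blocks start with a single `>`. The function name `iter_m10_hits` and the sample text are made up for illustration, so check them against real output from your FASTA version.

```python
import io

def iter_m10_hits(handle):
    """Yield (hit_title, fields) for each hit in FASTA -m 10 style output.

    Assumption: a hit starts with '>>', its score fields follow as
    '; key: value' lines, and a single '>' starts a sequence block.
    Values are returned as strings; convert them as needed.
    """
    title, fields, in_hit_header = None, {}, False
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">>>"):
            # Query header or the '>>><<<' end marker: close any open hit.
            if title is not None:
                yield title, fields
            title, fields, in_hit_header = None, {}, False
        elif line.startswith(">>"):
            # New hit: emit the previous one, then start collecting fields.
            if title is not None:
                yield title, fields
            title, fields, in_hit_header = line[2:].strip(), {}, True
        elif line.startswith(">"):
            # Per-sequence block begins; the hit-level score fields are done.
            in_hit_header = False
        elif in_hit_header and line.startswith(";") and ":" in line:
            key, _, value = line[1:].partition(":")
            fields[key.strip()] = value.strip()
    if title is not None:
        yield title, fields

# Hypothetical, heavily shortened sample in the assumed layout.
sample = """\
>>>query1, 107 aa vs library
; pg_name: fasta35
>>hitA some description
; fa_opt:  71
; fa_expect:   0.58
; sw_score: 71
>query1 ..
; sq_len: 107
ACGT
>hitA ..
; sq_len: 331
ACGA
>>><<<
"""

for title, fields in iter_m10_hits(io.StringIO(sample)):
    print(title, fields.get("sw_score"), fields.get("fa_expect"))
```

The same function should work on a real file by passing `open("results.m10")` instead of the `io.StringIO` object (the file name here is only a placeholder).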

0

Great. Do you know of any online examples of how to pull out the alignment scores?

13.2 years ago by Richard ▴ 600

score 0 · Answer 2 · 2012-05-09

For a FASTA parser, I wrote some Python code based on an idea from Pierre Lindenbaum's post:
http://www.biostars.org/post/show/19426/counting-ns-within-fasta/#19439

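def fastaio(fh):
    """
    Yield (header, sequence) tuples from a FASTA file.

    Accepts any text-mode file handle, e.g. open(filename), sys.stdin,
    or gzip.open(filename, "rt").
    """
    buff = []
    header = []
    while True:
        c = fh.read(1)
        if not c or c == ">":
            # End of file or start of a new record: emit the previous one.
            if header:
                yield ("".join(header), "".join(buff))
            if not c:
                break
            header[:] = []
            buff[:] = []
            # Read the header line up to the newline (or end of file).
            while True:
                hc = fh.read(1)
                if not hc or hc == "\n":
                    break
                header.append(hc)
        else:
            buff.append(c)

with open("myfile.fasta") as fh:
    for name, seq in fastaio(fh):
        print(">" + name)
        # The sequence still contains line breaks; strip all whitespace.
        print("".join(seq.split()))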
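Reading one character at a time can be slow on large files. As a possible alternative, here is a minimal line-based sketch of the same idea. The name `read_fasta` and the demo data are hypothetical, and it assumes a text-mode handle; unlike the version above, it returns each sequence with the line breaks already removed.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs, reading the handle line by line."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()  # drops the newline and any trailing '\r'
        if line.startswith(">"):
            # A new record begins: emit the one collected so far, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            # Sequence line belonging to the current record.
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)

# Small in-memory demo; any text-mode file handle should work the same way.
demo = io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(read_fasta(demo))
for name, seq in records:
    print(">" + name)
    print(seq)
```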