I'm trying to concatenate hundreds of .fasta files into a single, large fasta file containing all of the sequences. I haven't found a specific method to accomplish this in the forums. I did come across this code from http://zientzilaria.heroku.com/blog/2007/10/29/merging-single-or-multiple-sequence-fasta-files, which I have adapted a bit.
fasta.py contains the following code:
    class fasta:
        def __init__(self, name, sequence):
            self.name = name
            self.sequence = sequence

    def read_fasta(file):
        items = []
        index = 0
        for line in file:
            if line.startswith(">"):
                if index >= 1:
                    items.append(aninstance)
                index += 1
                name = line[:-1]
                seq = ''
                aninstance = fasta(name, seq)
            else:
                seq += line[:-1]
                aninstance = fasta(name, seq)
        items.append(aninstance)
        return items
And here is the adapted script to concatenate .fasta files:
    import sys
    import glob
    import fasta

    #obtain directory containing single fasta files for query
    filepattern = input('Filename pattern to match: ')

    #obtain output directory
    outfile = input('Filename of output file: ')

    #create new output file
    output = open(outfile, 'w')

    #initialize lists
    names = []
    seqs = []

    #glob.glob returns a list of files that match the pattern
    for file in glob.glob(filepattern):
        print ("file: " + file)

        #we read the contents and an instance of the class is returned
        contents = fasta.read_fasta(open(file).readlines())

        #a file can contain more than one sequence so we read them in a loop
        for item in contents:
            names.append(item.name)
            seqs.append(item.sequence)

    #we print the output
    for i in range(len(names)):
        output.write(names[i] + '\n' + seqs[i] + '\n\n')

    output.close()
    print("done")
The script finds and opens the fasta files, but the newly created output file contains no sequences. The error is raised inside fasta.py, which I don't understand well enough to modify:
    Traceback (most recent call last):
      File "C:\Python32\myfiles\test\3\Fasta_Concatenate.py", line 28, in <module>
        contents = fasta.read_fasta(open(file).readlines())
      File "C:\Python32\lib\fasta.py", line 18, in read_fasta
        seq += line[:-1]
    UnboundLocalError: local variable 'seq' referenced before assignment
Any suggestions? Thanks!
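One possible explanation, offered as a sketch rather than a confirmed diagnosis: `seq` is only assigned once a line starting with ">" has been seen, so a file whose first line is not a header (a leading blank line, for example) reaches `seq += line[:-1]` before `seq` exists. The version below is a minimal rewrite of `read_fasta` under that assumption. The `Record` namedtuple is a hypothetical stand-in for the `fasta` class, and the choice to silently skip text that appears before the first header is my own assumption, not something the original code specifies.

```python
from collections import namedtuple

# Hypothetical record type standing in for the `fasta` class.
Record = namedtuple("Record", ["name", "sequence"])


def read_fasta(lines):
    """Parse an iterable of FASTA lines into a list of Record objects."""
    items = []
    name = None   # stays None until the first ">" header is seen
    chunks = []   # sequence lines collected for the current record
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines anywhere in the file
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                items.append(Record(name, "".join(chunks)))
            name = line
            chunks = []
        elif name is not None:
            chunks.append(line)
        # Assumption: text before the first header is ignored.
    # Store the last record; an empty input simply yields an empty list.
    if name is not None:
        items.append(Record(name, "".join(chunks)))
    return items
```

In the concatenation script this could be called as `read_fasta(open(file))`, since a file object can be iterated line by line without `readlines()`. Using `strip()` instead of `line[:-1]` also avoids cutting off the last base when the final line of a file has no trailing newline.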