I have been wondering about the correct approach in Python, maybe using Biopython, for parsing a FASTA file without having to load it into memory first (i.e. NOT having to read it into a list, dictionary or FASTA class before using it).
The desired result would behave like a generator, as in the pseudo-code example below:
    fasta_sequences = fasta_generator(input_file)  # The function I am missing

    with open(output_file, "w") as out_file:
        for fasta in fasta_sequences:
            name, sequence = fasta
            new_sequence = some_function(sequence)
            write_fasta(out_file, name, new_sequence)  # Function defined elsewhere
Important aspects are:
- Reads sequences one at a time
- Does not load all the sequences into memory
- Is safe and well tested
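
To make the requirements above concrete, here is a minimal pure-Python sketch of the behaviour I have in mind. It is only an illustration under some assumptions: `fasta_generator` is the hypothetical name from my pseudo-code (not an existing library function), it takes an already opened text handle rather than a file name, and it assumes a plain-text FASTA file with `>` header lines. It is not well tested, which is exactly why I am asking for an established alternative.

```python
import io


def fasta_generator(handle):
    """Yield (name, sequence) tuples one record at a time from an open text handle."""
    name, chunks = None, []
    for line in handle:  # iterating a file handle reads one line at a time
        line = line.rstrip()
        if line.startswith(">"):
            # A new header: emit the previous record, if there was one
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            # Sequence line belonging to the current record
            chunks.append(line)
    # Emit the last record once the input is exhausted
    if name is not None:
        yield name, "".join(chunks)


# Small in-memory example standing in for a real file
demo = io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(fasta_generator(demo))
```

Only the record currently being assembled is held in memory; `list(...)` is used here just to inspect the small demo input and would be replaced by a plain `for` loop on a real file.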
Thanks for your suggestions!