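This code reads your FASTA file, stores the records in a dictionary keyed by record ID (the first word of each header), and writes them to a new FASTA file in sorted order. It assumes that your FASTA file is formatted properly, with each sequence header preceded by a ">", and that every ID is unique; SeqIO.to_dict raises a ValueError if two records share an ID.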
from Bio import SeqIO
# read the FASTA file
sequences = SeqIO.to_dict(SeqIO.parse("input.fasta", "fasta"))
# sort sequences by keys (headers)
sorted_sequences = dict(sorted(sequences.items()))
# write the sorted sequences to a new FASTA file
with open("sorted.fasta", "w") as output_handle:
    SeqIO.write(sorted_sequences.values(), output_handle, "fasta")
Replace "input.fasta" with the name of your FASTA file. The code writes the sorted sequences to a file named "sorted.fasta"; you can change this to any filename you prefer. Since the script uses Biopython, you need to install it if you have not already:
pip install biopython
Overall, the code sorts your sequences alphabetically by header. Headers such as "gene1", "gene2", up to "gene9" also come out in numerical order, because they share the prefix "gene" and each number is a single digit. Headers such as "gene1", "gene11", "gene2" do not, because strings are compared character by character and "gene11" sorts before "gene2". In that case, either zero-pad the numbers so they all have the same width ("gene01", "gene02", "gene11") or add code that sorts numerically.
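If you would rather not rename your headers, one option is a "natural sort" key that compares runs of digits as numbers. The following is a minimal sketch using only the standard library; the function name natural_key and the sample headers are made up for illustration, and it assumes the numbers in your headers are plain non-negative integers.

```python
import re

def natural_key(header):
    """Split a header into text and number chunks so digit runs compare as integers."""
    # re.split with a capturing group keeps the digit runs in the result,
    # so the list always alternates: text, digits, text, digits, ...
    parts = re.split(r"(\d+)", header)
    # Odd positions are the digit runs; convert those to int and leave text as-is.
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]

# Hypothetical headers, used only to show the effect of the key.
headers = ["gene11", "gene2", "gene1"]
naturally_sorted = sorted(headers, key=natural_key)
print(naturally_sorted)
```

To use it in the script above, pass the key when sorting the dictionary items, for example sorted(sequences.items(), key=lambda item: natural_key(item[0])).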