I am trying to analyse a data set containing over 25,000 sequences and would like to align them. Is there a way to do this efficiently, without the alignment program crashing because of the size of the data?
However, if your sequences are DNA or RNA, I would suggest you look at MAFFT or Kalign instead, since the method used in Clustal Omega does not perform as well with nucleotide alignments (this is being worked on).
If your sequences are short and very similar, then other multiple sequence alignment programs, such as MUSCLE and T-Coffee, might also work, although the alignment may still need a lot of memory to complete successfully.
If you wish to align those proteins to a reference assembly, you could use the exonerate (http://www.ebi.ac.uk/~guy/exonerate/) protein2genome model, which models introns. I used this when I wanted to align proteins from the TAIR10 database to our reference genome. You would probably also want to split the query file into considerably smaller chunks, so that many faster individual alignments can be run before the results are merged. This way the alignment as a whole will be much quicker.
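As a rough illustration of the chunking step, here is a minimal Python sketch that splits a FASTA file into smaller files using only the standard library. The function names (`read_fasta`, `split_fasta`), the output file naming, and the default of 500 records per chunk are my own assumptions for the example, not something exonerate requires, so adjust them to your data.

```python
from itertools import islice
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file, one record at a time."""
    header, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line[1:], []
            elif line:
                parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_fasta(path, out_dir, records_per_chunk=500):
    """Write the records of `path` into numbered chunk files in `out_dir`.

    Returns the list of chunk paths in the order they were written.
    The chunk size is a hypothetical default; tune it to your sequences.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = read_fasta(path)
    written = []
    while True:
        # Pull the next batch lazily so the whole file is never held in memory.
        batch = list(islice(records, records_per_chunk))
        if not batch:
            break
        chunk_path = out_dir / f"chunk_{len(written):04d}.fasta"
        with open(chunk_path, "w") as out:
            for header, sequence in batch:
                out.write(f">{header}\n{sequence}\n")
        written.append(chunk_path)
    return written
```

Each chunk file could then be given to exonerate as a separate query against the same genome, and the per-chunk outputs concatenated afterwards. Note that sequences are written back on a single line, so any line wrapping in the original file is not preserved.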
Edit: I assumed the proteins were being aligned to a reference sequence rather than to each other (in which case this solution would not be appropriate).