I was about to reinvent the wheel, again, when it occurred to me that many of you have probably already solved this problem.
I have a few huge fasta files (half a million sequences) and I would like to know the average length of the sequences for each of these files, one at a time.
So, let's call this a code golf. The goal is to write the shortest code you can that accomplishes the following:
Given a fasta file, return the average length of the sequences.
The accepted answer will be the one with the most votes on Friday around 16h in Quebec (Eastern Time).
You can use anything that can be run on a linux terminal (your favorite language, emboss, awk...), diversity will be appreciated :)
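For reference, here is a plain, un-golfed baseline to compare against. It is only a sketch: the function name `average_fasta_length` is made up for illustration, and it assumes every record starts with a `>` header line, that sequences may wrap over several lines, and that the average is the total number of residues divided by the number of records.

```python
import sys


def average_fasta_length(lines):
    """Return the mean sequence length over an iterable of FASTA lines.

    Assumes each record starts with a '>' header line and that the
    sequence may be wrapped over any number of following lines.
    """
    n_records = 0
    total_length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_records += 1  # a header marks a new record
        else:
            total_length += len(line)  # sequence lines add to the total
    # Avoid dividing by zero on an empty file
    return total_length / n_records if n_records else 0.0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Stream the file line by line so huge files never sit in memory
    with open(sys.argv[1]) as handle:
        print(average_fasta_length(handle))
```

Saved under a name of your choice, it takes the fasta file as its only argument. It reads one line at a time, so half a million sequences should not be a memory problem.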
And the accepted answer goes to the most voted answer, as promised :) Thank you all for your great participation! I think I am not the only one who appreciated the great diversity in the answers, as well as the very interesting discussions and ideas.
Although the question is now closed, do not hesitate to vote for all the answers you found interesting, especially those at the bottom! They are not necessarily less interesting. Often they simply arrived later :)