I am currently developing a tool that compares the result of one BLAST search against different versions of a database to explain why some conclusion/finding comes up at one point but not at another.
For this I obviously need older BLAST results. Does anybody know a place (or have some yourselves) where I can find older BLAST search result XML files, or older versions of usable nucleotide or protein databases?
I am not sure whether one can reconstruct an earlier state of the current NCBI databases, but if that is possible I would be grateful for any suggestions on that too.
One problem you may face is when sequences/genomes are updated. I'm not sure if you'll be able to find old versions of sequences.
What exactly are you interested in measuring? Are you interested in how the performance of BLAST over time has varied? Are you interested in how the evolution of the available sequence data over time has impacted BLAST? Or, are you interested in how the search space impacts the performance of BLAST?
If you're interested in how search space impacts BLAST performance, why not generate BLAST databases from random subsets of the current nr database (using the alias tool, as Istvan suggested)? You could then add one or more known true positives and measure performance. There are lots of strategies for this, so it depends on what you're looking for.
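To make the random-subset idea a bit more concrete, here is a minimal sketch of the sampling step only. The function names, the seed handling and the one-ID-per-line output file are my own assumptions, not part of any BLAST tool. As far as I know, a plain ID list like this can be handed to the BLAST+ alias tool to restrict an existing database, but check the options of your installed version.

```python
import random


def sample_ids(all_ids, n, true_positives=(), seed=0):
    """Pick n random IDs and always add the known true positives.

    all_ids and true_positives are plain sequence identifiers; the
    fixed seed makes the subset reproducible between runs.
    """
    keep = set(true_positives)
    # Sort before sampling so the result does not depend on set ordering.
    pool = sorted(set(all_ids) - keep)
    rng = random.Random(seed)
    picked = rng.sample(pool, min(n, len(pool)))
    return sorted(keep | set(picked))


def write_id_list(ids, path):
    """Write one identifier per line (hypothetical helper)."""
    with open(path, "w") as handle:
        handle.write("\n".join(ids) + "\n")
```

Running this with several seeds and subset sizes would give you a series of databases of different search-space size that all contain the same true positives.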
Out of curiosity, what do you mean by some conclusions appearing at some times but not at others? Can you give an example? It is possible that BLAST may miss something, but that is very unlikely. Are you sure that it isn't just the fact that one result may have been added or removed from nr at that time? E.g. hit XXX didn't show up before 2009 because XXX wasn't in nr before then.
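If you do have two result files for the same query, a quick way to check this is to compare the hit accessions directly. The sketch below assumes BLAST XML output, where each hit carries a `Hit_accession` element; the function names are just for illustration.

```python
import xml.etree.ElementTree as ET


def hit_accessions(xml_text):
    """Collect the accession of every hit in a BLAST XML result."""
    root = ET.fromstring(xml_text)
    return {el.text for el in root.iter("Hit_accession") if el.text}


def compare_hits(old_xml, new_xml):
    """Return (hits only in the old result, hits only in the new result)."""
    old_hits = hit_accessions(old_xml)
    new_hits = hit_accessions(new_xml)
    return old_hits - new_hits, new_hits - old_hits
```

Hits that appear only in the newer result are candidates for "was not in the database yet"; hits that appear only in the older one may have been removed or replaced by an updated record.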
Another option might be to use old versions of Ensembl. I'm not sure that it is as well curated as nr, but old versions of Ensembl are available.
Another possibility would be to filter gene IDs by their dates of submission or publication, then use blastalias to create a subset of data that reflects the state of the BLAST database at a different point in time.
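The filtering step could look roughly like this. It is only a sketch: it assumes you have already collected (identifier, date) pairs from somewhere, with dates as ISO strings such as `2008-11-30`, and the function name is made up. Keep in mind the caveat about updated sequences mentioned above: the date attached to a record may belong to its latest version rather than to its first submission.

```python
from datetime import date


def ids_up_to(records, cutoff):
    """Keep identifiers whose date is on or before the cutoff.

    records: iterable of (sequence_id, iso_date_string) pairs
    cutoff:  datetime.date giving the point in time to emulate
    """
    return [
        seq_id
        for seq_id, iso in records
        if date.fromisoformat(iso) <= cutoff
    ]
```

The resulting list can then be written out one identifier per line and used to build the restricted database.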