I already have a head start on this question (following this tutorial). However, that method is taking a _really_ long time, since I have a list of ~0.5 billion sequences to retrieve. Additionally, some of my threads are throwing errors during sequence filtering, and I'm afraid this method might not work.
So! I'm asking whether you have a better idea for getting every bacterial protein sequence from NCBI. I don't think EDirect will work, since I'll be blocked. One idea I had was to use esearch and efetch on a local copy of the full non-redundant protein database (nr.fa). However, EDirect doesn't support local queries out of the box (at least to my knowledge).
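A rough sketch of the local-filtering idea without EDirect: stream the local FASTA once and keep only the records whose accession appears in your list. Because it is a single pass over a file on disk, it makes no requests to NCBI, so rate limits do not apply. This is a minimal Python sketch under assumptions, not a tested pipeline: it assumes you already have the target accessions (one per line, in the same versioned form as the FASTA headers), and that merged nr deflines pack several entries separated by a Ctrl-A (`\x01`) character, each starting with its accession. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Set, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Stream (header, sequence) pairs from FASTA text, one record at a time."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None and line:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def header_accessions(header: str) -> List[str]:
    """Return the leading accession of every entry in a defline.

    Assumption: merged nr entries are separated by Ctrl-A (\\x01) and each
    entry starts with its accession followed by a space.
    """
    return [entry.split(" ", 1)[0] for entry in header.split("\x01") if entry]


def filter_by_accession(lines: Iterable[str], wanted: Set[str]) -> Iterator[Record]:
    """Yield only records that carry at least one accession found in `wanted`."""
    for header, seq in parse_fasta(lines):
        if any(acc in wanted for acc in header_accessions(header)):
            yield header, seq


def format_fasta(records: Iterable[Record], width: int = 60) -> Iterator[str]:
    """Yield FASTA lines, wrapping each sequence at `width` residues."""
    for header, seq in records:
        yield f">{header}\n"
        for i in range(0, len(seq), width):
            yield seq[i:i + width] + "\n"
```

Usage would be roughly: read the IDs into a set, open the compressed file with `gzip.open(path, "rt")`, pass the handle to `filter_by_accession`, and write the result with `out.writelines(format_fasta(...))`. With ~0.5 billion IDs, the Python set alone could need tens of gigabytes of RAM, so splitting the ID list into chunks and making several passes over the file may be necessary.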
Any advice on how to wrangle EDirect into doing local queries, or any other ideas, would be much appreciated.