The NCBI FAQ states:
Only FTP files for the "latest" version of an assembly are updated when annotation is updated, new file formats are added or improvements to existing formats are released.
It also states:
Any changes to the sequences included in a particular assembly accession result in an increment of the assembly version, which means that an assembly accession.version (e.g. GCF_000001405.28) represents a fixed set of sequences.
This is the point I am confused about: when a file is "updated" (as in "updated when annotation is updated"), is it treated as a "changed" file whose version number is then incremented? I find NCBI's wording ambiguous. Is there a difference between a changed file and an updated file?
To clarify: when files are updated, is the version number always incremented, or are files sometimes updated without a version increment? That is, does every change result in a new filename via an incremented version number?
I'm wondering whether I can detect updated files solely from the file name, on the basis that the version number in it has been incremented.
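To make the filename-based idea concrete, here is a minimal sketch of what such a check might look like. It assumes filenames embed the assembly accession.version in the form shown in the FAQ example (GCF_000001405.28); the function names and the sample filenames are hypothetical, and this only catches changes that actually increment the version.

```python
import re

# Assumption: filenames contain an assembly accession.version such as
# GCF_000001405.28 (GCA_ or GCF_ prefix, nine digits, a dot, then the version).
ACCESSION_RE = re.compile(r"(GC[AF]_\d{9})\.(\d+)")


def parse_accession(filename):
    """Return (accession, version) or None if no accession.version is found."""
    match = ACCESSION_RE.search(filename)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def newer_versions(local_names, remote_names):
    """Return remote filenames that are new or carry a higher version than any local copy."""
    latest_local = {}
    for name in local_names:
        parsed = parse_accession(name)
        if parsed:
            accession, version = parsed
            latest_local[accession] = max(version, latest_local.get(accession, 0))

    result = []
    for name in remote_names:
        parsed = parse_accession(name)
        # Unknown accessions default to version 0, so they are always reported.
        if parsed and parsed[1] > latest_local.get(parsed[0], 0):
            result.append(name)
    return result


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    local = ["GCF_000001405.27_example_genomic.fna.gz"]
    remote = [
        "GCF_000001405.27_example_genomic.fna.gz",
        "GCF_000001405.28_example_genomic.fna.gz",
    ]
    print(newer_versions(local, remote))
```

A check like this cannot see a file whose contents changed while its name stayed the same, which is exactly the case I am asking about.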
The reason I ask: if files are updated without the filename changing (that is, without a version increment), then rsync is the way to go.
If every update does change the filename, the task is much simpler and quicker, since I would only need to fetch newly uploaded files by filename with wget.
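If files can change in place, one possible alternative to a full rsync is to compare checksum listings between two points in time. The sketch below assumes each assembly directory offers a checksum listing (such as an md5checksums.txt file) with lines of the form `<md5>  ./<filename>`; that format and the function names are assumptions for illustration, not something the FAQ text above confirms.

```python
def parse_checksums(text):
    """Parse lines of the assumed form '<md5>  ./<filename>' into {filename: md5}."""
    table = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, name = parts
        name = name.strip()
        if name.startswith("./"):
            name = name[2:]
        table[name] = digest.lower()
    return table


def changed_files(old_listing, new_listing):
    """Return sorted filenames that are new or whose checksum differs in the new listing."""
    old = parse_checksums(old_listing)
    new = parse_checksums(new_listing)
    return sorted(name for name, digest in new.items() if old.get(name) != digest)


if __name__ == "__main__":
    # Hypothetical listings: the annotation file changed, the sequence file did not.
    saved = "aaa111  ./example_genomic.gff.gz\nbbb222  ./example_genomic.fna.gz\n"
    fresh = "ccc333  ./example_genomic.gff.gz\nbbb222  ./example_genomic.fna.gz\n"
    print(changed_files(saved, fresh))
```

Only the small listing would need to be fetched on each check, and only the files it reports would need to be downloaded again.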
You may want to stick with genomes in the RefSeq section, which you can find here (see the bacteria directory): ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/
Here is an explanation of how accession numbers and version numbers are handled by NCBI.