As the title says. I am trying to analyze synthetic metagenomic data generated from the genomes in an earlier release of RefSeq (2017.07). To classify the data, I need all the representative genome assemblies from that release. My problem is that the assembly_summary file I can get from RefSeq only flags the current representatives (among other assemblies, of course), and assembly_summary_historical has no information on whether any of the previous assemblies were representatives. Similarly, the .catalog file for the release doesn't seem to contain this information. So, is there a way to get a "snapshot" of this earlier release of RefSeq?
Thanks in advance!
Possibly. You will need to know the RefSeq release version. Check in this directory: https://ftp.ncbi.nlm.nih.gov/refseq/release/release-catalog/archive/
Thanks! I've already tried looking through these files; I need release 83. I am trying to find this information in the .catalog file of the release, but it doesn't seem to contain it.
The catalog file lists the genomes included in the release. There seem to be 71365 unique items in field 2.
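For anyone who wants to reproduce a count like this, here is a minimal Python sketch. It assumes the catalog is tab-delimited and that "field 2" means the second column counted from 1; the file name in the usage comment is hypothetical, so substitute whatever you downloaded from the archive directory. The helper names are mine, not anything NCBI provides.

```python
import gzip
from typing import Iterable


def count_unique_values(lines: Iterable[str], field: int = 2, sep: str = "\t") -> int:
    """Count distinct values in a 1-based column of delimited text lines."""
    seen = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        columns = line.split(sep)
        if len(columns) >= field:
            seen.add(columns[field - 1])
    return len(seen)


def count_in_catalog(path: str, field: int = 2) -> int:
    """Open a plain or gzip-compressed catalog file and count unique values."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        return count_unique_values(handle, field)


# Usage (hypothetical file name, adjust to your download):
# print(count_in_catalog("RefSeq-release83.catalog.gz"))
```

The file is streamed line by line, so even a large catalog only costs memory for the set of distinct values.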
Yes, I can tell that, but as far as I can see it doesn't contain any other information about the genomes, such as whether they were representative in that version.
I am moving my answer to a comment. At this point you should email the NCBI help desk and ask them this question. If you get an answer, please post it back in this thread.