I know that this question is already 6 years old, but I hope that my answer might be useful to others anyway.
I implemented a standardized way to automate the genome retrieval process in R (see biomartr package).
To retrieve the human reference genome from any of several databases, one can simply type:
# download human reference genome from NCBI RefSeq
biomartr::getGenome(db = "refseq", organism = "Homo sapiens")
# download human reference genome from NCBI Genbank
biomartr::getGenome(db = "genbank", organism = "Homo sapiens")
# download human reference genome from ENSEMBL
biomartr::getGenome(db = "ensembl", organism = "Homo sapiens")
This way, users can use the same command to retrieve reference genomes from different databases. Because each database uses its own gene identifiers, it should always be clear which reference genome was used for subsequent analyses.
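Because the call pattern is identical across databases, the three downloads can also be driven from one loop. The following is only a minimal sketch: `fetch_genomes` is a hypothetical helper (not part of biomartr), and it assumes that `getGenome()` returns the path of the downloaded file as a single character value.

```r
# Hypothetical helper, not part of biomartr: run the same retrieval call
# once per database and collect whatever each call returns.
fetch_genomes <- function(organism,
                          dbs = c("refseq", "genbank", "ensembl"),
                          fetch = function(db, organism) {
                            # Default: delegate to biomartr (only needed if no
                            # custom `fetch` function is supplied)
                            biomartr::getGenome(db = db, organism = organism)
                          }) {
  # vapply() names the result by database because `dbs` is a character vector.
  # Assumption: each call yields exactly one value (the file path).
  vapply(dbs, function(db) as.character(fetch(db, organism)), character(1))
}

# Example call (downloads large files, so it is left commented out):
# genome_files <- fetch_genomes("Homo sapiens")
# genome_files[["ensembl"]]
```

Keeping the returned paths in one named vector makes it easy to record which file came from which database.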
For more detailed information please consult the Genomic Sequence Retrieval vignette.
The getGenome() function also generates a log file that stores the following information:
File Name: Homo_sapiens_genomic_refseq.fna.gz
Organism Name: Homo_sapiens
Database: NCBI refseq
Download_Date: Sat Oct 22 12:41:07 2016
genome assembly_accession: GCF_000001405.35
submitter: Genome Reference Consortium
Thus, you will always know with which reference genome and with which genome version you are working.
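To use this information programmatically (for example, to record the assembly accession in an analysis report), the log can be parsed into a named vector. This is a rough sketch that assumes the log is a plain-text file of `key: value` lines as shown above; `parse_genome_log` is a hypothetical helper, not a biomartr function, and the location of the log file on disk is not assumed here.

```r
# Hypothetical helper: turn "key: value" log lines into a named character vector.
parse_genome_log <- function(lines) {
  # Keep only lines that contain a colon
  lines <- lines[grepl(":", lines, fixed = TRUE)]
  # Split at the FIRST colon only, because the date value contains colons too
  keys   <- trimws(sub(":.*$", "", lines))
  values <- trimws(sub("^[^:]*:", "", lines))
  stats::setNames(values, keys)
}

# Log lines copied from the example above; for a real file one would use
# readLines("<path to the log file>") instead.
log_lines <- c(
  "File Name: Homo_sapiens_genomic_refseq.fna.gz",
  "Organism Name: Homo_sapiens",
  "Database: NCBI refseq",
  "Download_Date: Sat Oct 22 12:41:07 2016",
  "genome assembly_accession: GCF_000001405.35",
  "submitter: Genome Reference Consortium"
)

info <- parse_genome_log(log_lines)
info[["genome assembly_accession"]]  # "GCF_000001405.35"
```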
I hope that this will help to improve the reproducibility of many studies.
In addition, the biomartr package provides functions for retrieving the corresponding coding sequences (getCDS()), protein sequences (getProteome()), and annotation files (getGFF()).
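These functions take the same `db` and `organism` arguments as `getGenome()`, so the companion files for one organism can be fetched together. The wrapper below is only an illustrative sketch: `fetch_sequence_set` is a hypothetical name, and the list of retrievers is an assumption about how one might bundle the calls.

```r
# Hypothetical wrapper, not part of biomartr: fetch the coding sequences,
# proteome, and annotation for one organism from a single database.
fetch_sequence_set <- function(organism, db = "refseq",
                               retrievers = list(
                                 cds      = biomartr::getCDS,
                                 proteome = biomartr::getProteome,
                                 gff      = biomartr::getGFF
                               )) {
  # Call each retrieval function with identical arguments; the result is a
  # named list with one entry per data type.
  lapply(retrievers, function(retrieve) retrieve(db = db, organism = organism))
}

# Example call (triggers downloads, so it is left commented out):
# human_files <- fetch_sequence_set("Homo sapiens", db = "refseq")
# human_files$gff
```

Using the same `db` value for every file type keeps the identifiers in the sequences and the annotation consistent with each other.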