Part of a paper I am writing involves comparing different human genome assemblies. I would like to have some kind of citation for the assemblies hg18, hg19, and hg38. Many other papers do not seem to cite them, for example http://nar.oxfordjournals.org/content/early/2010/10/18/nar.gkq963.full. However, I noticed some conflicting information in various database entries for the genomes and would like to know which information to use. For example, the release date for hg19 differs between the NCBI assembly database and the Genome Reference Consortium page: February 27, 2009 versus March 3, 2009.
Citing a specific version of a bioinformatics or genomics resource can get tricky because there is often no formal publication for every release of a given dataset. Further complicating the situation, you will often come across different dates (and even names) for the same resource. For example, the latest cow genome assembly generated by the University of Maryland is known as 'UMD 3.1.1'. However, the UCSC genome browser uses its own internal IDs for all cow genome assemblies and refers to this one as 'bosTau8'. Someone new to the field might see the UCSC version and not know about the original UMD name.
You can sometimes use the dates of files on FTP sites to approximately date sequence files, but these dates can change (for example, files occasionally get removed by accident and restored from backups, which can change their date).
The key thing to aim for is to provide suitable information so that someone can reproduce your work. In my mind, this requires 2-3 pieces of information:
The name or release number of the dataset you are downloading (provide alternate names when known)
The specific URL for the website or FTP site that you used to download the data
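As a rough illustration of keeping these details together, here is a minimal Python sketch that stores the name, any alternate names, and the download URL for a dataset and turns them into a sentence you could adapt for a methods section. The `DatasetSource` class, its `describe` method, and the example URL are all hypothetical placeholders invented for this sketch; they are not part of any existing tool, and the URL is not the real location of the assembly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetSource:
    """Hypothetical record of where a dataset came from."""
    name: str                      # name or release number as given by the provider
    url: str                       # the specific URL the data was downloaded from
    alternate_names: List[str] = field(default_factory=list)  # other known names

    def describe(self) -> str:
        """Return a one-line description suitable for adapting into a paper."""
        aliases = ""
        if self.alternate_names:
            aliases = " (also known as " + ", ".join(self.alternate_names) + ")"
        return f"{self.name}{aliases}, downloaded from {self.url}"


# Example using the cow assembly names mentioned above.
# The URL is a placeholder, not the actual download location.
cow = DatasetSource(
    name="UMD 3.1.1",
    url="ftp://example.org/path/to/assembly/",
    alternate_names=["bosTau8"],
)
print(cow.describe())
```

Keeping a record like this next to the downloaded files means the alternate names and exact source are still at hand when you write up the work.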