It is important to distinguish between producing genomes and providing them. Ensembl distributes INSDC genomes and then attempts to annotate them; we do not produce genomes. INSDC does store a lot of this information, so we try to provide links back to INSDC via assembly accessions, which are stored in the core database's meta table under the key assembly.accession. This is not filled in for all species, though coverage is good. Once you have the accession, you can use NCBI or ENA to find some of the information you need, e.g. the accession for rat's Rnor_5.0 is GCA_000001895.3, giving you two links:
From here we can follow the WGS project ID AABR06 (AABR00000000.6), where you can get some more assembly information in the COMMENT section:
Assembly Method :: Newbler v. 2.0.0-PreRelease-01162009
paired with Phrap v. 0.990329 for Sanger
reads; CLC bio for Solid reads
Assembly Name :: Rnor_5.0
Genome Coverage :: 3x BAC; 6x WGS ABI Sanger reads
Sequencing Technology :: Sanger; SOLiD
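If you need these fields programmatically, the "Key :: Value" layout is easy to split. The following is only a minimal sketch: the function name parse_structured_comment is made up for illustration, and it assumes that any line without "::" is a continuation of the previous value, as in the Assembly Method entry above.

```python
def parse_structured_comment(text):
    """Parse 'Key :: Value' lines into a dict.

    Assumption: a line without '::' continues the previous value.
    """
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "::" in line:
            key, value = line.split("::", 1)
            last_key = key.strip()
            fields[last_key] = value.strip()
        elif last_key is not None:
            # Continuation line: append to the value of the previous key.
            fields[last_key] += " " + line
    return fields


comment = """\
Assembly Method :: Newbler v. 2.0.0-PreRelease-01162009
paired with Phrap v. 0.990329 for Sanger
reads; CLC bio for Solid reads
Assembly Name :: Rnor_5.0
Genome Coverage :: 3x BAC; 6x WGS ABI Sanger reads
Sequencing Technology :: Sanger; SOLiD
"""

info = parse_structured_comment(comment)
print(info["Assembly Name"])
print(info["Sequencing Technology"])
```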
There's also strain information at the end as a feature:
If the information is not available for your species, then you will have to go back to the data producer (e.g. Baylor) or to the genome paper.
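For the first step, looking up assembly.accession in the meta table, the query is a simple key lookup. The sketch below is an assumption-laden stand-in: it uses an in-memory sqlite3 table so that it runs without a network connection, whereas the real Ensembl core databases are MySQL. The inserted rows and the helper name assembly_accession are illustrative only; the SELECT statement is the part that carries over.

```python
import sqlite3

# Mock of a core database `meta` table. Real Ensembl core databases are
# MySQL; sqlite3 is used here only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta (meta_id INTEGER PRIMARY KEY, species_id INTEGER, "
    "meta_key TEXT, meta_value TEXT)"
)
# Illustrative rows, not a dump of a real database.
conn.executemany(
    "INSERT INTO meta (species_id, meta_key, meta_value) VALUES (?, ?, ?)",
    [
        (1, "assembly.accession", "GCA_000001895.3"),
        (1, "assembly.name", "Rnor_5.0"),
    ],
)


def assembly_accession(connection):
    """Return the assembly accession, or None if the key is not filled in."""
    row = connection.execute(
        "SELECT meta_value FROM meta WHERE meta_key = ?",
        ("assembly.accession",),
    ).fetchone()
    return row[0] if row else None


accession = assembly_accession(conn)
print(accession)
```

The None return covers the case mentioned above where the key is not filled in for a species.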
On a side note, for next-gen based genomes, metrics like depth don't really mean that much, and other metrics like N50 can be misleading. The Assemblathon (http://assemblathon.org/) is doing a good job of addressing this issue, and for next-gen genomes you may want to switch to using its recommended metrics.
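To make the N50 point concrete, here is a small sketch with made-up contig lengths (not from any real assembly). It shows one way the metric can mislead: simply discarding the shortest contigs raises N50 even though nothing about the assembly has improved.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths, chosen only for illustration.
all_contigs = [50, 40, 30, 20, 10, 10, 10, 10, 10, 10]
# Same assembly after dropping every contig shorter than 20.
filtered = [c for c in all_contigs if c >= 20]

print(n50(all_contigs))  # 30
print(n50(filtered))     # 40
```

The two numbers describe the same underlying sequence, which is why N50 values are hard to compare between assemblies that were filtered differently.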