Question: Displaying Information Regarding The Quality Of The Ensembl Genomes
Anima Mundi (Italy) wrote, 7.7 years ago:

Hello,

Could you point me to a systematic way of displaying some crucial information about the quality of the genomes in Ensembl? In particular, for every genome I would like to know, where possible, the depth, the coverage and the strain(s) used. The genome description pages (currently accessible from the Ensembl homepage) sometimes lack this information, even for widely studied species such as the rat. I found some of it by browsing the site and the web in general, but I am hoping there is a systematic summary.


I would be interested in that too. I have looked for it in the past without any success.

Comment written 7.7 years ago by Biojl
Andy Yates (Cambridge) wrote, 7.6 years ago:

Hi,

It is important to distinguish between producing genomes and providing them. Ensembl distributes INSDC genomes and then annotates them; we do not produce the assemblies ourselves. INSDC does store a lot of this information, so we try to link back to INSDC via the assembly accession, which is stored in the meta table of each core database under the key assembly.accession. This is not filled in for every species, though most have it. Once you have the accession you can use NCBI or ENA to find some of the information you need. For example, the accession for rat Rnor_5.0 is GCA_000001895.3, which gives you two links:

http://www.ncbi.nlm.nih.gov/assembly/GCA_000001895.3

http://www.ebi.ac.uk/ena/data/view/GCA_000001895
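As a rough illustration of the lookup described above, here is a minimal Python sketch. The SQL string is what I would expect to run against a core database's meta table; the function name `assembly_links` is made up for this example, and the URL patterns are simply copied from the two links above, so they are only as stable as those sites. The ENA link uses the accession without its version suffix, as in the example.

```python
import re

# Query to run against an Ensembl core database (any MySQL client will do).
# The meta table stores key/value pairs in meta_key / meta_value.
META_QUERY = (
    "SELECT meta_value FROM meta "
    "WHERE meta_key = 'assembly.accession'"
)

# INSDC assembly accessions look like GCA_000001895.3:
# "GCA_", nine digits, a dot, then the version number.
ACCESSION = re.compile(r"^(GCA_\d{9})\.(\d+)$")


def assembly_links(accession):
    """Build NCBI and ENA links for an assembly accession.

    Hypothetical helper: the URL patterns mirror the two links
    quoted in this answer and are not an official API.
    """
    match = ACCESSION.match(accession)
    if match is None:
        raise ValueError(f"not an assembly accession: {accession!r}")
    unversioned = match.group(1)
    return {
        "ncbi": f"http://www.ncbi.nlm.nih.gov/assembly/{accession}",
        "ena": f"http://www.ebi.ac.uk/ena/data/view/{unversioned}",
    }


links = assembly_links("GCA_000001895.3")
print(links["ncbi"])
print(links["ena"])
```

Note that the check rejects an accession written without the underscore (GCA000001895.3), which is an easy typo to make.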

From there you can follow the WGS project ID AABR06 (AABR00000000.6), whose record has some more assembly information in its COMMENT section:

##Genome-Assembly-Data-START##
Assembly Method       :: Newbler v. 2.0.0-PreRelease-01162009
                         paired with Phrap v. 0. 990329 for Sanger
                         reads; CLC bio for Solid reads
Assembly Name         :: Rnor_5.0
Genome Coverage       :: 3x BAC; 6x WGS ABI Sanger reads
Sequencing Technology :: Sanger; SOLiD
##Genome-Assembly-Data-END##
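If you want to collect these fields for many assemblies, the block is regular enough to parse. The sketch below assumes the text has already been cut down to the block as quoted above (no leading COMMENT keyword), that keys and values are separated by `::`, and that lines without `::` continue the previous value. `parse_assembly_comment` is a name invented for this example.

```python
EXAMPLE = """\
##Genome-Assembly-Data-START##
Assembly Method       :: Newbler v. 2.0.0-PreRelease-01162009
                         paired with Phrap v. 0. 990329 for Sanger
                         reads; CLC bio for Solid reads
Assembly Name         :: Rnor_5.0
Genome Coverage       :: 3x BAC; 6x WGS ABI Sanger reads
Sequencing Technology :: Sanger; SOLiD
##Genome-Assembly-Data-END##
"""


def parse_assembly_comment(text):
    """Turn a Genome-Assembly-Data block into a dict of key -> value."""
    fields = {}
    last_key = None
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("##") and line.endswith("-START##"):
            inside = True
            continue
        if line.startswith("##") and line.endswith("-END##"):
            break
        if not inside or not line:
            continue
        if "::" in line:
            # New "key :: value" pair.
            key, value = line.split("::", 1)
            last_key = key.strip()
            fields[last_key] = value.strip()
        elif last_key is not None:
            # Continuation line: append to the previous value.
            fields[last_key] += " " + line
    return fields


info = parse_assembly_comment(EXAMPLE)
print(info["Genome Coverage"])
print(info["Sequencing Technology"])
```

Records that lack this structured comment simply yield an empty dict, which is a reasonable signal to fall back to the data producer.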

There's also strain information at the end of the record, in the source feature:

FEATURES             Location/Qualifiers
     source          1..112651
                     /organism="Rattus norvegicus"
                     /mol_type="genomic DNA"
                     /strain="BN/SsNHsdMCW"
                     /db_xref="taxon:10116"
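The strain can be pulled out of that feature with a small regular expression. This is only a sketch: it assumes each qualifier has the form `/name="value"` and sits on a single line, as in the excerpt above, and `source_qualifiers` is a hypothetical helper name. For anything beyond a quick check, a proper flat-file parser would be the safer choice.

```python
import re

SOURCE_FEATURE = '''\
     source          1..112651
                     /organism="Rattus norvegicus"
                     /mol_type="genomic DNA"
                     /strain="BN/SsNHsdMCW"
                     /db_xref="taxon:10116"
'''

# Matches /name="value"; values that wrap over several lines
# would keep their line breaks and need extra clean-up.
QUALIFIER = re.compile(r'/(\w+)="([^"]*)"')


def source_qualifiers(feature_text):
    """Return the qualifiers of a source feature as a dict."""
    return dict(QUALIFIER.findall(feature_text))


qualifiers = source_qualifiers(SOURCE_FEATURE)
# .get() returns None when a record has no strain qualifier.
print(qualifiers.get("strain"))
```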

If the information is not available for a given species, you will have to go back to the data producer (e.g. Baylor) or to the genome paper.

On a side note, for next-gen based genomes, metrics like depth don't really mean that much, and other metrics like N50 can be misleading. The Assemblathon (http://assemblathon.org/) is doing a good job of addressing this issue, so for next-gen genomes you may want to switch to using their recommended metrics.
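To make the N50 point concrete, here is a small sketch with made-up contig lengths. It uses the usual definition (the contig length at which the running total of contigs, sorted longest first, reaches half the assembly size) and shows one commonly cited problem: simply discarding short contigs raises N50 without the assembly getting any better. The numbers are invented for illustration and do not come from any real genome.

```python
def n50(lengths):
    """Length of the contig at which the cumulative size, counted
    from the longest contig down, reaches half the total size."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Invented contig lengths (arbitrary units), not real data.
all_contigs = [50, 40, 30, 20, 10, 10, 10, 10, 10, 10]
# Same assembly after dropping every contig shorter than 20.
filtered = [c for c in all_contigs if c >= 20]

print(n50(all_contigs))  # 30
print(n50(filtered))     # 40
```

The sequence is identical in both cases apart from the discarded pieces, yet the second figure looks better, which is why N50 on its own should be read with care.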
