Question: GRCh37/38 (NCBI) vs hg19/hg38 (UCSC)
6.1 years ago by pwg46 (430), United States
pwg46 wrote:

Are there any major differences between the GRCh38 (NCBI) and hg38 (UCSC) databases, aside from the fact that GRCh38 uses a 1-based coordinate system while UCSC uses a 0-based one? Are there any pros or cons to using one versus the other? I am guessing that any identifier conversion software (e.g., BioMart) has to choose one database over the other? Also, where does Ensembl come into play? Is the Ensembl database just a subset of the GRCh38 (NCBI) database? Any clarification would be greatly appreciated.
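
On the coordinate point: the 0-based versus 1-based difference is a property of file and table conventions rather than of the assemblies themselves. BED files and UCSC database tables use 0-based, half-open intervals, while GFF/GTF files, Ensembl, and the positions displayed in the UCSC browser are 1-based and closed. A minimal sketch of the conversion, with illustrative function names that do not come from any library:

```python
def bed_to_one_based(start, end):
    """Convert a 0-based, half-open interval (BED style) to 1-based, closed."""
    # Only the start shifts; the half-open end equals the closed end.
    return start + 1, end


def one_based_to_bed(start, end):
    """Convert a 1-based, closed interval (GFF/Ensembl style) to 0-based, half-open."""
    return start - 1, end


# The first 100 bases of a chromosome in both conventions.
print(bed_to_one_based(0, 100))
print(one_based_to_bed(1, 100))
```

The interval length is `end - start` in the 0-based form and `end - start + 1` in the 1-based form, which is a quick way to check which convention a file follows.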

ucsc hg38 grch38 ncbi • 70k views
modified 6.0 years ago by Denise CS (5.1k) • written 6.1 years ago by pwg46 (430)
6.1 years ago by Devon Ryan (97k), Freiburg, Germany
Devon Ryan wrote:

GRCh37/hg19 and GRCh38/hg38 are genome builds, not annotations. An annotation describes where features are in a given genome build. The actual sequences you'll get from NCBI/UCSC/Ensembl will be identical, but their annotations will be different and (importantly) updated at different frequencies. NCBI's annotation is the RefSeq dataset (the "refGene" track in UCSC), which is essentially a subset of the UCSC and Ensembl annotations. UCSC's annotations are kind of a mess: you'll find genes with the same ID on multiple strands and multiple chromosomes, which makes them a bit useless. Ensembl's annotations typically contain more features than UCSC's (so a bit more noise), but they're otherwise much better put together (e.g., you'll never find a gene ID on different strands or different chromosomes), and their IDs are typically easier to map to other things (e.g., gene names, GO and pathway memberships). Ensembl also updates its annotation fairly often and versions everything nicely, so it's quite convenient to report which version you used in a paper (reproducibility is always a good thing). Given the choice, use the Ensembl annotation.
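
One way to check an annotation for the duplicated-ID problem described above is to group records by gene ID and flag any ID that appears at more than one chromosome/strand combination. The sketch below assumes the annotation has already been parsed into `(gene_id, chrom, strand)` tuples; the function name and the sample records are made up for illustration and are not taken from any real release.

```python
from collections import defaultdict


def find_ambiguous_ids(records):
    """Return gene IDs seen on more than one (chromosome, strand) pair.

    `records` is an iterable of (gene_id, chrom, strand) tuples.
    """
    locations = defaultdict(set)
    for gene_id, chrom, strand in records:
        locations[gene_id].add((chrom, strand))
    # Keep only IDs with at least two distinct placements.
    return {gid: sorted(locs) for gid, locs in locations.items() if len(locs) > 1}


# Hypothetical records for illustration only.
example = [
    ("geneA", "chr1", "+"),
    ("geneA", "chr1", "+"),  # second transcript at the same locus: fine
    ("geneB", "chrX", "+"),
    ("geneB", "chrY", "+"),  # same ID on two chromosomes
    ("geneC", "chr2", "+"),
    ("geneC", "chr2", "-"),  # same ID on both strands
]
print(find_ambiguous_ids(example))
```

An empty result means every ID in the table has a single chromosome and strand, which is what you want before using the IDs as keys.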

BTW, don't forget that the various sources can use different names for chromosomes (e.g., chr1 in UCSC is just 1 in Ensembl), so don't mix and match them.
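
If files from both sources do have to be combined, the names need to be harmonized first. The following is a minimal sketch for the primary human chromosomes only; the table and function names are illustrative. Unplaced scaffolds and alternate contigs are named quite differently between sources and need a proper lookup table, so this sketch rejects them instead of guessing.

```python
# Primary human chromosomes only; scaffolds and alt contigs are deliberately excluded.
_PRIMARY = [str(n) for n in range(1, 23)] + ["X", "Y"]
UCSC_TO_ENSEMBL = {"chr" + name: name for name in _PRIMARY}
UCSC_TO_ENSEMBL["chrM"] = "MT"  # the mitochondrial name is not just a prefix change
ENSEMBL_TO_UCSC = {ens: ucsc for ucsc, ens in UCSC_TO_ENSEMBL.items()}


def ucsc_to_ensembl(name):
    """Translate a UCSC-style name (chr1) to Ensembl style (1)."""
    try:
        return UCSC_TO_ENSEMBL[name]
    except KeyError:
        raise ValueError(f"no primary-chromosome mapping for {name!r}") from None


def ensembl_to_ucsc(name):
    """Translate an Ensembl-style name (1) to UCSC style (chr1)."""
    try:
        return ENSEMBL_TO_UCSC[name]
    except KeyError:
        raise ValueError(f"no primary-chromosome mapping for {name!r}") from None


print(ucsc_to_ensembl("chr1"), ucsc_to_ensembl("chrM"), ensembl_to_ucsc("X"))
```

Raising on unknown names is a deliberate choice here: silently stripping `chr` from a scaffold name would produce an identifier that exists in neither source.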


I see. Thank you for your answer. Right now I am using the Ensembl and UniProt databases. Would there be any reason to include the UCSC database if I am working with an identifier conversion tool? For example, say I am trying to map Ensembl transcript (ENST) identifiers to UniProt. Would I get different mappings converting directly from ENST->UniProt (both the Ensembl and UniProt databases have data files for this) than converting from ENST->UCSC->UniProt?

written 6.1 years ago by pwg46 (430)

You might get more ambiguous mappings going via UCSC (or not, it's hard to say).

modified 6.1 years ago • written 6.1 years ago by Devon Ryan (97k)

Okay. So, in general, do you think it would be wise to stick only with the Ensembl database and not mix the two (Ensembl and UCSC) in identifier conversion software?

written 6.1 years ago by pwg46 (430)

Yeah, you'll normally just have more headaches by mixing the two, and Ensembl IDs are typically among the better supported ones.

written 6.1 years ago by Devon Ryan (97k)

No need to map IDs between resources yourself: Ensembl has good cross-references to many other databases, including UniProt. You can access those either via BioMart or with the API.

written 6.1 years ago by Jean-Karim Heriche (23k)

The UCSC Genome Browser just released an "NCBI RefSeq" track that is based entirely on coordinates and alignments provided by the RefSeq group. These new tracks should avoid issues such as genes mapping to multiple locations. You can read more about it on our website: https://genome.ucsc.edu/goldenPath/newsarch.html#030317.

Matthew Speir, UCSC Genome Bioinformatics Group

written 3.6 years ago by Matthew_UCSC (20)
6.0 years ago by Denise CS (5.1k), UK, Hinxton, EMBL-EBI
Denise CS wrote:

In addition to BioMart and the Perl API, you can also use the Ensembl REST API to map Ensembl IDs to cross reference entries and vice versa.
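
As a rough illustration of the REST route, the sketch below queries the `xrefs/id` endpoint of rest.ensembl.org and groups the returned cross-references by source database. It assumes the endpoint accepts the `all_levels` and `external_db` parameters and returns records with `dbname` and `primary_id` fields; check the current REST documentation before relying on this. The helper names are illustrative, and the exact `external_db` value for UniProt (something like `Uniprot/SWISSPROT`) is an assumption to verify.

```python
import json
import urllib.parse
import urllib.request

SERVER = "https://rest.ensembl.org"


def xref_url(stable_id, external_db=None):
    """Build the xrefs/id URL for an Ensembl stable ID."""
    # all_levels=1 asks for xrefs attached to linked transcripts/translations too.
    params = {"all_levels": 1}
    if external_db:
        params["external_db"] = external_db  # assumed filter parameter
    query = urllib.parse.urlencode(params)
    return f"{SERVER}/xrefs/id/{urllib.parse.quote(stable_id)}?{query}"


def fetch_xrefs(stable_id, external_db=None):
    """Query the REST API (needs network access) and return the decoded JSON list."""
    request = urllib.request.Request(
        xref_url(stable_id, external_db),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def ids_by_database(xrefs):
    """Group primary IDs by the `dbname` field of each xref record."""
    grouped = {}
    for entry in xrefs:
        grouped.setdefault(entry["dbname"], []).append(entry["primary_id"])
    return grouped
```

Typical use would be `ids_by_database(fetch_xrefs("ENST..."))` with a real transcript ID, then picking out the UniProt entries from the resulting dictionary.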
