Question: What is the difference between GRCh37 and hs37? And hg19?
juanfdelahoz40 wrote:

Hi! I've been struggling with the naming conventions of human reference genomes...

I know hg19 and GRCh37 are the same assembly, but they use different names for each chromosome.

I know b37 is only the 25 longest sequences from GRCh37 (1-22,X,Y,MT)

I know we are now on the GRCh38 (or hg38) and we should be using that one.

However, for some reason, researchers in human genomes still use hg19...

Now, I found a reference called hs37 and I don't understand where it comes from. And there's not a single place where all this mess is explained. And all Heng Li says is: "If you map reads to GRCh37 or hg19, use hs37-1kg" : |

Other organisms have smaller communities and their genomes are better standardized, but humans... omg!

Thanks!

modified 13 months ago • written 14 months ago by juanfdelahoz40

juanfdelahoz not looking for grammar correction, but can you change "hg37" to "hs37" in title and tags?

written 14 months ago by cpad011212k

The title originally has hs37 that I changed to hg37. I've changed it back now.

written 14 months ago by RamRS24k

This is also an insightful piece from Heng Li:

http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use

written 13 months ago by colindaven1.7k

genomax71k wrote:

While some of this is confusing for someone just starting out, there is order to the seemingly arcane nomenclature.

GRCh38/hg38 is the current release of the human genome. You should indeed be using this since it has been around for ~5 years at this point. You can find the data for it at NCBI's GRCh38 site.

GRCh37 is synonymous with hg19 (the UCSC name for that build; there is no "hg37"). You can find the information about this release at NCBI's GRCh37 site.

hs37 is a special genome reference prepared for the 1000 Genomes Project by this method. You can find that data here.

Ultimately GENCODE is the project responsible for managing human/mouse genome data. They provide the authoritative genome data that is used by everyone including NCBI/UCSC/Ensembl.

modified 13 months ago • written 14 months ago by genomax71k

I recall there was an extensive discussion on differences between GRCh37 and hg19 somewhere. Pierre was involved, I think.

written 14 months ago by RamRS24k

"Ultimately GENCODE is the organization responsible for managing human/mouse genome data. They provide the authoritative genome data that is used by everyone including NCBI/UCSC/Ensembl."

I believe you mean the Genome Reference Consortium manages the human and mouse genome data. GENCODE is an annotation group at EBI and is not part of the GRC, although the EBI is a member.

written 13 months ago by tdmurphy160

Project is a better designation for GENCODE. Correction made above. GRC releases genome builds while annotation is produced by GENCODE project members.

modified 13 months ago • written 13 months ago by genomax71k

Why are hg17, hg18 and hg19 followed by hg38 and not "hg20", as one would expect?

written 5 months ago by BioinformaticsLad140

hg19 is equivalent to GRCh37. I recall reading somewhere that they decided to unify the version numbers for hg and GRCh conventions, and so now it is hg38/GRCh38.

written 5 months ago by RamRS24k

They should have gone one step further and unified the references as well!

written 5 months ago by BioinformaticsLad140

There is only one reference sequence. There are annotations that come from different sources.

With graph-based assemblies coming in the near future, reference sequences will gain a new level of complexity.

modified 5 months ago • written 5 months ago by genomax71k

nikos.psonis20 wrote:

This is what I have found so far. Please correct me if I am wrong.

GRCh37 w/o patches includes the primary assembly (22 autosomes, X, Y, and non-chromosomal supercontigs) and alternate scaffolds, but not a reference mitogenome. Non-chromosomal supercontigs are the unlocalized and unplaced scaffolds.

The rCRS reference mitogenome in GRCh37 was included only after patch 2 (GRCh37.p2). This patch also included some fix and novel patches.

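UCSC hg19 = GRCh37 w/o patches + African Yoruba mitogenome (not rCRS). UCSC hg19 also uses a different naming convention (e.g. chromosome X is chrX in UCSC vs. X in GRC). The chromosome coordinates themselves are the same in both; what differs is the counting convention in some files: UCSC tables and BED files use 0-based, half-open starts, while Ensembl/GRC-style resources use 1-based positions.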
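To make the naming difference concrete, here is a minimal Python sketch (not from any tool mentioned in this thread; the function names are made up for illustration) that translates the primary chromosome names between the two styles. It deliberately refuses anything else, since the unlocalized/unplaced contigs are named differently again, and keep in mind that renaming chrM to MT does not make the two mitochondrial sequences identical.

```python
from typing import Optional

# Primary nuclear chromosomes in GRC/Ensembl-style naming.
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y"}


def ucsc_to_grc(name: str) -> Optional[str]:
    """Map a UCSC-style primary name (chr1, chrX, chrM) to GRC style.

    Returns None for anything that is not a primary chromosome,
    rather than guessing at contig names.
    """
    if name == "chrM":
        return "MT"
    if name.startswith("chr") and name[3:] in PRIMARY:
        return name[3:]
    return None


def grc_to_ucsc(name: str) -> Optional[str]:
    """Map a GRC-style primary name (1, X, MT) to UCSC style."""
    if name == "MT":
        return "chrM"
    if name in PRIMARY:
        return "chr" + name
    return None
```

For example, `ucsc_to_grc("chrX")` gives `"X"`, while a contig such as `"chrUn_gl000211"` gives `None` and needs a proper lookup table instead.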

Note also that Ion Torrent uses an hg19 with a replaced mitogenome (rCRS instead of the Yoruba sequence).

b37 is hs37-1kg. It is not just the "25 longest sequences from GRCh37 (1-22,X,Y,MT)"; it is a 1000 Genomes convention that includes:
- The 24 "relatively complete" chromosomal sequences (named "1" to "22", "X" and "Y"), downloaded individually from Ensembl.
- The GRCh37.p2 (rCRS) mitochondrial sequence (named "MT"), downloaded from MITOMAP or NCBI.
- The unlocalized sequences, named after their accession numbers, such as "GL000191.1", "GL000194.1", etc.
- The unplaced sequences, named after their accession numbers, such as "GL000211.1", "GL000241.1", etc.
Only the alternate loci were not included in the b37 dataset.
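As a rough illustration of that composition, the sketch below sorts b37-style sequence names into the groups listed above. It is only a name-based heuristic with a made-up function name: since both unlocalized and unplaced sequences are named by GL accession, the name alone cannot tell them apart, so they share one label here.

```python
import re

# Names of the 24 nuclear chromosomes as used in b37.
CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y"}

# Accession-style names such as "GL000191.1" (pattern inferred from
# the examples in the post above).
GL_ACCESSION = re.compile(r"^GL\d{6}\.\d+$")


def classify_b37_name(name: str) -> str:
    """Return a coarse category for a b37-style sequence name."""
    if name in CHROMOSOMES:
        return "chromosome"
    if name == "MT":
        return "mitochondrion"
    if GL_ACCESSION.match(name):
        return "unlocalized_or_unplaced"
    return "other"
```

Running this over the first column of a FASTA index (.fai) and counting the labels is a quick sanity check that a file really follows the b37 layout.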

hs37d5 (known also as b37 + decoy) was released by The 1000 Genomes Project (Phase II), which introduced additional sequence (BAC/fosmid clones, HuRef contigs, Epstein-Barr Virus genome) to the b37 reference to help reduce false positives for mapping. Note that this one uses the primary assembly of GRCh37.p4 (not the one of GRCh37 w/o patches).

As for hs37 (without -1kg), I think it is generated only by bwakit in BWA, and according to its manual it corresponds to b37 + EBV (the Epstein-Barr virus genome). The EBV genome is also found in hs37d5 and GRCh38; it is included because EBV is used in the lab to transform (immortalize) B cells into cell lines, and because it naturally infects B cells in ~90% of the world population.
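Putting the above together, one hedged way to guess which build-37 flavour a file was made against is to look at the set of sequence names in its header or .fai index. The sketch below assumes you already know the data are on build 37 (names alone cannot separate GRCh37 from GRCh38), and it assumes the EBV sequence is named "NC_007605" and the decoy sequence "hs37d5"; those two names are my assumption and not stated in this thread, so check them against your own files. The function name is made up.

```python
from typing import Iterable


def guess_build37_flavor(contig_names: Iterable[str]) -> str:
    """Guess a build-37 reference flavour from sequence names only."""
    names = set(contig_names)
    if "chr1" in names:
        # UCSC-style names; per the post above, chrM here would not be rCRS
        # unless the mitogenome was replaced.
        return "hg19-style"
    if "1" not in names:
        return "unknown"
    if "hs37d5" in names:        # assumed name of the decoy sequence
        return "hs37d5"
    if "NC_007605" in names:     # assumed name of the EBV sequence
        return "hs37"
    return "b37 / hs37-1kg"
```

This is only a first-pass check; comparing sequence lengths or checksums against the published reference is the reliable way to confirm.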

There is no hg37.

modified 13 months ago • written 13 months ago by nikos.psonis20