Question: What's the difference between two versions of the same assembly?
9.0 years ago, Pierre Lindenbaum (124k), France/Nantes/Institut du Thorax - INSERM UMR1087, wrote:

The Homo sapiens Genome (Build 37.2) has just been published by the NCBI :

Have the FASTA sequences of the human genome changed since version 37.1 (e.g., have some large runs of NNNNN been resolved?), or is it just a matter of annotations?

Tags: genome, ncbi, sequence
Modified 3.0 years ago by Biostar • written 9.0 years ago by Pierre Lindenbaum
9.0 years ago, Bio_X2Y (3.7k) wrote:

Short answer

As far as I know, the sequences of the primary assembly have not changed, so if you don't go looking for the new sequences, you won't find them. However, some corrections and some new sequence are available if you want them. The annotations have been updated.

Long answer

NCBI 37.2, like Ensembl 60, is based on GRCh37.p2.

GRCh37.p2 is, as the name implies, just a patch release for GRCh37. It contains two kinds of patches, which should be seen as temporary updates until they are fully incorporated into the next major release of the genome.

  • A fix patch represents corrections to an existing sequence.
  • A novel patch represents novel sequence (perhaps filling some of the runs of N's).
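
Since the question asks about runs of N's, here is a minimal sketch for locating them in a sequence string, so that gaps in one release can be compared with the placement of a novel patch. The function name `n_runs` and the toy sequence are my own illustration and are not part of any NCBI or GRC tooling.

```python
import re


def n_runs(sequence, min_length=1):
    """Return (start, end) pairs for each run of N/n in the sequence.

    Coordinates are 0-based and half-open, like Python slices.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_length
    ]


# Toy sequence for illustration only, not real genomic data.
toy = "ACGTNNNNNACGTNNACGT"
print(n_runs(toy))                # every run of N's
print(n_runs(toy, min_length=3))  # only runs of at least 3 N's
```

For a whole chromosome you would read the FASTA record into one string first, or scan it in chunks, since this sketch assumes the sequence fits in memory.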

As I understand it, these are not intended to replace the original sequences while the main release, GRCh37, is still in effect. They can be seen as a sneak preview of the next major release for those who are interested. This means that the GRCh37 primary assembly is the same between GRCh37, GRCh37.p1 and GRCh37.p2, and the patch sequences exist separately.

If you download the primary assembly sequence for Chromosome 5, say, from the GRC, you will get the original sequence, which doesn't include the updates. To get the updated sequences, you would need to download a separate patch file. Small annotation files are also provided that explain where the patch sequence "fits" in the original sequence.
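
If you want to check this yourself rather than take my word for it, one option is to compare per-sequence checksums between two downloads. The sketch below is only an illustration: the function names are mine, and it assumes plain (uncompressed) FASTA input with the record name taken as the first word of each header line.

```python
import hashlib


def fasta_checksums(lines):
    """Map each FASTA record name to the MD5 of its uppercased sequence.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Line wrapping and letter case do not affect the checksum.
    """
    checksums = {}
    name, digest = None, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                checksums[name] = digest.hexdigest()
            parts = line[1:].split()
            name = parts[0] if parts else ""
            digest = hashlib.md5()
        elif name is not None:
            digest.update(line.upper().encode("ascii"))
    if name is not None:
        checksums[name] = digest.hexdigest()
    return checksums


def changed_records(old, new):
    """Names present in both releases whose sequence checksum differs."""
    return sorted(n for n in old.keys() & new.keys() if old[n] != new[n])


# Toy records for illustration only, not real genomic data.
old = fasta_checksums([">chr5 toy", "ACGT", "NNNN", ">chrM", "GGCC"])
new = fasta_checksums([">chr5 toy", "ACGTNNNN", ">chrM", "GGCA"])
print(changed_records(old, new))  # ['chrM']
```

With real files you would pass open file handles instead of lists. If the primary assembly really is unchanged between releases, `changed_records` should return an empty list for it, and the patch sequences would only show up as names that exist in the newer download.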

Ensembl follows the pattern of keeping the original sequences separate from the patches - you would download the original sequences separately from the patch sequences.

I'm less sure of this, but I think NCBI also keeps the sequences separate. NCBI 37.2 contains a 'GRCh37.p2-Primary Assembly', and the patch sequences seem to be represented as a separate assembly - "GRCh37.p2-PATCHES".

Note also that GRCh37.p2 contains the mitochondrial sequence - previous versions of the assembly did not contain this.


I'll let Dan explain it.

Reply written 5.8 years ago by Emily_Ensembl (19k)
9.0 years ago, Rm (7.9k), Danville, PA, wrote:

As far as I know, the sequence won't change between versions (.1 or .2) of the same build (here 37). Only the annotations will be updated.


My answer applies to Ensembl, but I imagine they follow similar systems: yes, it is a change in the annotation. Gene annotations can sometimes be added or deleted, but the sequence stays the same.

Reply written 9.0 years ago by Andrea_Bio (2.6k)