What's The Difference Between Two Versions Of The Same Assembly?
13.4 years ago

The Homo sapiens Genome (Build 37.2) has just been published by the NCBI: http://www.ncbi.nlm.nih.gov/mapview/stats/BuildStats.cgi?taxid=9606&build=37&ver=2

Have the FASTA sequences of the human genome changed since version 37.1 (e.g. have some large runs of NNNNN been resolved?), or is it just a matter of annotations?

genome sequence ncbi • 8.8k views
13.4 years ago
Bio_X2Y ★ 4.4k

Short answer

AFAIK, the sequences of the primary assembly have not changed, so unless you go looking for the new sequences, you won't find them. However, some corrections and some new sequences are available if you want them. The annotations have been updated.

Long answer

NCBI 37.2, like Ensembl 60, is based on GRCh37.p2.

GRCh37.p2 is, as the name implies, just a patch release for GRCh37. It contains two kinds of patches, which should be seen as temporary updates until they are fully incorporated into the next major release of the genome.

  • A fix patch represents corrections to an existing sequence.
  • A novel patch represents novel sequence (perhaps filling some of the runs of N's).

As I understand it, these are not intended to replace the original sequences while the main release, GRCh37, is still in effect. They can be seen as a sneak preview of the next major release for those who are interested. This means that the GRCh37 primary assembly is the same between GRCh37, GRCh37.p1 and GRCh37.p2, and the patch sequences exist separately.
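If you want to check this claim yourself rather than take it on trust, one option is to compare per-record checksums of the primary assembly FASTA files from two releases. This is only a minimal sketch: the function names are mine, the file names in the usage note are hypothetical, and it assumes plain (uncompressed) FASTA text. It upper-cases the sequence before hashing, so differences in soft-masking alone are ignored.

```python
import hashlib
from typing import Dict, Iterable, List


def sequence_checksums(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA record id to the MD5 of its upper-cased sequence.

    The hash is updated line by line, so a whole chromosome never has
    to be held in memory at once.
    """
    checksums: Dict[str, str] = {}
    name, digest = None, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if name is not None:
                checksums[name] = digest.hexdigest()
            fields = line[1:].split()
            name = fields[0] if fields else ""  # id = first word of the header
            digest = hashlib.md5()
        elif digest is not None:
            digest.update(line.upper().encode("ascii"))
    if name is not None:
        checksums[name] = digest.hexdigest()
    return checksums


def changed_records(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return the ids whose checksum differs, or that exist in only one set."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

With two downloaded files (hypothetical names), usage would look like `changed_records(sequence_checksums(open("release_a.fa")), sequence_checksums(open("release_b.fa")))`. An empty list means every record is identical between the two files. Note that record ids must match between the files for the comparison to be meaningful.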

If you download the primary assembly sequence for Chromosome 5, say, from the GRC, you will get the original sequence, which doesn't include the updates. To get the updated sequences, you would need to download a separate patch file. Small annotation files are also provided that explain where the patch sequence "fits" in the original sequence.
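To see for yourself that a downloaded primary-assembly chromosome still carries its original gaps, you could list the runs of N in it. The helper below is a rough sketch with a name of my own choosing; it takes the sequence as a plain string (however you loaded it) and reports 0-based, end-exclusive coordinates.

```python
import re
from typing import List, Tuple


def n_runs(sequence: str, min_length: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) for each run of N/n at least min_length long.

    Coordinates are 0-based and end-exclusive, like Python slices.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_length
    ]
```

If the same run coordinates come back for the same chromosome in two releases, the gaps have not been filled in that file, which is what you would expect if the fixes live only in the separate patch sequences.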

Ensembl follows the pattern of keeping the original sequences separate from the patches - you would download the original sequences separately from the patch sequences.

I'm less sure of this, but I think NCBI also keeps the sequences separate. NCBI 37.2 contains a 'GRCh37.p2-Primary Assembly', and the patch sequences seem to be represented as a separate assembly - "GRCh37.p2-PATCHES".

Note also that GRCh37.p2 contains the mitochondrial sequence - previous versions of the assembly did not contain this.

13.4 years ago
Rm 8.3k

As far as I know, the sequence doesn't change between versions (.1 or .2) of the same build (here 37). Only the annotations are updated.


My answer applies to Ensembl, but I imagine they follow similar systems: yes, it is a change in the annotation. Sometimes gene annotations can be added or deleted, but the sequence is the same.
