AFAIK, the sequences of the primary assembly are unchanged, so unless you go looking for the new sequences, you won't encounter them. However, some corrections and some new sequence are available if you want them. The annotations have been updated.
NCBI 37.2, like Ensembl 60, is based on GRCh37.p2.
GRCh37.p2 is, as the name implies, just a patch release for GRCh37. It contains two kinds of patches, which should be seen as temporary updates until they are fully incorporated into the next major release of the genome.
- A fix patch represents corrections to an existing sequence.
- A novel patch represents novel sequence (perhaps filling some of the runs of N's).
As I understand it, these are not intended to replace the original sequences while the main release, GRCh37, is still in effect. They can be seen as a sneak preview of the next major release for those who are interested. This means that the GRCh37 primary assembly is the same between GRCh37, GRCh37.p1 and GRCh37.p2, and the patch sequences exist separately.
If you download the primary assembly sequence for Chromosome 5, say, from the GRC, you will get the original sequence, which doesn't include the updates. To get the updated sequences, you would need to download a separate patch file. Small annotation files are also provided that explain where the patch sequence "fits" in the original sequence.
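To make the "fits" idea concrete, here is a minimal sketch of how you might check whether a region on the primary assembly is covered by a patch. Note that the tab-delimited layout below (the columns `patch_name`, `patch_type`, `chrom`, `start`, `stop`) and the patch names are my own simplified, hypothetical stand-ins, not the actual format of the placement files the GRC distributes, so you would need to adapt the parsing to the real files.

```python
import csv
import io
from typing import List, NamedTuple


class Placement(NamedTuple):
    """Where one patch sits on the primary assembly (hypothetical layout)."""
    patch_name: str
    patch_type: str  # "FIX" or "NOVEL"
    chrom: str
    start: int       # 1-based, inclusive, primary-assembly coordinate
    stop: int        # 1-based, inclusive, primary-assembly coordinate


def parse_placements(text: str) -> List[Placement]:
    """Parse a tab-delimited placement table with a header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        Placement(
            row["patch_name"],
            row["patch_type"],
            row["chrom"],
            int(row["start"]),
            int(row["stop"]),
        )
        for row in reader
    ]


def patches_overlapping(
    placements: List[Placement], chrom: str, start: int, stop: int
) -> List[Placement]:
    """Return patches whose placement overlaps chrom:start-stop."""
    return [
        p for p in placements
        if p.chrom == chrom and p.start <= stop and start <= p.stop
    ]


# Made-up example data, for illustration only.
EXAMPLE = (
    "patch_name\tpatch_type\tchrom\tstart\tstop\n"
    "PATCH_A\tFIX\t5\t1000\t2000\n"
    "PATCH_B\tNOVEL\t5\t5000\t9000\n"
    "PATCH_C\tFIX\t7\t1500\t2500\n"
)

if __name__ == "__main__":
    hits = patches_overlapping(parse_placements(EXAMPLE), "5", 1800, 5200)
    print([p.patch_name for p in hits])  # ['PATCH_A', 'PATCH_B']
```

The primary assembly coordinates stay the reference frame here: the patch sequence itself lives in a separate file, and the placement table only tells you which stretch of the original chromosome it corresponds to.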
Ensembl follows the pattern of keeping the original sequences separate from the patches - you would download the original sequences separately from the patch sequences.
I'm less sure of this, but I think NCBI also keeps the sequences separate. NCBI 37.2 contains a "GRCh37.p2-Primary Assembly", and the patch sequences seem to be represented as a separate assembly, "GRCh37.p2-PATCHES".
Note also that GRCh37.p2 contains the mitochondrial sequence - previous versions of the assembly did not contain this.
8.2 years ago by Bio_X2Y (modified 8.2 years ago)