Human Genome: hg18/Build 36 vs hg19/Build 37
4
5
11.4 years ago

I suspect that many websites and people are still using the hg18/b36 assembly of the human genome and don't want to switch to hg19.

Do you know why?

sequence human conversion • 11k views
6
11.4 years ago
Paulo Nuin ★ 3.7k

My only guess (and that's all I can do) is that most of the structures already established for hg18 would have to be updated to hg19, and sometimes that takes a while to do.

6

This is it exactly. Say I want to compare my data to study X, published last year. Oops, it was on hg18. I'll just lift over all that data, and ... @#$% - ten errors. Okay, so those regions don't exist anymore. I'll just edit the files manually... okay. Now I need to find out if it matches up with anything in the DGV. I've got that script all set up, and ... #$%^. Also on hg18, so I'll search the web, try to find the new version. Oh crap, they haven't created an hg19 version yet. So maybe I'll backport all of this data, or redo the mappings, or ...
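To make the "ten errors" concrete, here is a toy sketch of why some regions fail to lift over. It is not the real liftOver tool or its chain format: the `Block` type, the `BLOCKS` table, and the function names are all made up for illustration, and the coordinates are invented, not real hg18 or hg19 positions. The idea is only that an interval converts cleanly when it sits inside one aligned block, and ends up in an "unmapped" pile otherwise.

```python
from typing import List, NamedTuple, Optional, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based, end-exclusive


class Block(NamedTuple):
    """One ungapped aligned block between an old and a new assembly (toy data)."""
    chrom: str
    old_start: int  # inclusive
    old_end: int    # exclusive
    new_start: int  # where the block starts in the new assembly


# Hypothetical blocks, NOT real hg18-to-hg19 coordinates.
# Old positions 1000-1500 on chr1 have no counterpart in the new assembly.
BLOCKS = [
    Block("chr1", 0, 1000, 0),
    Block("chr1", 1500, 3000, 1200),
]


def lift_interval(chrom: str, start: int, end: int,
                  blocks: List[Block] = BLOCKS) -> Optional[Interval]:
    """Convert one interval, or return None if it does not fit inside a single block."""
    for b in blocks:
        if b.chrom == chrom and b.old_start <= start and end <= b.old_end:
            offset = b.new_start - b.old_start
            return (chrom, start + offset, end + offset)
    return None


def lift_all(intervals: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Split intervals into (lifted, unmapped) so the failures can be reviewed."""
    lifted, unmapped = [], []
    for iv in intervals:
        result = lift_interval(*iv)
        if result is None:
            unmapped.append(iv)
        else:
            lifted.append(result)
    return lifted, unmapped
```

Calling `lift_all` on a list of study regions returns the converted intervals plus the ones that need manual attention, which is the pile that ends up being edited by hand in the scenario above.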

0

Bottom line, not every web shop has a Pierre who could hack up the solution in an hour using a little Java code and an XSLT...

5
11.4 years ago
Neilfws 49k

One answer is that there is a lot more annotation for the HG18 build, because it has been around for 4 years versus just over 1 year. A lot of people use the UCSC genome website where, for example, the HG19 build has just one regulation track (CpG islands), whereas HG18 has ~34 tracks for regulation: histone modifications, transcription factors and so on.

I think HG18 is seen as the "stable reference", more by virtue of its age than anything else. I'm sure HG19 will gradually catch up as more annotations are added. We might ask why this process has not become quicker in the intervening 3-4 years.

0

There was a similar transition from hg17 to 18 - it took a year or two for everything to catch up. As for why the process isn't quicker, that has a lot to do with issues that we're familiar with, like lack of funding for database and annotation maintenance.

3
11.4 years ago

Here are my thoughts:

A reference dataset is always required for an analysis. It is general practice to freeze the dataset at the stable version available when the first round of analysis begins. As Paulo and Chris suggested, changing the version requires considerable effort. The downside is that we may end up doing analysis on a gene structure that no longer exists, or on a transcript that has since been merged with another one. Even if it takes considerable effort to switch, it is best to do the analysis on the updated version of the genome.

0
11.4 years ago
Jerry • 0

We provide alignment (mapping) software and services, and only now are we getting more requests for hg19 than for hg18, so I can say hg19 is finally "catching on" (after a year or more?).

www.imagenix.com