Question: How Many Human Genome Assemblies Are Available?
Alex

How many human genome assemblies are available for analysis? On the NCBI website I found three available genomes assembled into chromosomes:

  1. the reference assembly
  2. the Celera assembly
  3. and Venter's diploid genome.

Additionally, there are three WGS assemblies that are not assembled into chromosomes:

  1. Watson's genome
  2. African genome
  3. Asian genome

Are there any other available assemblies that are not listed by NCBI?

Jorge Amigo

in case you mean "browseable" assemblies, yes: as far as I know, these are all the publicly available ones to date.

but if you want human genome data for deeper analysis, doesn't the 1000 Genomes data suit your needs? you can even consider digging into the major NGS repositories, such as the American SRA or the European ENA.

Pierre Lindenbaum

See the description of the track "Genome Variants" in the UCSC genome Browser:

This track displays variant base calls from the publicly released genome sequences of several individuals:

* 5 Sub-Saharan African genomes sequenced by Penn State University:
      o !Gubi (KB1),
      o G/aq'o (NB1),
      o !Ai (MD8),
      o D#kgao (TK1),
      o Archbishop Desmond Tutu (ABT), 
* 6 individuals from the 1000 Genomes Project high-coverage pilot:
      o a CEU daughter and parents (NA12878, NA12891, NA12892)
      o a YRI daughter and parents (NA19240, NA19238, NA19239) 
* and independently published genomes:
      o Craig Venter,
      o James Watson,
      o Anonymous Yoruba individual NA18507,
      o Anonymous Han Chinese individual (YH, YanHuang Project),
      o Seong-Jin Kim (SJK),
      o Anonymous Korean individual (AK1),
      o Stephen Quake,
      o Anonymous Irish male,
      o Extinct Palaeo-Eskimo Saqqaq individual

lh3

I do not know how one would define "assembly". But in the sense of de novo assembly, 5 are publicly available:

  • The official human reference genome
  • Celera assembly
  • Venter
  • YanHuang
  • NA18507

In the sense of mapping assemblies, there are very few. For all the sequencing projects in the public domain, you can always get the raw reads, sometimes the list of SNPs and occasionally the alignment, but these are not really mapping assemblies. In my definition, a mapping assembly has to tell you which regions are accessible and which are not, but this is rarely available.
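To make the idea of "accessible" regions a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method used by any of the projects above: it assumes you already have per-base read depth for a contig as a list of integers, and it calls a position accessible when its depth lies between two thresholds. The function name `accessible_intervals` and the thresholds `min_depth` and `max_depth` are hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def accessible_intervals(
    depths: Sequence[int],
    min_depth: int = 10,   # assumed lower bound; below this, calls are unreliable
    max_depth: int = 100,  # assumed upper bound; above this, suspect repeats/duplications
) -> List[Tuple[int, int]]:
    """Return half-open, 0-based [start, end) intervals where
    min_depth <= depth <= max_depth."""
    intervals: List[Tuple[int, int]] = []
    start = None  # start of the currently open accessible run, if any
    for pos, depth in enumerate(depths):
        ok = min_depth <= depth <= max_depth
        if ok and start is None:
            start = pos                      # open a new run
        elif not ok and start is not None:
            intervals.append((start, pos))   # close the run before this position
            start = None
    if start is not None:
        intervals.append((start, len(depths)))  # close a run that reaches the end
    return intervals


if __name__ == "__main__":
    toy_depths = [0, 12, 15, 200, 30, 30, 5, 20]
    print(accessible_intervals(toy_depths))
```

The intervals are written BED-style (0-based, half-open), so they could be saved as a mask next to a SNP list. A real mask would usually also consider mapping quality and other filters, which this sketch ignores.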

I have processed some of the published data sets in a uniform way. For people who are interested, they are here.

