Overview Of Human Genome 19 References
11.7 years ago
William ★ 5.3k

Is there an overview of all the different human genome 19 (hg19) versions, with chromosome names and sizes?

I have a BAM file whose header lists the following reference contigs (not all displayed):

@SQ SN:chr1 LN:249250621 UR:file:/share/reference/genomes/human_hg19/human_hg19.fasta

@SQ SN:chr2 LN:243199373 UR:file:/share/reference/genomes/human_hg19/human_hg19.fasta

@SQ SN:chr3 LN:198022430 UR:file:/share/reference/genomes/human_hg19/human_hg19.fasta

......

@SQ SN:chrM LN:16571 UR:file:/share/reference/genomes/human_hg19/human_hg19.fasta

The BAM was not produced in my organization, so the UR path doesn't point anywhere I can reach.

Which version of the human genome 19 do I need based on the chromosome names (chr*) and the chromosome lengths?

bam hg19 • 4.4k views
11.7 years ago

UCSC and GRC/NCBI/1kg/Ens package different mitochondrial sequences with their human autosomal chromosomes - the one you have there is NC_001807 16571bp used by UCSC (but not for long). Those names are also UCSC hg19 deflines.
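As a rough illustration of the check described above, here is a minimal Python sketch that reads the @SQ lines of a SAM/BAM header (as text), then reports the contig naming style and which mitochondrial sequence the length points to. The function names are hypothetical. The 16571 bp value comes from this thread; the 16569 bp length for NC_012920 (rCRS) is added from general knowledge and is not stated in the thread.

```python
# Mitochondrial lengths used as a fingerprint for the reference source.
# 16571 is from the thread; 16569 (rCRS) is an added assumption.
MITO_LENGTHS = {
    16571: "NC_001807 (UCSC hg19 chrM)",
    16569: "NC_012920 (rCRS, used by GRC/NCBI/1kg/Ensembl)",
}


def parse_sq_lines(header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field looks like TAG:value; split only on the first colon
        # so values such as UR:file:/path stay intact.
        fields = dict(
            tok.split(":", 1) for tok in line.split()[1:] if ":" in tok
        )
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def describe_reference(contigs):
    """Guess naming style and mitochondrial sequence from contig names/lengths."""
    prefixed = [name.startswith("chr") for name in contigs]
    if all(prefixed):
        style = "chr-prefixed"
    elif not any(prefixed):
        style = "unprefixed"
    else:
        style = "mixed"
    mito_len = next(
        (contigs[n] for n in ("chrM", "MT", "chrMT", "M") if n in contigs),
        None,
    )
    return style, MITO_LENGTHS.get(mito_len, "unknown")


header = """\
@SQ SN:chr1 LN:249250621 UR:file:/share/reference/genomes/human_hg19/human_hg19.fasta
@SQ SN:chr2 LN:243199373 UR:file:/share/reference/genomes/human_hg19/human_hg19.fasta
@SQ SN:chr3 LN:198022430 UR:file:/share/reference/genomes/human_hg19/human_hg19.fasta
@SQ SN:chrM LN:16571 UR:file:/share/reference/genomes/human_hg19/human_hg19.fasta
"""
contigs = parse_sq_lines(header)
style, mito = describe_reference(contigs)
print(style, "|", mito)
```

The header text itself can be obtained with something like `samtools view -H file.bam`; the sketch only inspects names and lengths, so it cannot tell apart references that differ only in masking or in which extra contigs were included.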

There really are no "versions" of GRCh37/hg19 in the sense that chromosomes change lengths or content within a freeze. This seems to be a common misunderstanding, and I am considering making a poster or something to clear it up. Of course, the subsets of unscaffolded contigs and the extra haplotype sequences might differ depending on who constructed the index.

The pseudoautosomal regions are masked on chrY in the Ensembl and G1k references, but not in UCSC hg19 or GRC.


Hi Jeremy, are you sure that UCSC masked the pseudoautosomal regions on chrY? Have a look, PAR#1 defined here ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37/par.txt is present unmasked both in the Genome Browser http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chrY:9951-10050 and in the reference for download http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chrY.fa.gz

Actually, shouldn't the fact that the pseudoautosomal regions are present in both X and Y cause alignment problems for all people using hg19 for short read alignments? Basically that would be the same as having a part of an autosomal chromosome represented twice in the reference (which is in fact the case for the haplotype sequences included in hg19, isn't it?)
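To make the idea of masking concrete, here is a small Python sketch that replaces a 1-based, inclusive interval of a sequence with N, which is what a masked chrY PAR amounts to in a FASTA file. The function name and the toy sequence are hypothetical; real PAR coordinates would have to be taken from the par.txt file linked above and are deliberately not hard-coded here.

```python
def mask_region(sequence, start, end):
    """Replace bases start..end (1-based, inclusive) with 'N'."""
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError("interval must lie within the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[: start - 1] + "N" * (end - start + 1) + sequence[end:]


# Toy example only; not real chrY sequence or real PAR coordinates.
toy = "ACGTACGTAC"
masked = mask_region(toy, 3, 6)
print(masked)
```

With the duplicated region masked on one chromosome, an aligner has only one candidate location left for reads from that region, rather than two identical ones.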


Good find. I have corrected my answer.

11.7 years ago
deanna.church ★ 1.1k

There are official and 'versioned' human reference assemblies. When the GRC releases an assembly, it is submitted to an INSDC database (GenBank/EMBL/DDBJ). Every sequence in the assembly is given an accession.version which allows for robust tracking of the assembly. http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/data/index.shtml and http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/index.shtml
