Shorter Sequences In The Human Genomic DNA Download From The UCSC Genome Browser Website
8.7 years ago

Hi,

I downloaded the human genomic DNA from the UCSC Genome Browser website using the hg38.chromFa.tar.gz link. When I extracted the archive, I got all the chromosomal sequences. Chromosome 1 is there as chr1.fa, but I also see several shorter chr1 sequences with names such as chr1_GL383518v1_alt.fa, chr1_KI270706v1_random.fa and chr1_KI270759v1_alt.fa, to name a few. There are twenty-one such short sequences for chr1 in total. I couldn't find any documentation on the website about these sequences. I would like to know what these sequences are.

Thanks

ucsc genomic • 4.2k views
8.7 years ago

Alternate loci are a new feature in the latest releases (see http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/info/definitions.shtml ). Random chromosomes, by contrast, were already present in previous releases (see "What do chromosome codes such as 'chr_random' represent?").

8.7 years ago
Neilfws 49k

The short answer is that these are sequences which, for various reasons, are not included in the current assembly of the chromosome. Some cannot be mapped or ordered reliably; others can be mapped but are unfinished. On other chromosomes, e.g. chr6, there are alternative versions arising from different haplotypes (large blocks of chromosome).

See the data/download FAQ, in particular the section "chrN_random tables". You can also view assembly issues at the GRC website.
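If it helps to see how the extra files break down, here is a minimal Python sketch that groups sequence names by their suffix. It assumes the naming pattern visible in the file names quoted in the question (`_alt` and `_random` suffixes, a bare `chrN` for the main chromosome). The `chrUn_` prefix check, the function names `classify` and `summarize`, and the category labels are my own assumptions for illustration, not something defined by UCSC or stated in this thread.

```python
from collections import Counter


def classify(name):
    """Guess the category of a sequence from its file or sequence name.

    Categories are illustrative labels, assumed from the naming pattern
    seen in the hg38 download (e.g. chr1_GL383518v1_alt.fa).
    """
    # Drop a trailing ".fa" so file names and sequence names both work.
    stem = name[:-3] if name.endswith(".fa") else name
    if stem.endswith("_alt"):
        return "alt"        # alternate locus
    if stem.endswith("_random"):
        return "random"     # assigned to a chromosome, position/order unreliable
    if stem.startswith("chrUn_"):
        return "unplaced"   # assumption: not tied to any chromosome
    if "_" not in stem:
        return "primary"    # the main chromosome sequence, e.g. chr1
    return "other"          # anything this sketch does not recognise


def summarize(names):
    """Count how many names fall into each category."""
    return Counter(classify(n) for n in names)


if __name__ == "__main__":
    files = [
        "chr1.fa",
        "chr1_GL383518v1_alt.fa",
        "chr1_KI270706v1_random.fa",
        "chr1_KI270759v1_alt.fa",
    ]
    for category, count in sorted(summarize(files).items()):
        print(category, count)
```

To keep only the main chromosomes for a downstream analysis, you could filter the extracted file list with `classify(name) == "primary"`. Whether dropping the other sequences is appropriate depends on what you are doing with the assembly.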
