8.4 years ago by
Santiago de Compostela, Spain
although UCSC's hg18 chromosome download site only mentions these random sequences briefly, UCSC's hg19 chromosome download site gives a little more information on them:
- The chr*_random sequences are unplaced sequence on those reference chromosomes.
- The chrUn_* sequences are unlocalized sequences where the corresponding reference chromosome has not been determined.
See also: NCBI discussion of assembly procedures: http://www.ncbi.nlm.nih.gov/genome/assembly/assembly.shtml
this is probably why liftOver doesn't deal with these positions: the sequences could not be placed, and they contain highly repetitive and complex regions, so they could not be located reliably in their own version of the genome, let alone in later ones.
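as a rough illustration, positions could be split by sequence name before running liftOver, so that the chr*_random and chrUn_* ones are set aside up front. this is only a sketch: it assumes UCSC-style names (e.g. chr1_random in hg18, chr1_gl000191_random and chrUn_gl000211 in hg19), and the function names and the (chrom, pos) tuple layout are made up for the example.

```python
import re

# Assumption: UCSC-style sequence names, e.g. chr1_random (hg18),
# chr1_gl000191_random and chrUn_gl000211 (hg19).
_RANDOM_LIKE = re.compile(r"^chr(Un_.+|.+_random)$")


def is_random_contig(chrom):
    """Return True for chr*_random and chrUn_* sequence names."""
    return _RANDOM_LIKE.match(chrom) is not None


def split_positions(positions):
    """Split (chrom, pos) tuples into (assembled, random_like) lists.

    Hypothetical helper: 'assembled' holds everything whose name does not
    look like chr*_random or chrUn_*; input order is preserved in both lists.
    """
    assembled, random_like = [], []
    for chrom, pos in positions:
        target = random_like if is_random_contig(chrom) else assembled
        target.append((chrom, pos))
    return assembled, random_like
```

the first list is what you would hand to liftOver; the second one is the set of positions that needs the separate treatment discussed below.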
there's a very interesting discussion here on BioStar about chr_random positions, where lh3 pointed out what I believe we should all do with such positions (he was talking about whether or not to include these contigs in the mapping process):
Seeing the difference is exactly the reason why chr_random should be included. Most people do not care about the SNPs/signals in unlocalized/unplaced contigs, but we do care false SNPs/signals caused by reads coming from these contigs but wrongly mapped to chromosomal regions.
so, in summary, I would say that chr_random results should not be reported as final results, but used as a tool to remove false positives. the best way to deal with such regions would be to use a newer and better described genome (hg19 in your case) and reanalyze your data in its entire context, either by blasting your flanking sequences as Istvan suggests or by remapping your initial data. of course, if you are working with positions alone, then I guess blasting would be the most appropriate thing to do, as you could retrieve the flanking sequences programmatically from your chr_random positions.
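retrieving the flanking sequences could look roughly like the sketch below. it assumes you have the chr_random sequences as a plain FASTA file, that your positions are 1-based, and that the file is small enough to load into memory; `read_fasta`, `flanking_sequence` and the default flank size of 100 bp are names and values chosen for the example, not part of any existing tool.

```python
def read_fasta(lines):
    """Parse an iterable of FASTA lines into a {name: sequence} dict.

    The name is the first whitespace-delimited word after '>'.
    Everything is held in memory, so this suits chr_random-sized files.
    """
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def flanking_sequence(seqs, chrom, pos, flank=100):
    """Return the base at a 1-based position plus up to `flank` bases
    on each side, clipped at the ends of the sequence."""
    seq = seqs[chrom]
    if not 1 <= pos <= len(seq):
        raise ValueError("position %d is outside %s (length %d)" % (pos, chrom, len(seq)))
    start = max(0, pos - 1 - flank)   # 0-based, inclusive
    end = min(len(seq), pos + flank)  # 0-based, exclusive
    return seq[start:end]


# Usage, with a hypothetical file name:
# with open("chr1_random.fa") as handle:
#     seqs = read_fasta(handle)
# query = flanking_sequence(seqs, "chr1_random", 12345, flank=100)
```

each returned string can then be written out as a FASTA query and blasted against hg19 to see whether the region has been placed there.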