Question

Liftover With Chr_Random Positions

12.8 years ago

Biomed 5.0k

How can I best map hg18 positions on chr_random sequences to hg19-based positions? I use liftOver for regular chromosomes, but it doesn't work with chr_random positions.

Also, in the UCSC chromInfo.txt file there are multiple chr_random entries for a single chromosome (e.g. ['chr8_gl000196_random', '38914'] and ['chr8_gl000197_random', '37175']). Can you help me understand what these mean?

Thanks

liftover chromosome random • 4.7k views

Updated 12.7 years ago by Jorge Amigo 14k • written 12.8 years ago by Biomed 5.0k

Ram · Answer 1 · 2011-08-02

Although UCSC's hg18 chromosome download site mentions these random sequences only briefly, UCSC's hg19 chromosome download site gives a little more information on them:

The chr*_random sequences are unplaced sequence on those reference chromosomes.

The chrUn_* sequences are unlocalized sequences where the corresponding reference chromosome has not been determined.

See also: NCBI discussion of assembly procedures: http://www.ncbi.nlm.nih.gov/genome/assembly/assembly.shtml
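To make the naming concrete, here is a minimal sketch that sorts UCSC sequence names into categories and groups the chr*_random entries under their parent chromosome. The function names `classify` and `group_random_contigs` are my own, and the patterns assume the usual hg18 style (`chr8_random`) and hg19 style (`chr8_gl000196_random`); other assemblies may need different patterns.

```python
import re
from collections import defaultdict

# Assumed patterns: chrN_random (hg18 style) or chrN_<contig>_random (hg19 style).
RANDOM_RE = re.compile(r"^(chr[^_]+)_(?:[^_]+_)?random$")


def classify(name):
    """Return (category, parent_chromosome) for a UCSC sequence name."""
    if name.startswith("chrUn"):
        return ("chrUn", None)          # reference chromosome not determined
    match = RANDOM_RE.match(name)
    if match:
        return ("random", match.group(1))
    if "_" in name:
        return ("other", None)          # e.g. alternate haplotype sequences
    return ("placed", name)


def group_random_contigs(rows):
    """Group (name, length) rows from chromInfo.txt by parent chromosome."""
    groups = defaultdict(list)
    for name, length in rows:
        category, parent = classify(name)
        if category == "random":
            groups[parent].append((name, int(length)))
    return dict(groups)


# The two entries quoted in the question both belong to chr8.
rows = [("chr8_gl000196_random", "38914"), ("chr8_gl000197_random", "37175")]
print(group_random_contigs(rows))
```

So several chr_random entries for one chromosome simply mean that several separate contigs were assigned to that chromosome without an exact location.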

This is presumably why liftOver doesn't handle these positions: the sequences could not be placed, and they contain highly repetitive and complex regions, so it was not possible to locate them either in the genome build they belong to or in later ones.

There is a very interesting discussion here on BioStar regarding chr_random positions, in which lh3 pointed out what I believe we should all do with such positions (he was discussing whether or not to use these contigs in the mapping process):

Seeing the difference is exactly the reason why chr_random should be included. Most people do not care about the SNPs/signals in unlocalized/unplaced contigs, but we do care false SNPs/signals caused by reads coming from these contigs but wrongly mapped to chromosomal regions.

So, in summary, I would say that chr_random results should not be counted among the final results, but used as a tool to remove false positives. The best way to deal with such regions would be to wait for a newer and better-described genome (hg19 in your case) and reanalyze your data in that full context, either by blasting your flanking sequences as Istvan suggests or even by remapping your initial data. Of course, if you are working with positions alone, then I guess blasting would be the most appropriate thing to do, as you could retrieve flanking sequences programmatically from your chr_random positions.
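Retrieving the flanking sequences could look roughly like the sketch below. It assumes 1-based positions, an hg18 FASTA file small enough to hold in memory, and a flank width of 100 bases on each side; the function names and the `contig:position` query naming are illustrative choices, not part of any tool.

```python
def parse_fasta(lines):
    """Return {name: sequence} from an iterable of FASTA lines."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name = line[1:].split()[0]   # keep only the first word of the header
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def flank(seq, pos, width=100):
    """Bases around a 1-based position, clipped at the contig ends."""
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the contig")
    start = max(0, pos - 1 - width)
    end = min(len(seq), pos + width)
    return seq[start:end]


def flanks_to_fasta(seqs, positions, width=100):
    """Build FASTA text with one query per (contig, 1-based position)."""
    records = [
        f">{contig}:{pos}\n{flank(seqs[contig], pos, width)}"
        for contig, pos in positions
    ]
    return "\n".join(records) + "\n"
```

An open file handle can be passed straight to `parse_fasta`, and the text returned by `flanks_to_fasta` can be written to a query file for BLAST. Wider flanks give more specific hits, but in repetitive regions even long queries may match several places.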

score 0 · Answer 2 · 2011-06-25

12.8 years ago

Istvan Albert 100k

You could select the sequences that correspond to the intervals from the old build and blast them against the new build.
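Once the sequences have been blasted, the hits still need to be checked, because repetitive sequence often matches more than one place in the new build. A minimal sketch, assuming blastn was run with `-outfmt 6` (the standard 12-column tabular output: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the function name `best_hits` and the uniqueness rule (top bit score strictly higher than the runner-up) are my own choices:

```python
from collections import defaultdict


def best_hits(lines):
    """Map each query to (top hit, is_unique) from BLAST tabular output."""
    by_query = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue                      # skip comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue                      # skip blank or malformed lines
        by_query[fields[0]].append({
            "subject": fields[1],
            "pident": float(fields[2]),
            "sstart": int(fields[8]),
            "send": int(fields[9]),
            "bitscore": float(fields[11]),
        })

    result = {}
    for query, hits in by_query.items():
        hits.sort(key=lambda hit: hit["bitscore"], reverse=True)
        top = hits[0]
        # A tie for the best score means the new location is ambiguous.
        unique = len(hits) == 1 or hits[1]["bitscore"] < top["bitscore"]
        result[query] = (top, unique)
    return result
```

Passing an open handle on the BLAST output file to `best_hits` gives one candidate hg19 location per query. Queries flagged as not unique are probably best left unconverted rather than assigned to an arbitrary hit.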
