Question: Liftover With Chr_Random Positions
Biomed (Bethesda, MD, USA) wrote 8.5 years ago:

How can I best map hg18 positions on chr_random sequences to hg19-based positions? I use liftOver for regular chromosomes, but it doesn't work with chr_random positions.

Also, in the UCSC chromInfo.txt file there are multiple chr_random entries for a single chromosome (e.g. ['chr8_gl000196_random', '38914'] and ['chr8_gl000197_random', '37175']). Can you help me understand what these mean?

Thanks

Jorge Amigo (Santiago de Compostela, Spain) wrote 8.4 years ago:

Although UCSC's hg18 chromosome download site only briefly mentions these random sequences, UCSC's hg19 chromosome download site gives a little more information on them:

  • The chr*_random sequences are unplaced sequence on those reference chromosomes.
  • The chrUn_* sequences are unlocalized sequences where the corresponding reference chromosome has not been determined.

See also: NCBI discussion of assembly procedures: http://www.ncbi.nlm.nih.gov/genome/assembly/assembly.shtml
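
As a rough illustration of the naming convention described above, the sketch below groups the chr_random entries of a chromInfo-style file by the chromosome they are associated with. It assumes tab-separated lines whose first two columns are sequence name and size, and it assumes that names look like the ones quoted in this thread (chr8_random, chr8_gl000196_random, chrUn_*). The function names `classify` and `group_random_contigs` are made up for this example.

```python
import re
from collections import defaultdict

# Assumed naming patterns, taken from the names quoted in this thread:
#   chr8_random (hg18 style), chr8_gl000196_random (hg19 style), chrUn_*
RANDOM_RE = re.compile(r"^(chr[0-9A-Za-z]+)_(?:[0-9A-Za-z]+_)?random$")


def classify(name):
    """Return (kind, parent chromosome) for a UCSC-style sequence name."""
    if name.startswith("chrUn_"):
        # Reference chromosome has not been determined.
        return "chrUn", None
    match = RANDOM_RE.match(name)
    if match:
        # Associated with a chromosome, but not placed within it.
        return "random", match.group(1)
    # Anything else (regular chromosomes, haplotypes, ...) is left alone.
    return "other", name


def group_random_contigs(lines):
    """Map each chromosome to its list of (random contig name, size)."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, size = fields[0], int(fields[1])
        kind, parent = classify(name)
        if kind == "random":
            groups[parent].append((name, size))
    return dict(groups)


if __name__ == "__main__":
    example = [
        "chr8_gl000196_random\t38914",
        "chr8_gl000197_random\t37175",
    ]
    print(group_random_contigs(example))
```

Seen this way, the two chr8 entries from the question are simply two separate contigs that are both associated with chromosome 8.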

This must be the reason why liftOver doesn't deal with these positions: because these sequences could not be placed, and because they contain highly repetitive and complex regions, it was not possible to locate them either in that version of the genome or in forthcoming ones.

A very interesting discussion arose here on BioStar regarding chr_random positions, where lh3 pointed out what I believe we should all do with such positions (he was talking about whether or not to use these contigs in the mapping process):

Seeing the difference is exactly the reason why chr_random should be included. Most people do not care about the SNPs/signals in unlocalized/unplaced contigs, but we do care false SNPs/signals caused by reads coming from these contigs but wrongly mapped to chromosomal regions.

So, in summary, I would say that chr_random results should not be counted among the final results, but used as a tool to remove false positives. The best way to deal with such regions would be to wait for a newer and better-described genome (hg19 in your case) and reanalyze your data in the full context, either by blasting your flanking sequences, as Istvan suggests, or even by remapping your initial data. Of course, if you are working with positions alone, then I guess blasting would be the most appropriate thing to do, as you could retrieve the flanking sequences programmatically from your chr_random positions.

Istvan Albert (University Park, USA) wrote 8.5 years ago:

You could select the sequences that correspond to the intervals from the old build and blast them against the new build.
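
One possible way to read the results of such a search is sketched below. It assumes BLAST was run with tabular output (`-outfmt 6`, the standard 12 tab-separated columns) and keeps the highest-bitscore hit per query. The function names are invented for this example, and choosing by bitscore alone is a simplifying assumption.

```python
# Column names of BLAST's standard tabular output (-outfmt 6).
FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_hit(line):
    """Turn one tabular BLAST line into a dict with numeric fields converted."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    hit = dict(zip(FIELDS, values))
    for key in INT_FIELDS:
        hit[key] = int(hit[key])
    for key in FLOAT_FIELDS:
        hit[key] = float(hit[key])
    return hit


def best_hits(lines):
    """Return {query id: best hit}, where best means highest bitscore.

    On ties the first hit seen is kept.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = parse_hit(line)
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

Because these contigs tend to be repetitive, a query may have several hits with nearly equal scores; in that case the new-build position is ambiguous and the best hit should be treated with caution.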
