Is this an OK approach for lifting a mixture of b36 and b37 variants over to b38 using rsIDs only?
8 months ago
curious ▴ 600

I am working with an old but widely used "mixed" dataset that contains SNPs mapped to a mixture of b36 and b37 coordinates.

I don't know which build each SNP refers to, but each one is labeled with an rsID, so I essentially tried to lift to b38 by rsID alone, like this:

  1. I updated the positions in the "mixed" dataset to b38 positions by merging it with dbSNP141 on rsID to create a "lifted" set.

  2. I downloaded the 30x 1000 Genomes data, which was called de novo on b38, and updated its ID field to include the rsID.

  3. I used Beagle's conform-gt to make a "harmonized lifted" set, with 1000 Genomes as the reference. This should harmonize alleles and strand between the datasets, using allele frequency and LD to resolve ambiguous sites.
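For readers following along, the rsID-based lift in step 1 and the kind of allele/strand comparison that step 3 relies on can be sketched roughly as follows. This is a minimal illustration, not what conform-gt actually does internally: the function names and the in-memory dbSNP lookup (`dbsnp_b38`, mapping rsID to a (chrom, b38 position) pair) are hypothetical, and strand-ambiguous A/T and C/G SNPs are only flagged here rather than resolved with frequency and LD.

```python
# Minimal sketch (hypothetical helpers, not Beagle's conform-gt):
# step 1 = look up b38 positions by rsID; step 3 = classify allele/strand agreement.
from typing import Dict, List, Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

Variant = Tuple[str, str, str]           # (rsid, allele1, allele2) from the mixed-build set
Lifted = Tuple[str, int, str, str, str]  # (chrom, pos_b38, rsid, allele1, allele2)


def lift_by_rsid(
    variants: List[Variant], dbsnp_b38: Dict[str, Tuple[str, int]]
) -> Tuple[List[Lifted], List[str]]:
    """Replace the old coordinates with b38 ones looked up by rsID.

    Returns the lifted records plus the rsIDs with no b38 entry in the
    lookup (e.g. retired or merged rsIDs); those sites are dropped.
    """
    lifted: List[Lifted] = []
    missing: List[str] = []
    for rsid, a1, a2 in variants:
        hit = dbsnp_b38.get(rsid)
        if hit is None:
            missing.append(rsid)
            continue
        chrom, pos = hit
        lifted.append((chrom, pos, rsid, a1, a2))
    return lifted, missing


def is_strand_ambiguous(a1: str, a2: str) -> bool:
    """A/T and C/G SNPs look identical on either strand."""
    return COMPLEMENT.get(a1) == a2


def classify_alleles(a1: str, a2: str, ref: str, alt: str) -> str:
    """Compare a site's alleles with the reference panel's REF/ALT."""
    if is_strand_ambiguous(a1, a2):
        return "ambiguous"  # needs frequency/LD evidence to resolve
    if (a1, a2) == (ref, alt):
        return "match"
    if (a1, a2) == (alt, ref):
        return "swap"       # same strand, alleles in the other order
    flipped = (COMPLEMENT.get(a1), COMPLEMENT.get(a2))
    if flipped == (ref, alt):
        return "flip"       # opposite strand
    if flipped == (alt, ref):
        return "flip_swap"  # opposite strand and swapped order
    return "mismatch"       # alleles cannot be reconciled; drop the site


if __name__ == "__main__":
    dbsnp = {"rs1": ("chr1", 1000), "rs2": ("chr1", 2000)}
    lifted, missing = lift_by_rsid(
        [("rs1", "A", "G"), ("rs2", "A", "T"), ("rs3", "C", "T")], dbsnp
    )
    print(lifted)   # [('chr1', 1000, 'rs1', 'A', 'G'), ('chr1', 2000, 'rs2', 'A', 'T')]
    print(missing)  # ['rs3']
    print(classify_alleles("T", "C", "A", "G"))  # flip
```

One design point worth noting: any rsID that was merged into another rsID between the dataset's original dbSNP build and the one used for the lookup ends up in `missing`, so some of the unlifted sites may be recoverable through dbSNP's merge history rather than truly unmappable.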

I realize this isn't ideal, but does this approach seem OK, or are there better alternatives? I was able to "lift" 7022 of my original 7281 mixed-build sites this way, and plotting allele frequencies against a b38 reference such as TOPMed looks really clean too, so I think it worked.
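To put numbers on those checks: 7022 of 7281 sites is about 96.4% retained (259 dropped). A rough sketch of the frequency comparison is below. The input shape (dictionaries mapping rsID to alt-allele frequency) and the `max_diff=0.1` outlier threshold are illustrative assumptions, not a TOPMed file format or an established cutoff.

```python
# Rough sanity checks on the lifted set (illustrative inputs and threshold).
import math
from typing import Dict, List, Sequence, Tuple


def lift_rate(n_lifted: int, n_total: int) -> float:
    """Percentage of original sites that received a b38 position."""
    return round(100.0 * n_lifted / n_total, 1)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation (assumes neither input is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def compare_freqs(
    mine: Dict[str, float], ref: Dict[str, float], max_diff: float = 0.1
) -> Tuple[float, List[str]]:
    """Correlate alt-allele frequencies on shared rsIDs and list outliers.

    An outlier where mine is close to 1 - ref usually points to swapped
    alleles rather than a wrong position.
    """
    shared = sorted(set(mine) & set(ref))
    r = pearson([mine[k] for k in shared], [ref[k] for k in shared])
    outliers = [k for k in shared if abs(mine[k] - ref[k]) > max_diff]
    return r, outliers


if __name__ == "__main__":
    print(lift_rate(7022, 7281))  # 96.4
    mine = {"rs1": 0.10, "rs2": 0.50, "rs3": 0.80, "rs4": 0.30}
    ref = {"rs1": 0.12, "rs2": 0.48, "rs3": 0.20}
    r, outliers = compare_freqs(mine, ref)
    print(outliers)  # ['rs3']
```

A high overall correlation can still hide a handful of swapped or misplaced sites, which is why listing the individual outliers is useful alongside the plot.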

liftover • 641 views
