Lifting over and harmonizing a plink file so major allele is forward on hg19?
0
0
Entering edit mode
24 months ago
curious ▴ 640

I got a PLINK file from a publication. I want to have the major allele correspond to the forward strand of hg19 because I want to analyze it with another file that is organized in this manner. I am curious if this is possible and if my thinking below is the right way.

Here is the head of the bim file, it supposed to be hg18. Using the UCSC browser suggests that these are not all reported on the same strand:

6   rs6938233   0   26022056    A   G
6   rs2071299   0   26022780    T   C
6   rs2071298   0   26022880    A   G
6   rs7450342   0   26027459    C   T


From looking up these few rsids manually on dbsnp, I think my anticipated result should be something like this when all is said and done: Major allele on forward strand of hg19:

6   25914077    rs6938233   0   T   C
6   25914801    rs2071299   0   A   G
6   25914901    rs2071298   0   A   G
6   25919480    rs7450342   0   C   T


I started by using https://github.com/sritchie73/liftOverPlink to lift over to change coordinates to hg19.

6   rs6938233   0   25914077    A   G
6   rs2071299   0   25914801    T   C
6   rs2071298   0   25914901    A   G
6   rs7450342   0   25919480    T   C


Ok, still need everything with the major allele on the forward strand, so I was hoping that I could use https://github.com/biocore-ntnu/snpflip, which gives me this:

chromosome  0_idx_position  snp_name    genetic_distance    allele_1    allele_2    reference   reference_rev   strand
6   25914076    rs6938233   0   A   G   C   G   reverse
6   25914800    rs2071299   0   T   C   G   C   reverse
6   25914900    rs2071298   0   A   G   G   C   forward
6   25919479    rs7450342   0   T   C   T   A   forward


This seems to detect the strandedness well, but not sure how to proceed other than updating the bim file by taking the compliment of everything that is detected as reverse here with plink --flip and then using plink --ref-from-fa, which would hopefully give me the desired output above. I am a bit worried about ambiguous SNPs though (G/C A/T). This is getting pretty messy, is my approach even close or is there a better way to achieve this. Any help is greatly appreciated.

plink strand liftover • 489 views