Question: Flip GWAS Data To Positive Strand (hg19 Build 37)
5
6.3 years ago by
Kevin50
Provo, Utah
Kevin50 wrote:

I have GWAS data from an Illumina HumanOmniExpress BeadChip in PLINK format. What is the easiest way to find the SNPs that are not mapped to the positive strand (using reference hg19/b37) and flip them? I know PLINK has the --flip option, but it needs a list of SNPs to flip. How do I generate this list?

gwas snps plink • 8.1k views
modified 6.0 years ago by Endre Bakken Stovner60 • written 6.3 years ago by Kevin50
2

It can be a bit messy: get the SNP names from the PLINK MAP file, get strands and alleles from UCSC Tables, check whether the alleles match, add the strands, then flip. Or download the SNP file from Illumina to get the strands?
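
A rough sketch of the lookup route, assuming you already have a strand table keyed by SNP ID. The `strand_by_snp` dict is hypothetical; in practice it would be built from UCSC Tables or an Illumina annotation file, and it only helps if it describes the strand your genotypes were actually reported on. All function and variable names here are made up for illustration:

```python
def snp_ids_from_map(map_lines):
    """Yield SNP IDs from PLINK .map lines.

    A .map line has four whitespace-separated columns:
    chromosome, SNP ID, genetic distance, base-pair position.
    """
    for line in map_lines:
        fields = line.split()
        if len(fields) >= 2:
            yield fields[1]


def build_flip_list(map_lines, strand_by_snp):
    """Split SNPs into those annotated '-' (to flip) and those with no usable annotation."""
    flip, unknown = [], []
    for snp in snp_ids_from_map(map_lines):
        strand = strand_by_snp.get(snp)
        if strand == "-":
            flip.append(snp)
        elif strand != "+":
            # Missing or unexpected annotation: report it rather than guess.
            unknown.append(snp)
    return flip, unknown


# Toy data, not real annotations.
example_map = ["1 rs1 0 1000", "1 rs2 0 2000", "2 rs3 0 3000"]
example_strands = {"rs1": "+", "rs2": "-"}
flip_ids, unknown_ids = build_flip_list(example_map, example_strands)
```

Writing `flip_ids` one per line to a text file gives the list that PLINK's `--flip` expects, e.g. `plink --file mydata --flip flip_list.txt --recode --out mydata_flipped` (the file names are placeholders).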

written 6.3 years ago by zx87549.3k

No, it is very easy. Please see: https://github.com/endrebak/snp-flip

written 6.0 years ago by Endre Bakken Stovner60
1

REPO NOW AT https://github.com/endrebak/snpflip

modified 6 months ago by RamRS27k • written 4.9 years ago by Endre Bakken Stovner890
6
6.0 years ago by
Norway
Endre Bakken Stovner60 wrote:

I wrote a command line tool to do this very thing. Please see https://github.com/endrebak/snp-flip

The tool works right out of the box as long as you have biopython installed and a reference genome to do lookups in. See the github repo README.md for examples and documentation. It comes with example files to play around with.

Comments appreciated.

written 6.0 years ago by Endre Bakken Stovner60
1

REPO NOW AT https://github.com/endrebak/snpflip

modified 6 months ago by RamRS27k • written 4.9 years ago by Endre Bakken Stovner890

How does your tool treat AT and CG? As far as I know the Plink format doesn't store which allele is the reference allele. Instead, Plink assigns the minor allele to allele1 and the major allele to allele2. The major allele is not always the reference allele.

written 4.9 years ago by Matthias20

I use a reference genome to decide which is the reference allele; I do not consider any of the alleles in the plink file a reference allele (but perhaps that should be an option for old type plink files?). So if the plink file says A1 and A2 are A and T that SNP is considered ambiguous. I should add that as an example to the README.md and explain how the tool works a bit better.
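
To make the idea concrete, here is a minimal sketch of classifying one SNP against the reference base at its position. This is not the tool's actual code, just an illustration of the logic described above; `classify_snp` and its return labels are invented for the example, and the reference base is assumed to have been looked up already:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify_snp(a1, a2, ref_base):
    """Classify a SNP's strand from its two alleles and the reference base.

    Neither allele is treated as "the reference allele"; only the base in
    the reference genome decides.
    """
    alleles = {a1.upper(), a2.upper()}
    ref = ref_base.upper()
    # Missing genotypes ('0'), indels, or an 'N' in the reference: cannot say.
    if not alleles <= set(COMPLEMENT) or ref not in COMPLEMENT:
        return "unknown"
    # A/T and C/G pairs look identical on both strands.
    if alleles in ({"A", "T"}, {"C", "G"}):
        return "ambiguous"
    if ref in alleles:
        return "forward"
    if COMPLEMENT[ref] in alleles:
        return "reverse"
    # Neither the reference base nor its complement is among the alleles.
    return "mismatch"
```

With alleles A and G, a reference base of A gives "forward" and a reference base of T gives "reverse", whereas alleles A and T give "ambiguous" whatever the reference base is.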

modified 4.9 years ago • written 4.9 years ago by Endre Bakken Stovner890

So your tool labels all SNPs with AT or GC alleles as ambiguous (meaning they should be deleted)? Deciding whether an AT or GC SNP has to be flipped or not is the major issue in the flipping process. Since you delete them all, you don't give a solution for that.

modified 4.9 years ago • written 4.9 years ago by Matthias20
1

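Whether you delete them is up to you. If you find that none of the nonambiguous SNPs are on the reference strand, you should probably keep the ambiguous SNPs and flip them all; if all of the nonambiguous SNPs are on the reference strand, keep the ambiguous SNPs and do not flip them.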
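
As a small sketch of that rule of thumb (the function name and return labels are made up for illustration), the decision for the ambiguous SNPs can be driven by how the nonambiguous SNPs in the same dataset were classified:

```python
def decide_ambiguous(n_forward, n_reverse):
    """Decide what to do with A/T and C/G SNPs from the nonambiguous counts.

    n_forward: nonambiguous SNPs found on the reference strand
    n_reverse: nonambiguous SNPs found on the opposite strand
    """
    if n_forward + n_reverse == 0:
        return "undetermined"   # nothing to infer from
    if n_reverse == 0:
        return "keep as is"     # everything else is already on the reference strand
    if n_forward == 0:
        return "flip all"       # everything else is on the opposite strand
    return "undetermined"       # mixed strands: the counts alone cannot decide
```

The mixed case is the hard one: the counts say nothing about any individual ambiguous SNP, so extra information is needed.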

Given only a plink file and a reference genome it is impossible to solve this problem - you need manifest files or more info (but be warned: down this road lies insanity, with so many quirks, issues and bad data). That said, here are some files that might help: http://www.well.ox.ac.uk/~wrayner/strand/

modified 4.9 years ago • written 4.9 years ago by Endre Bakken Stovner890

I ran into something even worse. The data were shared by others in PLINK format, so ref/alt had been changed; on top of that, some alleles are coded on the TOP strand while another dataset is coded on BOTTOM. Merging these two datasets is painful.

written 20 months ago by Shicheng Guo8.3k
2
6.2 years ago by
Québec
Maxime Lamontagne2.2k wrote:

You should look in the original output file (the FinalReport). This file should have both the Top alleles (Illumina nomenclature) and the forward alleles. If you do the flip by comparing your minor allele with the one from UCSC, all SNPs with a MAF around 45% are problematic, especially AT and CG SNPs.
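
To illustrate why a MAF near 45% is a problem, here is a hedged sketch of a frequency-based check for AT/CG SNPs. The `max_maf` cutoff of 0.40 is an arbitrary assumption for the example, not a recommended value, and the function name is invented:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def resolve_palindromic_by_maf(study_minor, study_maf, ref_minor, ref_maf, max_maf=0.40):
    """Guess the strand of an A/T or C/G SNP by comparing minor alleles.

    When either MAF is close to 50%, the "minor" allele can differ between
    datasets by chance, so no call is made.
    """
    if study_maf > max_maf or ref_maf > max_maf:
        return "unresolved"
    if study_minor == ref_minor:
        return "same strand"
    if COMPLEMENT.get(study_minor) == ref_minor:
        return "flip"
    return "unresolved"
```

For example, a study minor allele A at 20% against a reference minor allele T at 22% suggests a flip, while the same pair at 45% and 44% stays unresolved.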

modified 6 months ago by RamRS27k • written 6.2 years ago by Maxime Lamontagne2.2k