Question: Accounting for problem SNPs when merging multiple PLINK files
lhvkl (United Kingdom), 3.8 years ago, wrote:

I am after some advice on the best way to correct for differences in allele codes at a given SNP when merging across multiple files. I have data in PLINK binary format (bed/bim/fam) for several populations. When I attempt to merge the data using PLINK as follows:

plink --bfile fA --merge-list allfiles.txt --make-bed --out mynewdata

I get reports of +/- strand issues, and a .missnp file is generated listing the problem SNPs.

Looking at these problem SNPs in each population's .bim file, the allele codes are, for example:

Pop1: rs1000000  A  G
Pop2: rs1000000  T  C
Pop3: rs1000000  A  G

This suggests to me that Pop2's alleles are reported on the opposite strand: T/C is the complement of A/G.
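The reasoning above (T/C is the complement of A/G, so Pop2 is flipped) can be written as a small check. This is a minimal Python sketch, not part of PLINK; the names `complement` and `classify` are hypothetical, and it assumes single-letter A/C/G/T allele codes. It also flags A/T and C/G SNPs, whose complement is the same pair, so a flip cannot be detected from the codes alone.

```python
# Hypothetical helper, not part of PLINK: compare one SNP's alleles in two .bim files.
# Assumes single-letter A/C/G/T allele codes; indels and missing ("0") are not handled.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(alleles):
    """Return the set of complementary bases (unknown codes are kept unchanged)."""
    return {COMPLEMENT.get(a, a) for a in alleles}


def classify(ref_alleles, other_alleles):
    """Classify two allele pairs as 'match', 'flip', 'ambiguous' or 'mismatch'.

    Allele order is ignored, because A1/A2 can legitimately be swapped
    between .bim files without any strand problem.
    """
    ref, other = set(ref_alleles), set(other_alleles)
    if complement(ref) == ref:
        # A/T and C/G SNPs are their own complement: strand cannot be inferred.
        return "ambiguous"
    if other == ref:
        return "match"
    if complement(other) == ref:
        return "flip"
    return "mismatch"


print(classify(("A", "G"), ("T", "C")))  # Pop1 vs Pop2: flip
print(classify(("A", "G"), ("A", "G")))  # Pop1 vs Pop3: match
print(classify(("A", "T"), ("T", "A")))  # ambiguous
```

A "mismatch" result (e.g. A/G against A/C) is the case no flip can fix, which is what PLINK treats as a genuinely triallelic variant.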

Is there any software that accounts for these differences when merging SNP data? This must be a common problem. Or does each flip need to be identified computationally and then corrected by updating the allele information with PLINK, as follows:

plink --bfile mydata --update-alleles mylist.txt --make-bed --out newfile
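Identifying the flips computationally and writing mylist.txt could look like the minimal Python sketch below. It assumes one population (here a made-up pop1.bim) is on the correct strand and that SNP IDs match across files; `read_bim` and `update_alleles_lines` are hypothetical names, and the chromosome/position in the demo files are invented placeholders. Each output line has the five fields `--update-alleles` expects: variant ID, the two old allele codes, then the two new codes.

```python
import tempfile
from pathlib import Path

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_bim(path):
    """Map variant ID -> (A1, A2) using columns 2, 5 and 6 of a .bim file."""
    alleles = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 6:
                alleles[fields[1]] = (fields[4], fields[5])
    return alleles


def update_alleles_lines(ref_bim, target_bim):
    """Yield '--update-alleles' lines (ID, old A1, old A2, new A1, new A2)
    for SNPs in target_bim whose alleles are the complement of ref_bim's."""
    ref = read_bim(ref_bim)
    for snp, (a1, a2) in read_bim(target_bim).items():
        if snp not in ref or a1 not in COMPLEMENT or a2 not in COMPLEMENT:
            continue
        ref_set = set(ref[snp])
        if {COMPLEMENT.get(a, a) for a in ref_set} == ref_set:
            continue  # A/T or C/G SNP: a flip cannot be detected from the codes
        new1, new2 = COMPLEMENT[a1], COMPLEMENT[a2]
        if {new1, new2} == ref_set:
            yield f"{snp}\t{a1}\t{a2}\t{new1}\t{new2}"


# Demo on made-up one-line .bim files mirroring Pop1 and Pop2 from the question.
with tempfile.TemporaryDirectory() as tmp:
    pop1, pop2 = Path(tmp) / "pop1.bim", Path(tmp) / "pop2.bim"
    pop1.write_text("1\trs1000000\t0\t12345\tA\tG\n")
    pop2.write_text("1\trs1000000\t0\t12345\tT\tC\n")
    lines = list(update_alleles_lines(pop1, pop2))

print("\n".join(lines))
```

Saving these lines as mylist.txt gives the input for the command above. For a pure strand flip, passing just the variant IDs to `--flip` has the same effect.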

Thanks in advance.



I have followed the instructions on that webpage, and my "trial flip" results suggest that there are still strand issues. I don't want to remove the problem SNPs, as this would reduce my SNP count considerably.

The webpage you link to says: "PLINK cannot properly resolve genuine triallelic variants. We recommend exporting that subset of the data to VCF, using another tool/script to perform the merge in the way you want, and then importing the result. "

Is there a way of merging VCF files that handles triallelic SNPs and strand flips? I've looked into VCFtools, but I'm not sure it is the right option.

(reply by lhvkl, 3.8 years ago)
Maxime Lamontagne, 3.8 years ago, wrote:

PLINK gives you a list of the SNPs that need to be flipped (the .missnp file). You need to flip these SNPs.

Step 1 - First merge: plink --bfile fA --merge-list allfiles.txt --make-bed --out mynewdata

Step 2 - Flip SNPs: plink --bfile fA --flip mynewdata.missnp --make-bed --out mynewdata2

Step 3 - New merge: plink --bfile mynewdata2 --merge-list allfiles.txt --make-bed --out mynewdata3

After the second merge, if PLINK still reports strand errors, those SNPs are probably genuinely triallelic.
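Steps 1-3 for a single pair of binary filesets can be scripted. This is a minimal Python sketch, not a definitive implementation: it swaps `--merge-list` for `--bmerge` (PLINK 1.9 accepts a single fileset prefix there), and it assumes PLINK 1.9 writes the conflicting IDs to `<out>-merge.missnp` rather than the mynewdata.missnp name used above, so check what your version writes. The prefixes fA, fB and NewFile1 are hypothetical; with `dry_run=True` the function only returns the commands.

```python
import subprocess


def merge_pair(first, second, out, dry_run=True):
    """Merge two PLINK binary filesets, flipping first's strand-conflict SNPs once.

    Assumes PLINK 1.9: --bmerge takes one prefix, and a failed merge
    writes the conflicting variant IDs to <out>-merge.missnp.
    """
    flipped = f"{first}_flipped"
    missnp = f"{out}-merge.missnp"
    commands = [
        # Step 1: attempt the merge; strand conflicts make it fail and write missnp.
        ["plink", "--bfile", first, "--bmerge", second, "--make-bed", "--out", out],
        # Step 2: flip the listed SNPs in the first fileset only.
        ["plink", "--bfile", first, "--flip", missnp, "--make-bed", "--out", flipped],
        # Step 3: retry; anything still failing is probably triallelic.
        ["plink", "--bfile", flipped, "--bmerge", second, "--make-bed", "--out", out],
    ]
    if dry_run:
        return commands
    if subprocess.run(commands[0]).returncode == 0:
        return commands[:1]  # clean merge, no flipping needed
    subprocess.run(commands[1], check=True)
    subprocess.run(commands[2], check=True)  # raises CalledProcessError on failure
    return commands


for cmd in merge_pair("fA", "fB", "NewFile1"):
    print(" ".join(cmd))
```

With only two filesets, flipping the listed SNPs on either side resolves the conflict, which is why the pairwise version is unambiguous.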


Thanks Maxime. Step 2 only flips those SNPs in file fA rather than across all files, so the new merge in step 3 only corrects strand flips in fA. How do you handle this problem across multiple files?

(reply by lhvkl, 3.8 years ago)

With multiple files, add one file at a time:

Merge File 1 + File 2 (Steps 1-3) --> NewFile1
Merge NewFile1 + File 3 (Steps 1-3) --> NewFile2
Merge NewFile2 + File 4 (Steps 1-3) --> NewFile3 ...

It will take some time, but it will work.
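The chaining is mechanical: with N filesets there are N-1 pairwise rounds, and each round's output becomes the left side of the next. A minimal Python sketch of the plan (the function name and file names are hypothetical); each tuple would then go through Steps 1-3.

```python
def merge_chain(prefixes, out_stem="NewFile"):
    """Plan pairwise merges as (left, right, output) tuples, one new file per step."""
    if len(prefixes) < 2:
        raise ValueError("need at least two filesets to merge")
    plan = []
    current = prefixes[0]
    for step, right in enumerate(prefixes[1:], start=1):
        output = f"{out_stem}{step}"
        plan.append((current, right, output))
        current = output  # the merged result becomes the left side of the next step
    return plan


for left, right, output in merge_chain(["File1", "File2", "File3", "File4"]):
    print(f"Merge {left} + {right} (Steps 1-3) --> {output}")
```

Because each round involves exactly two filesets, every .missnp list unambiguously refers to that pair.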

(reply by Maxime Lamontagne, 3.8 years ago)

I was hoping there was a less cumbersome way around this problem, but I'll try these repeated steps. Thank you.

(reply by lhvkl, 3.8 years ago)

--merge-list lets you merge more than two files at a time. However, it does not really work for flips: you don't know which source file(s) need to flip which SNPs. So in your case (where you've verified that there probably are strand errors), the workflow described by Maxime is correct.
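If one fileset can be trusted as a strand reference, the "which file flips which SNP" question can be answered by comparing each .bim file against it. This is a swapped-in approach, not the workflow described in this thread. Below is a minimal Python sketch under those assumptions (shared SNP IDs, a correct-strand reference; `flip_lists` and the pop*.bim demo files are hypothetical). It skips SNPs absent from the reference and strand-ambiguous A/T and C/G SNPs, which need another check or removal.

```python
import tempfile
from pathlib import Path

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
BASES = set(COMPLEMENT)


def read_bim(path):
    """Map variant ID -> set of allele codes (columns 2, 5 and 6 of a .bim file)."""
    with open(path) as handle:
        rows = (line.split() for line in handle)
        return {f[1]: {f[4], f[5]} for f in rows if len(f) >= 6}


def flip_lists(reference_bim, other_bims):
    """Return {fileset name: [variant IDs to --flip]} relative to a reference .bim."""
    ref = read_bim(reference_bim)
    result = {}
    for path in other_bims:
        to_flip = []
        for snp, alleles in read_bim(path).items():
            ref_alleles = ref.get(snp)
            if ref_alleles is None or not (alleles | ref_alleles) <= BASES:
                continue  # absent from the reference, or non-ACGT codes
            if {COMPLEMENT[a] for a in ref_alleles} == ref_alleles:
                continue  # A/T or C/G SNP: strand-ambiguous
            if {COMPLEMENT[a] for a in alleles} == ref_alleles:
                to_flip.append(snp)
        result[Path(path).stem] = to_flip
    return result


# Demo on made-up .bim files mirroring Pop1-Pop3 from the question (pop1 = reference).
with tempfile.TemporaryDirectory() as tmp:
    paths = {}
    for name, (a1, a2) in {"pop1": "AG", "pop2": "TC", "pop3": "AG"}.items():
        paths[name] = Path(tmp) / f"{name}.bim"
        paths[name].write_text(f"1\trs1000000\t0\t12345\t{a1}\t{a2}\n")
    flips = flip_lists(paths["pop1"], [paths["pop2"], paths["pop3"]])

print(flips)
```

Each list could be saved to a text file and passed to `--flip` for its own fileset before a single `--merge-list` run.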

(reply by chrchang523, 3.8 years ago)
chrchang523 (United States), 3.8 years ago, wrote:

See the discussion at .
