Question: Accounting for problem SNPs when merging multiple PLINK files
6.4 years ago, lhvkl (United Kingdom) wrote:

I am after some advice on the best method to correct for differences in allele codes at any given SNP when merging across multiple files. I have data in PLINK format (bed/bim/fam) for several populations. When I attempt to merge the data using PLINK as follows:

plink --file fA --merge-list allfiles.txt --make-bed --out mynewdata

I get reports of +/- strand issues and a file is generated detailing the problem SNPs. 

Looking at the .bim files for each population at these problem SNPs, example allele codes are as follows:

Pop1: rs1000000  A  G

Pop2: rs1000000  T  C

Pop3: rs1000000  A  G

This indicates to me that Pop2 has undergone a strand flip.

Is there any software that can account for these differences when merging SNP data? This must be a common problem. Or does each of these flips need to be identified computationally and corrected using PLINK to update the allele information as follows:

plink --bfile mydata --update-alleles mylist.txt --make-bed --out newfile
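
Identifying the flips computationally could look roughly like the sketch below. It compares the allele codes of SNPs shared by two .bim files and reports those whose alleles in the second file are the exact complement of the first (A/G versus T/C, as in the rs1000000 example). The function names and category labels are hypothetical, and the sketch assumes standard six-column .bim lines. A/T and C/G SNPs are labelled "ambiguous" because a flip cannot be detected from allele codes alone for those.

```python
# Sketch only: classify SNPs shared by two .bim files by how their alleles relate.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_bim_alleles(lines):
    """Map variant ID -> frozenset of its two allele codes, from .bim lines."""
    alleles = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # .bim columns: chromosome, variant ID, cM, bp position, allele 1, allele 2
        alleles[fields[1]] = frozenset(a.upper() for a in fields[4:6])
    return alleles


def classify(ref, other):
    """Return 'same', 'flip', 'ambiguous' or 'other' for two allele sets."""
    if not (ref | other) <= set(COMPLEMENT):
        return "other"  # missing ('0') or non-ACGT codes are not judged here
    flipped = frozenset(COMPLEMENT[a] for a in other)
    if ref == other:
        # A/T and C/G pairs are their own complement, so strand is undecidable
        return "ambiguous" if flipped == other else "same"
    if ref == flipped:
        return "flip"
    return "other"  # e.g. A/G vs A/C: possibly triallelic or a genuine mismatch


def find_flips(ref_lines, other_lines):
    """Sorted IDs of shared SNPs whose alleles in `other` are complemented."""
    ref = read_bim_alleles(ref_lines)
    other = read_bim_alleles(other_lines)
    shared = ref.keys() & other.keys()
    return sorted(v for v in shared if classify(ref[v], other[v]) == "flip")
```

Writing the returned IDs to a text file, one per line, gives a list that can be passed to `plink --flip` for the second file.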

Thanks in advance.

 

modified 5.2 years ago by Biostar • written 6.4 years ago by lhvkl

I have followed the instructions on the website, and my "trial flip" results suggest that there are still strand issues. I don't want to remove the problem SNPs, as this would reduce my SNP count considerably.

The webpage you link to says: "PLINK cannot properly resolve genuine triallelic variants. We recommend exporting that subset of the data to VCF, using another tool/script to perform the merge in the way you want, and then importing the result."

Is there a way of merging VCF files that accounts for triallelic SNPs and strand flips? I've looked into vcftools, but I'm not sure it is the right option.

modified 8 months ago by RamRS • written 6.4 years ago by lhvkl
6.4 years ago, Maxime Lamontagne (Québec) wrote:

PLINK gives you a list of the SNPs that need to be flipped (???.missnp). You need to flip these SNPs.

Step 1 - First merge: plink --file fA --merge-list allfiles.txt --make-bed --out mynewdata

Step 2 - Flip SNPs: plink --file fA --flip mynewdata.missnp --make-bed --out mynewdata2

Step 3 - New merge: plink --bfile mynewdata2 --merge-list allfiles.txt --make-bed --out mynewdata3

After the second merge, if you still get a strand error, those SNPs are probably triallelic.

modified 8 months ago by RamRS • written 6.4 years ago by Maxime Lamontagne

Thanks Maxime. Step 2 flips those SNPs only in file fA rather than across all files, so the new merge in step 3 only corrects for strand flips in file fA. How do you handle this problem across multiple files?

modified 8 months ago by RamRS • written 6.4 years ago by lhvkl

Across multiple files, you add only one file at a time.

Merge File 1 + File 2 (Step 1-2-3) --> NewFile1

Merge NewFile1 + File 3 (Step 1-2-3) --> NewFile2

Merge NewFile2 + File 4 (Step 1-2-3) --> NewFile3 ...

It will take some time, but it will work.
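
The one-file-at-a-time scheme above can be sketched as a small planner that only builds the PLINK command lines and does not run anything. This is an illustration under stated assumptions, not a tested pipeline: the function names and the `_trial`/`_flipped` output suffixes are hypothetical, all inputs are assumed to be binary filesets given by prefix, the single-prefix form of `--bmerge` is the PLINK 1.9 syntax, and the name of the problem-SNP list varies by version (PLINK 1.9 writes a `-merge.missnp` file), so it is left as a parameter.

```python
# Sketch only: plan the merge / flip / re-merge rounds, one extra file per round.
def pairwise_merge_commands(base, other, out, missnp_suffix=".missnp"):
    """Return the three PLINK commands (Steps 1-2-3) for one pair of filesets."""
    trial = out + "_trial"      # hypothetical name for the first merge attempt
    flipped = out + "_flipped"  # hypothetical name for the flipped base fileset
    return [
        # Step 1: trial merge; strand problems produce a list of problem SNPs
        ["plink", "--bfile", base, "--bmerge", other,
         "--make-bed", "--out", trial],
        # Step 2: flip the listed SNPs in the base fileset
        ["plink", "--bfile", base, "--flip", trial + missnp_suffix,
         "--make-bed", "--out", flipped],
        # Step 3: merge the flipped base with the other fileset again
        ["plink", "--bfile", flipped, "--bmerge", other,
         "--make-bed", "--out", out],
    ]


def chained_merge_plan(prefixes, out_stem="NewFile"):
    """File1 + File2 -> NewFile1, NewFile1 + File3 -> NewFile2, and so on."""
    plan = []
    base = prefixes[0]
    for i, other in enumerate(prefixes[1:], start=1):
        out = f"{out_stem}{i}"
        plan.append(pairwise_merge_commands(base, other, out))
        base = out  # the merged result becomes the base of the next round
    return plan
```

Steps 2 and 3 of a round are only needed when Step 1 stops with a strand error; if the trial merge succeeds, its output is already the merged fileset for that round. Each command is a plain argument list, so it can be printed for review or handed to `subprocess.run` once the file names have been checked.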

modified 8 months ago by RamRS • written 6.4 years ago by Maxime Lamontagne

I was hoping there was a less cumbersome way around this problem but I'll try these repeated steps. Thank you.

modified 8 months ago by RamRS • written 6.4 years ago by lhvkl

--merge-list allows you to merge more than two files at a time. However, it does not really work for flips--you don't know which source file(s) need to flip which SNPs. So in your case (where you've verified that there probably are strand errors) the workflow described by Maxime is correct.

modified 8 months ago by RamRS • written 6.4 years ago by chrchang523
6.4 years ago, chrchang523 (United States) wrote:

See the discussion at https://www.cog-genomics.org/plink2/data#merge3 .

written 6.4 years ago by chrchang523