PLINK: Error: Variant '.' is not biallelic. To obtain a full list of merge failures
3.3 years ago

Hi,

I am trying to merge some WGS data with some SNP data. The WGS file contains millions of variants and the SNP data contains about 150k SNPs. Both datasets started as VCFs, but I converted them to .ped and .map files using the following commands:

plink --threads 4 --vcf start1.vcf --dog --out start1.output --maf 0.05 --mind 0.1 --geno 0.1 --recode --snps-only --biallelic-only strict

plink --threads 4 --vcf start2.vcf --dog --out start2.output --maf 0.05 --mind 0.1 --geno 0.1 --recode --snps-only --biallelic-only strict

I then tried to merge the files using the following command

plink --threads 4 --file start1.output --merge start2.output.ped start2.output.map --out merge.start1.start2 --maf 0.05 --mind 0.1 --geno 0.1 --recode --snps-only --dog --biallelic-only strict

When I do this I get the following error

Of these, 1 is new, while 8331241 are present in the base dataset.
405509 more multiple-position warnings: see log file.
Performing single-pass merge (14382 dogs, 124927 variants).
Pass 1: fileset #1 complete.
Error: Variant '.' is not biallelic. To obtain a full list of merge failures,
convert your data to binary format and retry the merge.

and before I get that error, I get tons of warnings that say

Warning: Multiple chromosomes seen for variant '.'.

I really don't know what I am doing wrong here. I have tried many different ways to get this to work, but I always end up with the same error:

Error: Variant '.' is not biallelic. To obtain a full list of merge failures,
convert your data to binary format and retry the merge.

Any help would be great

SNP Assembly genome next-gen • 3.0k views

This is usually caused by the following: plink identifies variants by their IDs (e.g. rsIDs), so when the IDs are missing, every such variant simply ends up with the name '.'. plink then treats all variants named '.' as the same variant and gets confused when they appear on different chromosomes. You can use --set-all-var-ids from plink2 to assign IDs to all your SNPs, which should hopefully solve the problem.
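To check whether this is what is happening, one option is to count how often each variant ID occurs in the .map file. This is only a minimal sketch: `demo_wgs.map` and its contents are made up for illustration, and it assumes the usual four-column .map layout (chromosome, variant ID, genetic distance, bp position).

```shell
# Hypothetical demo file standing in for a real .map file.
map_file="demo_wgs.map"
printf '1\t.\t0\t1000\n1\t.\t0\t2000\n2\t.\t0\t1500\n2\t2:3000\t0\t3000\n' > "$map_file"

# Count occurrences of each variant ID (column 2) and report only repeated IDs.
dup_report=$(awk '{ n[$2]++ } END { for (id in n) if (n[id] > 1) print n[id], id }' "$map_file")
echo "$dup_report"
```

If '.' shows up with a large count, the merge is almost certainly tripping over missing IDs rather than over genuinely multiallelic sites.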


Thanks for responding! I actually just noticed that. The SNP IDs in the SNP data's map file are 'chr:pos', while the WGS map file just has '.' for every SNP ID, like you said. In the WGS map file, I combined the chromosome number and the position to match the format of the SNP map file. That seemed to solve the problem.
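For reference, the rewrite described here could look roughly like the following. It is a sketch, not the exact command that was used: the file names and demo rows are hypothetical, and it assumes a four-column .map file where column 1 is the chromosome and column 4 is the position.

```shell
# Hypothetical demo input; replace with the real WGS .map file.
in_map="demo_in.map"
out_map="demo_out.map"
printf '1\t.\t0\t1000\n2\t.\t0\t1500\n2\t2:3000\t0\t3000\n' > "$in_map"

# Replace a missing ID ('.') with chromosome:position; keep existing IDs unchanged.
# "$1 = $1" forces awk to rebuild every line with tab separators.
awk 'BEGIN { OFS = "\t" } $2 == "." { $2 = $1 ":" $4 } { $1 = $1; print }' "$in_map" > "$out_map"
cat "$out_map"
```

Note that two variants at the same chromosome and position would still end up with identical IDs under this scheme, so it is worth re-checking for duplicates afterwards.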


As an additional note, the .ped format has been obsolete for close to a decade. plink 1.9's native format is .bed (use --make-bed/--bfile instead of --recode/--file), which is far more efficient; in contrast, plink 1.9 has to inefficiently convert .ped to .bed before doing anything else every time you use --file (and this process is slower than with --vcf). No plink 2.0 build can even read or write .ped files right now.
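Applied to the commands in the question, a binary-format version of the workflow might look like the sketch below. Several things here are assumptions: the QC filters (--maf, --mind, --geno) are left out for brevity, `--set-missing-var-ids '@:#'` is added to name ID-less variants as chromosome:position, and the small `run` helper is purely illustrative. It only prints each command unless DRY_RUN=0 is set, so the snippet can be read and tried without plink installed.

```shell
# DRY_RUN defaults to 1: commands are printed, not executed. Set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Convert each VCF directly to .bed/.bim/.fam instead of .ped/.map.
for name in start1 start2; do
  run plink --threads 4 --vcf "$name.vcf" --dog --snps-only --biallelic-only strict \
      --set-missing-var-ids '@:#' --make-bed --out "$name.output"
done

# Merge the two binary filesets and write the result in binary format as well.
run plink --threads 4 --bfile start1.output --bmerge start2.output --dog \
    --make-bed --out merge.start1.start2
```

Merging binary filesets also means that, if the merge fails again, plink can write out the full list of problem variants, as the error message suggests.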


Thanks! I was trying to use bed files but I couldn't compare the bed files because they're in binary format. Is there a way to at least get the head of a bed file?


diff -q can be used to compare binary files for exact equality. If you want human-readable output every step of the way, --vcf/--recode vcf is not as bad as --file/--recode.
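On getting the "head" of a binary fileset: the .bim and .fam files that accompany a .bed are plain text, so `head` works on them directly, and only the .bed itself is binary. A small sketch follows; the demo files are fabricated here purely so the commands have something to run on.

```shell
# Build a tiny hypothetical fileset. A plink 1 variant-major .bed starts with
# the bytes 0x6c 0x1b 0x01 (written below as octal escapes), followed by genotype bytes.
printf '\154\033\001\003' > demo.bed
printf '1\t1:1000\t0\t1000\tA\tG\n' > demo.bim
printf 'fam1\tdog1\t0\t0\t0\t-9\n' > demo.fam
cp demo.bed demo_copy.bed

# Variant and sample information is human-readable.
head -n 5 demo.bim
head -n 5 demo.fam

# Peek at the first three bytes of the .bed as hex.
bed_magic=$(od -An -tx1 -N3 demo.bed | tr -d ' \n')
echo "$bed_magic"

# diff -q reports only whether the two files differ, which is enough for binary files.
if diff -q demo.bed demo_copy.bed > /dev/null; then same=yes; else same=no; fi
echo "$same"
```

Comparing the .bim files of two filesets with ordinary text tools is often the quickest way to see which variants differ.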
