Question

PLINK:Error: Variant '.' is not biallelic. To obtain a full list of merge failures

3.1 years ago

williamsbrian5064 ▴ 510

Hi,

I am trying to merge some WGS data with some SNP data. The WGS file contains millions of variants and the SNP data contains about 150k SNPs. Both datasets started as VCFs, but I converted them to .ped and .map files using the following commands:

plink --threads 4 --vcf start1.vcf --dog --out start1.output --maf 0.05 --mind 0.1 --geno 0.1 --recode --snps-only --biallelic-only strict

plink --threads 4 --vcf start2.vcf --dog --out start2.output --maf 0.05 --mind 0.1 --geno 0.1 --recode --snps-only --biallelic-only strict

I then tried to merge the files using the following command

plink --threads 4 --file start1.output --merge start2.output.ped start2.output.map --out merge.start1.start2 --maf 0.05 --mind 0.1 --geno 0.1 --recode --snps-only --dog --biallelic-only strict

When I do this I get the following error

Of these, 1 is new, while 8331241 are present in the base dataset.
405509 more multiple-position warnings: see log file.
Performing single-pass merge (14382 dogs, 124927 variants).
Pass 1: fileset #1 complete.
Error: Variant '.' is not biallelic. To obtain a full list of merge failures,
convert your data to binary format and retry the merge.

and before I get that error, I get tons of warnings that say

Warning: Multiple chromosomes seen for variant '.'.

I really don't know what I am doing wrong here. I have tried many different ways to get this to work, but I always end up with the same error:

Error: Variant '.' is not biallelic. To obtain a full list of merge failures,
convert your data to binary format and retry the merge.

Any help would be great

SNP Assembly genome next-gen • 2.8k views

ADD COMMENT • link 3.1 years ago by williamsbrian5064 ▴ 510

Usually this happens because plink identifies variants by their IDs (e.g. rsIDs). If the IDs are missing, every such variant just gets the name '.'. plink then thinks all the SNPs named '.' are the same variant and gets confused when they appear on different chromosomes. You can use --set-all-var-ids from plink2 to assign IDs to all your SNPs, which should hopefully solve the problem.
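A quick way to check whether this is the cause is to count the placeholder IDs in the text .map file. The following is a minimal sketch, not part of the original reply: count_missing_ids is a hypothetical helper name, and it assumes the standard four-column .map layout (chromosome, variant ID, genetic distance, base-pair position).

```shell
# Count variants in a PLINK .map file whose ID (column 2) is the placeholder '.'.
# Assumed .map columns: chromosome, variant ID, genetic distance, bp position.
count_missing_ids() {
    awk '$2 == "." { n++ } END { print n + 0 }' "$1"
}

# Example (file name taken from the --out prefix used in the question):
# count_missing_ids start1.output.map
```

A large count here would be consistent with the "Multiple chromosomes seen for variant '.'" warnings. With plink2, passing a template such as '@:#' to --set-all-var-ids names each variant as chromosome:position.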

ADD REPLY • link 3.1 years ago by 4galaxy77 2.8k

Thanks for responding! I actually just noticed that. The SNP IDs in the SNP data's map file are 'chr:pos', while the WGS map file just has '.' for every SNP ID, like you said. In the WGS map file, I combined the chromosome number and the position to match the format of the SNP map file. That seemed to solve the problem.
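The map-file fix described here could look roughly like the sketch below. It is an illustration rather than the exact command that was used: fill_map_ids is a hypothetical helper name, and it assumes the IDs in the other dataset are written as chromosome code, a colon, then position, using the same chromosome codes.

```shell
# Rewrite '.' IDs in a .map file as chromosome:position; other IDs are kept.
# Output goes to stdout so the original file is left untouched.
fill_map_ids() {
    awk 'BEGIN { OFS = "\t" } $2 == "." { $2 = $1 ":" $4 } { print }' "$1"
}

# Example (file names are illustrative):
# fill_map_ids start1.output.map > start1.fixed.map
```

Only the .map file needs to change, since the .ped file holds genotypes and sample information but no variant IDs.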

ADD REPLY • link 3.1 years ago by williamsbrian5064 ▴ 510

As an additional note, the .ped format has been obsolete for close to a decade. plink 1.9’s native format is .bed (use --make-bed/--bfile instead of --recode/--file), which is far more efficient; in contrast, plink 1.9 has to inefficiently convert .ped to .bed before doing anything else every time you use --file (and this process is slower than with --vcf). No plink 2.0 build can even read or write .ped files right now.
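As a rough sketch of that binary workflow with plink 1.9, the two steps can be wrapped as below. The function names vcf_to_bfile and merge_bfiles are hypothetical, the QC filters from the question (--maf, --mind, --geno) are left out for brevity, and --set-missing-var-ids with the '@:#' template is added on the assumption that the '.' IDs should become chromosome:position.

```shell
# Convert a VCF to a binary fileset (.bed/.bim/.fam) with plink 1.9.
# --set-missing-var-ids '@:#' names '.' variants as chromosome:position.
vcf_to_bfile() {
    plink --threads 4 --vcf "$1" --dog --snps-only --biallelic-only strict \
        --set-missing-var-ids '@:#' --make-bed --out "$2"
}

# Merge two binary filesets, each given by its file prefix.
merge_bfiles() {
    plink --threads 4 --bfile "$1" --bmerge "$2" --dog --make-bed --out "$3"
}

# Example, using the file names from the question:
# vcf_to_bfile start1.vcf start1.output
# vcf_to_bfile start2.vcf start2.output
# merge_bfiles start1.output start2.output merge.start1.start2
```

Merging binary filesets is also what the error message in the question asks for in order to obtain the full list of merge failures.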

ADD REPLY • link 3.1 years ago by chrchang523 10k

Thanks! I was trying to use bed files but I couldn't compare the bed files because they're in binary format. Is there a way to at least get the head of a bed file?

ADD REPLY • link 3.1 years ago by williamsbrian5064 ▴ 510

diff -q can be used to compare binary files for exact equality. If you want human-readable output every step of the way, --vcf/--recode vcf is not as bad as --file/--recode.
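For illustration, the comparison can be wrapped in a small helper, together with a peek at the start of a .bed file. This is a sketch only: same_file and bed_magic are hypothetical names, and the second helper relies on PLINK 1 variant-major .bed files starting with the bytes 6c 1b 01.

```shell
# Report whether two files are byte-for-byte identical, using diff -q.
same_file() {
    if diff -q "$1" "$2" >/dev/null; then
        echo identical
    else
        echo different
    fi
}

# Print the first three bytes of a .bed file as hex, without spaces.
# A PLINK 1 variant-major .bed is expected to give 6c1b01.
bed_magic() {
    od -An -tx1 -N3 "$1" | tr -d ' \n'
}

# Example (file names are illustrative):
# same_file run1.bed run2.bed
# bed_magic run1.bed
```

The .bim and .fam files that accompany a .bed are plain text, so head works on them directly.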

ADD REPLY • link 3.1 years ago by chrchang523 10k