3 months ago
Nejla
Hello, I've been trying to merge multiple PLINK filesets (.bed/.bim/.fam). I have 6 datasets that were genotyped on different genotyping arrays. When I try to merge them using the command
plink --bfile dataset1 --merge-list all_lists.txt --make-bed --out merged_data
I get multiple warnings and an error:
`Warning: Multiple positions seen for variant 'rs6687776'.
Warning: Multiple positions seen for variant 'rs2887286'.
Warning: Multiple positions seen for variant 'rs3813199'.
Warning: Multiple chromosomes seen for variant 'rs10128688'.
Warning: Multiple chromosomes seen for variant 'rs10106770'.
Warning: Multiple chromosomes seen for variant 'rs2097173'.
Warning: Multiple chromosomes seen for variant 'rs10064939'.
Warning: Multiple chromosomes seen for variant 'rs10059910'.
Warning: Multiple chromosomes seen for variant 'rs11857958'.
Warning: Multiple chromosomes seen for variant 'rs11757628'.
Warning: Multiple chromosomes seen for variant 'rs11162247'.
Warning: Multiple chromosomes seen for variant 'rs13074336'.
Warning: Multiple chromosomes seen for variant 'rs2371122'.
Warning: Multiple chromosomes seen for variant 'rs41431048'.
Warning: Multiple chromosomes seen for variant 'rs13092372'.
Warning: Multiple chromosomes seen for variant 'rs13151824'.
Warning: Multiple chromosomes seen for variant 'rs2187291'.
Warning: Multiple chromosomes seen for variant 'rs13413435'.
Warning: Multiple chromosomes seen for variant 'rs11025370'.
Warning: Multiple chromosomes seen for variant 'rs9798668'.
Warning: Multiple chromosomes seen for variant 'rs2569201'.
Warning: Multiple chromosomes seen for variant 'rs12043679'.
937794 more multiple-position warnings: see log file.
Error: 126705 variants with 3+ alleles present.
* If you believe this is due to strand inconsistency, try --flip with
test_merge-merge.missnp.
(Warning: if this seems to work, strand errors involving SNPs with A/T or C/G
alleles probably remain in your data. If LD between nearby SNPs is high,
--flip-scan should detect them.)
* If you are dealing with genuine multiallelic variants, we recommend exporting
that subset of the data to VCF (via e.g. '--recode vcf'), merging with
another tool/script, and then importing the result; PLINK is not yet suited
to handling them.
See https://www.cog-genomics.org/plink/1.9/data#merge3 for more discussion.`
So I flipped all the datasets, then excluded all the SNPs causing the errors, and tried the merge again. It worked, but I ended up with a genotyping rate of 0.2, which is very low. Does anyone know how I can merge all the datasets while keeping the maximum number of SNPs and a high genotyping rate? Thank you!
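
For anyone trying to reproduce this, here is a minimal sketch of how the chromosome/position conflicts from the warnings, and the SNP overlap between the arrays, could be checked directly from the .bim files. The function names `bim_conflicts` and `bim_shared` and the output file names are hypothetical. The sketch assumes standard six-column .bim files (chromosome, variant ID, genetic distance, bp position, allele 1, allele 2). It does not look at alleles, so it says nothing about the strand or 3+ allele problem.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of PLINK. They assume standard 6-column
# .bim files: chromosome, variant ID, cM, bp position, allele 1, allele 2.

# Print the IDs of variants whose chromosome or bp position differs
# between the given .bim files (the cause of the "Multiple positions" and
# "Multiple chromosomes" warnings).
bim_conflicts() {
    awk '{
        key = $1 ":" $4                      # chromosome:position
        if (($2 in seen) && seen[$2] != key) bad[$2] = 1
        else seen[$2] = key
    }
    END { for (id in bad) print id }' "$@" | sort
}

# Print the IDs of variants that occur in every given .bim file,
# i.e. the overlap between the arrays.
bim_shared() {
    awk -v n="$#" '
        FNR == 1 { nfile++ }                 # a new input file starts
        !(($2, nfile) in counted) {          # count each ID once per file
            counted[$2, nfile] = 1
            count[$2]++
        }
        END { for (id in count) if (count[id] == n) print id }' "$@" | sort
}

# Example usage (file names are assumptions based on the command above):
#   bim_conflicts dataset1.bim dataset2.bim > conflicting_ids.txt
#   bim_shared    dataset1.bim dataset2.bim > shared_ids.txt
#   plink --bfile dataset1 --extract shared_ids.txt --make-bed --out dataset1_shared
```

The reason for looking at the overlap: a PLINK merge keeps the union of variants by default, so a SNP that is present on only one array ends up missing in every sample from the other datasets. With six different arrays, that alone could plausibly account for much of the 0.2 genotyping rate, although I have not confirmed that this is what happened here.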