Question: Error in merging data in plink format (after cleaning data)
fatima10 wrote (2.6 years ago):

Hi everyone,

I am working on a case-only design and am trying to impute untyped genotypes in data that come from the Immunochip. Before imputation, I tried to merge the 1000 Genomes (1000G) reference panel with my cases in PLINK, and got this error:

Warning: Multiple positions seen for variant 'rs200357792'.
Warning: Multiple positions seen for variant 'rs201556956'.
Warning: Multiple positions seen for variant 'rs200991502'.
17838 markers loaded from CD_GermanyKielchr2_mod.bim.
7047141 markers to be merged from ref_b37_ph3.bim.
Of these, 7029359 are new, while 17782 are present in the base dataset.
Error: 7932 variants with 3+ alleles present.
* If you believe this is due to strand inconsistency, try --flip with
  merge-merge.missnp.
  (Warning: if this seems to work, strand errors involving SNPs with A/T or C/G
  alleles probably remain in your data.  If LD between nearby SNPs is high,
  --flip-scan should detect them.)
* If you are dealing with genuine multiallelic variants, we recommend exporting
  that subset of the data to VCF (via e.g. '--recode vcf'), merging with
  another tool/script, and then importing the result; PLINK is not yet suited
  to handling them.
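The logic behind this error can be sketched in Python. This is a simplified illustration, not PLINK's source code. It assumes each dataset supplies two alleles per variant ID (columns 5 and 6 of a .bim file, with '0' meaning a missing allele). A variant counts as having "3+ alleles" when the union of both files' alleles has more than two members. The sketch also checks whether complementing one side, which is what --flip does, brings the union back to two alleles. The function names are hypothetical.

```python
# Simplified illustration of the "3+ alleles" check (not PLINK's source code).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def observed(a1: str, a2: str) -> set[str]:
    """Return the alleles of one .bim record, ignoring '0' (missing allele code)."""
    return {a for a in (a1, a2) if a != "0"}


def flipped(alleles: set[str]) -> set[str]:
    """Complement single-base alleles; other alleles (e.g. indels) are left unchanged here."""
    return {COMPLEMENT.get(a, a) for a in alleles}


def diagnose(base: tuple[str, str], other: tuple[str, str]) -> str:
    """Compare one variant's alleles between the base dataset and the dataset being merged."""
    b, o = observed(*base), observed(*other)
    if len(b | o) <= 2:
        return "ok"
    if len(b | flipped(o)) <= 2:
        return "fixed by flip"
    return "3+ alleles even after flip"


print(diagnose(("A", "G"), ("A", "G")))  # ok
print(diagnose(("A", "G"), ("T", "C")))  # fixed by flip
print(diagnose(("A", "G"), ("A", "C")))  # 3+ alleles even after flip
```

The second case is a plain strand flip. The third cannot be fixed with --flip: either the two sources disagree about the alleles, or the site is genuinely multiallelic.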

I removed the variants with multiple positions and the duplicate variants. In addition, I filtered out triallelic SNPs in the VCF file and ran --flip with the “prefix”.missnp file. Then I tried again to merge the cases and the reference panel.
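The cleaning steps described here can be sketched in Python. This is an illustrative sketch, not the poster's actual commands. `ids_to_exclude` assumes the standard six-column .bim layout (chromosome, variant ID, cM, bp, allele 1, allele 2) and reports every ID that occurs more than once. `is_multiallelic` assumes a tab-separated VCF data line in which a comma in the ALT column marks more than one alternate allele. Both function names are hypothetical. The reported IDs could then be written one per line to a text file and passed to PLINK's --exclude.

```python
from collections import defaultdict


def ids_to_exclude(bim_lines):
    """Return sorted variant IDs that occur more than once in a .bim file.

    This covers exact duplicates as well as IDs seen at multiple positions.
    """
    positions = defaultdict(list)
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, vid, bp = fields[0], fields[1], fields[3]
        if vid == ".":
            continue  # assumption: '.' is a missing-ID placeholder, not reported as a duplicate
        positions[vid].append((chrom, bp))
    return sorted(vid for vid, pos in positions.items() if len(pos) > 1)


def is_multiallelic(vcf_line):
    """True if a VCF data line lists more than one ALT allele (5th column)."""
    if vcf_line.startswith("#"):
        return False  # header or meta-information line
    fields = vcf_line.rstrip("\n").split("\t")
    return "," in fields[4]


bim = [
    "2\trs1\t0\t100\tA\tG",
    "2\trs1\t0\t200\tA\tG",  # same ID at a second position
    "2\trs2\t0\t300\tC\tT",
]
print(ids_to_exclude(bim))  # ['rs1']
print(is_multiallelic("2\t400\trs3\tA\tC,G\t.\tPASS\t."))  # True
```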

I got this error again:

Error: 7916 variants with 3+ alleles present.
* If you believe this is due to strand inconsistency, try --flip with
  mergefiles2-merge.missnp.
  (Warning: if this seems to work, strand errors involving SNPs with A/T or C/G
  alleles probably remain in your data.  If LD between nearby SNPs is high,
  --flip-scan should detect them.)
* If you are dealing with genuine multiallelic variants, we recommend exporting
  that subset of the data to VCF (via e.g. '--recode vcf'), merging with
  another tool/script, and then importing the result; PLINK is not yet suited
  to handling them.

I think PLINK is not suitable for my data. Am I right? What is the reason? Thank you in advance.

Tags: software error • 1.5k views • written 2.6 years ago by fatima10; modified 14 months ago by Kevin Blighe

I think that PLINK is suitable - what is the source of your non-1000G data, though?

Perhaps you could try following my tutorial: "Produce PCA bi-plot for 1000 Genomes Phase III in VCF format".

You may have to remove SNPs from your non-1000G data that are called on the opposite (reverse) strand relative to the 1000G reference.
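One way to act on this advice is to compare the alleles of each shared SNP before merging. The Python sketch below is an illustration under stated assumptions, not the method from the tutorial. It matches variants between the case and reference .bim files by ID only. It flips SNPs whose alleles match after complementing, and excludes three kinds of SNP: strand-ambiguous A/T and C/G SNPs, SNPs with non-ACGT alleles, and SNPs that still disagree after flipping. The resulting ID lists are the kind of one-ID-per-line files that PLINK's --flip and --exclude options accept. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
BASES = set(COMPLEMENT)


def complement(alleles):
    """Complement a set of single-base alleles."""
    return {COMPLEMENT[a] for a in alleles}


def classify(case_alleles, ref_alleles):
    """Return 'keep', 'flip' or 'exclude' for one SNP present in both datasets."""
    case, ref = set(case_alleles), set(ref_alleles)
    if not (case | ref) <= BASES:
        return "exclude"  # '0', indels or other non-ACGT codes: not handled in this sketch
    if case == complement(case):
        return "exclude"  # A/T or C/G: strand cannot be inferred from the alleles alone
    if case == ref:
        return "keep"
    if complement(case) == ref:
        return "flip"
    return "exclude"  # alleles disagree even after complementing


def harmonize(case_bim, ref_bim):
    """Return (ids_to_flip, ids_to_exclude) for variant IDs present in both .bim files."""
    ref = {}
    for line in ref_bim:
        fields = line.split()
        ref[fields[1]] = (fields[4], fields[5])  # later duplicate IDs overwrite earlier ones
    to_flip, to_exclude = [], []
    for line in case_bim:
        fields = line.split()
        vid = fields[1]
        if vid not in ref:
            continue  # variant only in the case data: nothing to compare
        verdict = classify((fields[4], fields[5]), ref[vid])
        if verdict == "flip":
            to_flip.append(vid)
        elif verdict == "exclude":
            to_exclude.append(vid)
    return to_flip, to_exclude


case_bim = ["2\trs1\t0\t100\tA\tG", "2\trs2\t0\t200\tA\tG",
            "2\trs3\t0\t300\tA\tT", "2\trs4\t0\t400\tA\tC"]
ref_bim = ["2\trs1\t0\t100\tG\tA", "2\trs2\t0\t200\tT\tC",
           "2\trs3\t0\t300\tA\tT", "2\trs4\t0\t400\tA\tG"]
print(harmonize(case_bim, ref_bim))  # (['rs2'], ['rs3', 'rs4'])
```

Excluding every A/T and C/G SNP is the conservative choice here. As the PLINK message in the question notes, --flip-scan may help detect strand errors that remain among such SNPs.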

Reply written 14 months ago by Kevin Blighe
Powered by Biostar version 2.3.0