PLINK GWAS: how to resolve the heterozygous haploid warning during data cleaning
10.0 years ago

Hi all,

I am using PLINK for a GWAS. During data cleaning, when I filter on MAF (minor allele frequency), I get warnings such as:

Plink is setting 111834 heterozygous haploid as missing

From this link, I found that it can be resolved with the --split-x flag, but I am having difficulty using it.

Could anyone give me an example command so that I can see exactly how it is used?

Thanks a lot.

PC

Hi

This post was quite useful, thank you. What should I do if I have done all of that but a large number of het. haploid genotypes still remain?

I have quite a few heterozygous haploid genotypes. I have removed individuals who failed sex checks, assigned a separate chromosome code to the SNPs in the pseudoautosomal region, and addressed the nonmissing nonmale Y genotypes (PLINK still gives me the warning about nonmissing female Y genotypes even though I tried to erase them). I still have over 14000 heterozygous haploid genotypes. What can I do? Should I just remove these SNPs, or is there anything else I can try?

P.S. I am trying to address this because my dataset seems to have an unusually high missingness rate. Setting --mind to 0.1, which is standard, allows only 2% of the samples to pass QC!

Thanks a lot for your suggestion.

I have a few questions:

  1. How would I know whether the heterozygous haploid warnings involve the X chromosome, and what if I don't know the build of my data?
  2. You are using --bfile in the command, but I haven't converted my .ped and .map files into a binary fileset before data cleaning. Should I do that first?

Thanks once again.

PC

  1. The .hh file has details on the heterozygous haploid warnings. Check if the SNP IDs are on the X chromosome, the Y chromosome, or both.
  2. Replace --bfile with --file in the first command; nothing else needs to change.
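
To make the first point concrete, here is a minimal sketch that tallies the .hh entries by chromosome. It assumes each .hh line holds a family ID, a within-family ID and a variant ID, and that the .bim (or .map) file has the chromosome in column 1 and the variant ID in column 2. The function names are made up for illustration. Chromosome codes are reported exactly as they appear in your file, so X, Y and the pseudoautosomal XY region may show up as 23, 24 and 25.

```python
from collections import Counter


def read_variant_chromosomes(variant_path):
    """Map variant ID -> chromosome code from a .bim or .map file.

    Assumes column 1 is the chromosome and column 2 is the variant ID.
    """
    chrom_by_id = {}
    with open(variant_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                chrom_by_id[fields[1]] = fields[0]
    return chrom_by_id


def count_hh_by_chromosome(hh_path, variant_path):
    """Count .hh entries per chromosome code.

    Assumes each .hh line is: family ID, within-family ID, variant ID.
    Variant IDs missing from the variant file are counted as "unknown".
    """
    chrom_by_id = read_variant_chromosomes(variant_path)
    counts = Counter()
    with open(hh_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 3:
                counts[chrom_by_id.get(fields[2], "unknown")] += 1
    return counts
```

Calling count_hh_by_chromosome("clean_fileset.hh", "clean_fileset.bim") (file names are placeholders for your own --out prefix) returns a Counter; if almost all entries sit under 23 or X, the warnings are an X chromosome issue.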

Thanks a lot for your reply.

When I run this command:

plink --file unclean_fileset --split-x b37 --make-bed --out clean_fileset

PLINK stops with the following error:

**Unused command line option: --split-x
**Unused command line option: b37

Do you have any idea what is causing this error?

Thanks

--split-x is a new PLINK 1.9 flag; it will not work in 1.07.

Hi chrchang523,

I tried PLINK 1.9 to resolve the heterozygous haploid warning, using the following command:

plink --file unclean_fileset --split-x b37 --make-bed --out clean_fileset

Earlier I was getting 103317 heterozygous haploid genotypes, but now the count has decreased to 103205.

However, I am still getting some warnings and an error, which I have pasted below:

Warning: 103205 het. haploid genotypes present (see HbF_hh_clean.hh).
Warning: Nonmissing nonmale Y chromosome genotype(s) present.
Total genotyping rate is 0.996871.
894327 variants and 254 people pass filters and QC.
Error: --split-x cannot be used when the dataset already contains an XY region.
(Did you mean --merge-x instead?)

Please suggest how I can solve this.

Thanks a lot

Could you delete this answer and repost it as a comment on chrchang523's answer, rather than as an answer itself? (I can move it to a comment, but only on your original post.)

Hi, I was able to add it as a comment, but I am not able to delete my answer.

I don't see any delete button here.

Thanks.

10.0 years ago

If all the heterozygous haploid warnings involve the X chromosome, and your data uses build 37 coordinates,

plink --bfile unclean_fileset --split-x b37 --make-bed --out clean_fileset

should work.
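
For intuition about what that step changes: --split-x moves chromosome X variants that lie in the pseudoautosomal regions to the separate XY code, where heterozygous calls in males are expected. The sketch below is a rough way to preview which variants would be affected. It assumes the GRCh37 boundaries 2699520 (last base of the head region) and 154931044 (first base of the tail region), which you should verify against the PLINK documentation for your build. The constant and function names are hypothetical.

```python
# Assumed GRCh37 (b37/hg19) pseudoautosomal boundaries on chromosome X:
# the last base of the head region and the first base of the tail region.
B37_HEAD_END = 2699520
B37_TAIL_START = 154931044

# Chromosome X may be coded as 23 or X in .bim/.map files.
X_CODES = {"23", "X"}


def is_pseudoautosomal(chrom, bp, head_end=B37_HEAD_END, tail_start=B37_TAIL_START):
    """Return True if an X variant lies at or beyond the assumed boundaries."""
    if chrom not in X_CODES:
        return False
    return bp <= head_end or bp >= tail_start


def pseudoautosomal_variant_ids(bim_path):
    """List IDs of X variants that fall inside the assumed regions.

    Expects .bim columns: chromosome, variant ID, cM position, bp position, ...
    """
    ids = []
    with open(bim_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 4 and is_pseudoautosomal(fields[0], int(fields[3])):
                ids.append(fields[1])
    return ids
```

If pseudoautosomal_variant_ids("unclean_fileset.bim") returns nothing, --split-x has nothing to move, and the remaining het. haploid calls need another explanation (for example, sex errors in the .fam file).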

If there also are nonmissing female genotype calls on the Y chromosome, and you're sure there are no gender errors in your .fam file, you can then use

plink --bfile clean_fileset --make-bed --set-hh-missing --out cleaner_fileset

to erase those too.
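
Before erasing anything, it may be worth listing which samples are responsible for the Y chromosome entries, since a handful of samples with many calls can point to a sex error in the .fam file. This is only a sketch under stated assumptions: the .hh file has family ID, within-family ID and variant ID per line; column 5 of the .fam file is sex (1 = male, 2 = female, 0 = unknown); Y is coded as 24 or Y in the .bim file. The function name is invented for this example.

```python
Y_CODES = {"24", "Y"}
MALE_CODE = "1"


def nonmale_y_calls(hh_path, fam_path, bim_path):
    """Return (FID, IID, variant ID) for .hh entries on Y in samples not coded male.

    Samples absent from the .fam file are treated as not male.
    """
    # .fam columns: FID, IID, father ID, mother ID, sex, phenotype
    sex_by_sample = {}
    with open(fam_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 5:
                sex_by_sample[(fields[0], fields[1])] = fields[4]

    # .bim columns: chromosome, variant ID, ...
    y_variants = set()
    with open(bim_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2 and fields[0] in Y_CODES:
                y_variants.add(fields[1])

    hits = []
    with open(hh_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 3:
                continue
            fid, iid, variant_id = fields[0], fields[1], fields[2]
            if variant_id in y_variants and sex_by_sample.get((fid, iid)) != MALE_CODE:
                hits.append((fid, iid, variant_id))
    return hits
```

Counting the returned tuples per sample shows whether the nonmale Y calls are spread thinly across many people (more likely genotyping noise) or concentrated in a few (more likely mislabelled sex).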
