Question: PLINK GWAS: how to resolve the heterozygous haploid warning during data cleaning
5.9 years ago, prachimunjal14 (10, United States) wrote:

Hi all,

I am using PLINK for a GWAS. During data cleaning, when I filter on MAF (minor allele frequency), I get warnings such as:

"Plink is setting 111834 heterozygous haploid as missing"

From the following link, I learned that this can be addressed with the "--split-x" flag, but I am having difficulty using it: https://www.cog-genomics.org/plink2/data#split_x

It would be great if someone could give me an example command so that I can see exactly how it is used.

Thanks a lot.

PC

 

genotype snp plink • 9.4k views
modified 4.2 years ago by HumeMarx (20) • written 5.9 years ago by prachimunjal14 (10)

Hi

This post was quite useful, thank you. What happens if you have done all that, but still there remains a large number of het haploid genotypes?

I still have quite a few heterozygous haploid genotypes. I have removed individuals who failed sex checks, assigned a new chromosome code to the SNPs in the pseudoautosomal region, and addressed the nonmissing nonmale Y genotypes (PLINK still gives me the warning about nonmissing female Y genotypes even though I tried to erase them). Over 14000 heterozygous haploid genotypes remain. What can I do? Should I just remove these SNPs, or is there anything else I can try?

P.S. I am trying to address this because my dataset seems to have an unusually high missingness rate: with the standard --mind 0.1, only 2% of the samples pass QC.
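
Before deciding whether to drop SNPs, it may help to see how missingness is spread across samples. The following is a minimal sketch, not something confirmed in this thread: it assumes a PLINK 1.9 `--missing` report (.imiss) with the usual `F_MISS` column, and the function names and the `my_fileset` prefix are hypothetical.

```shell
# Count how many samples exceed a missingness threshold in a PLINK .imiss report.
# Assumes the usual header: FID IID MISS_PHENO N_MISS N_GENO F_MISS.
summarize_imiss() {
  local imiss="$1" threshold="${2:-0.1}"
  awk -v t="$threshold" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "F_MISS") col = i; next }
    col && NF { total++; if ($col + 0 > t) fail++ }
    END { printf "%d of %d samples exceed F_MISS %s\n", fail, total, t }
  ' "$imiss"
}

# Hypothetical wrapper (not called here): write <prefix>.imiss with PLINK 1.9.
run_missing_report() {
  plink --bfile "$1" --missing --out "$1"
}

# Tiny made-up report, only to show the output format.
demo=$(mktemp)
printf '%s\n' \
  ' FID IID MISS_PHENO N_MISS N_GENO F_MISS' \
  ' F1 S1 N 10 1000 0.01' \
  ' F2 S2 N 250 1000 0.25' \
  ' F3 S3 N 900 1000 0.9' > "$demo"
summarize_imiss "$demo" 0.1   # prints: 2 of 3 samples exceed F_MISS 0.1
rm -f "$demo"
```

On real data this would be run as `run_missing_report my_fileset` followed by `summarize_imiss my_fileset.imiss 0.1`. If nearly every sample fails, the cause is probably broader than the heterozygous haploid calls alone.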

modified 11 weeks ago by RamRS (25k) • written 4.2 years ago by HumeMarx (20)

Thanks a lot for your suggestion.

I have a few questions:

  1. How can I tell whether the heterozygous haploid warnings involve the X chromosome, and what if I don't know the build of my data?
  2. You use "--bfile" in the command, but I haven't converted my .ped and .map files into a binary fileset before data cleaning. Should I do that first?

Thanks once again.

PC

modified 11 weeks ago by RamRS (25k) • written 5.9 years ago by prachimunjal14 (10)
  1. The .hh file has details on the heterozygous haploid warnings. Check if the SNP IDs are on the X chromosome, the Y chromosome, or both.
  2. Replace --bfile with --file in the first command; nothing else needs to change.
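
The check in point 1 can be sketched with awk. This is an illustration rather than part of the original reply: it assumes the .hh file has three whitespace-separated columns (family ID, sample ID, variant ID), that the .map or .bim file has the chromosome in column 1 and the variant ID in column 2, and the function name `hh_by_chrom` is made up here.

```shell
# Tally het. haploid entries per chromosome code.
# Usage: hh_by_chrom <variant file (.map or .bim)> <.hh file>
hh_by_chrom() {
  awk '
    # First file: remember which chromosome each variant ID sits on.
    NR == FNR { chrom[$2] = $1; next }
    # Second file (.hh): column 3 is the variant ID.
    { c = ($3 in chrom) ? chrom[$3] : "unknown"; n[c]++ }
    END { for (c in n) print c, n[c] }
  ' "$1" "$2" | sort
}

# Tiny made-up files, only to show the output format.
tmpdir=$(mktemp -d)
printf '%s\n' '1 rs1 0 1000' '23 rs2 0 5000' '24 rs3 0 7000' > "$tmpdir/demo.map"
printf '%s\n' 'F1 S1 rs2' 'F2 S2 rs2' 'F3 S3 rs3' > "$tmpdir/demo.hh"
hh_by_chrom "$tmpdir/demo.map" "$tmpdir/demo.hh"
rm -rf "$tmpdir"
```

In PLINK's numeric coding, 23 is X, 24 is Y, and 25 is the XY (pseudo-autosomal) region, so the demo reports two entries on X and one on Y.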
modified 11 weeks ago by RamRS (25k) • written 5.9 years ago by chrchang523 (6.5k)

Thanks a lot for your reply.

When I run this command:

plink --file unclean_fileset --split-x b37 --make-bed --out clean_fileset

PLINK stops with the following error:

**Unused command line option: --split-x
**Unused command line option: b37

Do you know what causes this error?

Thanks

modified 11 weeks ago by RamRS (25k) • written 5.9 years ago by prachimunjal14 (10)

--split-x is a new PLINK 1.9 flag; it will not work in 1.07.

modified 11 weeks ago by RamRS (25k) • written 5.9 years ago by chrchang523 (6.5k)

Hi chrchang523,

I tried PLINK 1.9 to resolve the heterozygous haploid warning, using the following command:

plink --file unclean_fileset --split-x b37 --make-bed --out clean_fileset

Earlier I was getting 103317 heterozygous haploid genotypes; now the count has decreased to 103205.

But I am still getting warnings and an error, which I have pasted below:

Warning: 103205 het. haploid genotypes present (see HbF_hh_clean.hh).
Warning: Nonmissing nonmale Y chromosome genotype(s) present.
Total genotyping rate is 0.996871.
894327 variants and 254 people pass filters and QC.
Error: --split-x cannot be used when the dataset already contains an XY region.
(Did you mean --merge-x instead?)

Please suggest how I can solve this.

Thanks a lot

modified 11 weeks ago by RamRS (25k) • written 5.8 years ago by prachimunjal14 (10)
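
One possible way to approach the XY-region error above, offered as a sketch rather than as the thread's confirmed answer: first check whether the variant file already contains variants coded as the XY region, and if the existing coding is suspect, undo it with `--merge-x` (the flag the error message itself points to) before re-running `--split-x` in a separate PLINK 1.9 run. The function names and file prefixes below are hypothetical.

```shell
# Count variants already coded as the XY (pseudo-autosomal) region.
# Works on a .map or .bim file; PLINK uses chromosome code 25 (or XY) for this region.
count_xy_variants() {
  awk '$1 == "25" || $1 == "XY" { n++ } END { print n + 0 }' "$1"
}

# Sketch only, not called here: merge the existing XY region back into X,
# then split again using the build's boundaries. All prefixes are placeholders.
remerge_then_split_x() {
  local in_prefix="$1" build="$2" out_prefix="$3"
  plink --file "$in_prefix" --merge-x --make-bed --out "${out_prefix}_merged" &&
    plink --bfile "${out_prefix}_merged" --split-x "$build" --make-bed --out "$out_prefix"
}

# Tiny made-up .map file to show the check.
demo=$(mktemp)
printf '%s\n' '23 rs1 0 3000000' '25 rs2 0 100000' '24 rs3 0 5000000' > "$demo"
count_xy_variants "$demo"   # prints 1
rm -f "$demo"
```

A call such as `remerge_then_split_x unclean_fileset b37 clean_fileset` would run both steps. If the existing XY coding already matches the build, re-splitting will not change much, and the remaining warnings more likely come from sex assignment or Y chromosome calls listed in the .hh file.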

Could you delete this answer and make it a comment on chrchang523's answer rather than an answer itself? (I can move it to a comment, but only on your original post.)

modified 11 weeks ago by RamRS (25k) • written 5.9 years ago by Devon Ryan (94k)

Hi, I was able to add it as a comment, but I am not able to delete my answer.

I don't see any delete button here.

Thanks.

modified 11 weeks ago by RamRS (25k) • written 5.9 years ago by prachimunjal14 (10)

5.9 years ago, chrchang523 (6.5k, United States) wrote:

If all the heterozygous haploid warnings involve the X chromosome, and your data uses build 37 coordinates,

plink --bfile unclean_fileset --split-x b37 --make-bed --out clean_fileset

should work.

If there also are nonmissing female genotype calls on the Y chromosome, and you're sure there are no gender errors in your .fam file, you can then use

plink --bfile clean_fileset --make-bed --set-hh-missing --out cleaner_fileset

to erase those too.
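
To preview what `--split-x b37` would recode, the X chromosome variants in the pseudo-autosomal regions can be counted directly from the variant file. This is a rough sketch under stated assumptions: the default boundaries (2699520 and 154931044) are the b37/hg19 values as I recall them from the PLINK 1.9 documentation and should be checked there, the position is taken from column 4 of a .bim or four-column .map file, and `count_par_variants` is a made-up helper name.

```shell
# Count chrX variants inside the pseudo-autosomal regions.
# Usage: count_par_variants <.bim or .map> [last bp of head] [first bp of tail]
# Defaults are the assumed b37/hg19 boundaries; pass other values for other builds.
count_par_variants() {
  local variants="$1" head_end="${2:-2699520}" tail_start="${3:-154931044}"
  awk -v h="$head_end" -v t="$tail_start" '
    ($1 == "23" || $1 == "X") && ($4 + 0 <= h + 0 || $4 + 0 >= t + 0) { n++ }
    END { print n + 0 }
  ' "$variants"
}

# Tiny made-up .bim file: two X variants in the PARs, one X variant outside,
# and one autosomal variant that must be ignored.
demo=$(mktemp)
printf '%s\n' \
  '23 rsA 0 100000 A G' \
  '23 rsB 0 50000000 A G' \
  '23 rsC 0 155000000 C T' \
  '1 rsD 0 100000 A G' > "$demo"
count_par_variants "$demo"   # prints 2
rm -f "$demo"
```

If the count is zero on real data, the heterozygous haploid calls are probably not a pseudo-autosomal coding issue, and the .hh file is the better place to look.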

modified 11 weeks ago by RamRS (25k) • written 5.9 years ago by chrchang523 (6.5k)
