I'm trying to impute some genotype data on the Michigan Imputation Server (https://imputationserver.sph.umich.edu/). For autosomes this works perfectly and is straightforward. For chromosome X, however, I ran into some trouble. First I found that Eagle v2.3 does not work on chromosome X, so you need to use SHAPEIT. Then, when I tried to impute chromosome X, I got an error saying that there are heterozygous variants in my males:
Chromosome X check failed! java.io.IOException:
Found haplotype 0/1 at pos 2703633 for male proband 1038_1
Found haplotype 0/1 at pos 2703633 for male proband 1700_1
Found haplotype 0/1 at pos 2703633 for male proband 2147_2
Found haplotype 0/1 at pos 2703633 for male proband 2296_3
Error during manifest file creation.
Then when I split males and females, I get the same error for males again and another error for females:
Chromosome X check failed!
java.io.IOException: Something went wrong with the keepSamples male command
Error during manifest file creation.
Then I started reading a bit about it and learned about the PAR regions. The non-PAR is located at chrX:2699520-154931043 on build hg19 (http://genome.sph.umich.edu/wiki/Minimac3_Cookbook_:_Chromosome_X_Imputation), but all 7495 of my chromosome X variants are in this area, so this does not help me much. Splitting males and females, which I already tried, did not work either. Also, because the variants here are in the PAR region, males can be heterozygous, so simply setting all heterozygous male genotypes to missing does not seem appropriate to me.
Does anyone have any recommendations for this? Or am I doing something completely wrong?
I was finally able to impute my X chromosome as well by setting the heterozygous SNPs in male samples to missing. These heterozygous genotypes are probably due to miscalls by the genotyping software.
In any case, I was able to remove them with plink when creating the vcf files:
I have just successfully imputed chrX on Michigan Imputation Server.
You ran into two separate problems. The first was that your males had heterozygous SNPs. This complaint came from SHAPEIT. Yes, you have to remove all heterozygous SNPs in males from your data set. I did that by setting them to missing. I used Python to process my data set in plain text format for that purpose.
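A minimal sketch of that masking step in Python, for anyone who wants to do the same on an uncompressed VCF. This is not the script I used; the function names (`is_het`, `mask_male_hets`) and the tiny example are made up for illustration. It assumes hg19 coordinates, the non-PAR range quoted in the question, and that GT is the first FORMAT field.

```python
# hg19 non-PAR bounds as quoted in the question (assumption: your data is on hg19)
NON_PAR_START, NON_PAR_END = 2699520, 154931043
X_NAMES = {"X", "23", "chrX"}  # common spellings of chromosome X


def is_het(gt):
    """True if a diploid GT such as '0/1' or '1|0' has two different called alleles."""
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def mask_male_hets(vcf_lines, male_ids):
    """Yield VCF lines with heterozygous non-PAR chrX genotypes of males set to './.'."""
    male_cols = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):  # meta-information lines pass through unchanged
            yield line
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):  # header: sample columns start at index 9
            male_cols = [i for i, s in enumerate(fields) if i >= 9 and s in male_ids]
            yield line
            continue
        if fields[0] in X_NAMES and NON_PAR_START <= int(fields[1]) <= NON_PAR_END:
            for i in male_cols:
                parts = fields[i].split(":")  # GT is assumed to be the first subfield
                if is_het(parts[0]):
                    parts[0] = "./."
                    fields[i] = ":".join(parts)
        yield "\t".join(fields)


# Hypothetical two-sample example: one male (1038_1) and one female (F1)
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t1038_1\tF1",
    "X\t2703633\trs1\tA\tG\t.\t.\t.\tGT\t0/1\t0/1",
]
out = list(mask_male_hets(example, {"1038_1"}))
```

In the example, the male's 0/1 call at position 2703633 becomes ./. while the female's 0/1 is left alone. Variants outside the non-PAR range are not touched, since males can legitimately be heterozygous there.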
The second problem you ran into was that your new data set had only males. There is a bug in the Michigan Imputation Server source code. I have opened an issue on their GitHub page,
You can find the bug yourself if you take a look at their source code. Until they fix it, the workaround is to submit data sets that contain both males and females. Do not split males and females yourself; the Michigan Imputation Server will do the split for you. If you split them beforehand, the server fails with the error message you posted. If you have a data set with only males, as I did, add a few females, as I did.
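A quick way to check this before uploading, sketched in Python. It assumes the sex information is in a PLINK .fam file (fifth column: 1 = male, 2 = female, anything else unknown); the function names are made up for illustration and this is not part of the server.

```python
from collections import Counter

SEX_LABELS = {"1": "male", "2": "female"}  # PLINK .fam sex codes


def count_sexes(fam_lines):
    """Count samples per sex from the lines of a PLINK .fam file."""
    counts = Counter()
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 5:  # skip blank or malformed lines
            continue
        counts[SEX_LABELS.get(fields[4], "unknown")] += 1
    return counts


def has_both_sexes(fam_lines):
    """True if at least one male and one female are present."""
    counts = count_sexes(fam_lines)
    return counts["male"] > 0 and counts["female"] > 0


# Hypothetical .fam content: two males, one female
fam = [
    "FAM1 1038_1 0 0 1 -9",
    "FAM2 1700_1 0 0 1 -9",
    "FAM3 F1 0 0 2 -9",
]
both = has_both_sexes(fam)
males_only = has_both_sexes(fam[:2])
```

With a real file you would pass `open("mydata.fam")` instead of the list. If `has_both_sexes` returns False, add some samples of the missing sex before submitting.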
A friend had the same problem, so we went through the Michigan Imputation Server code together. The problem is that one line of the code calls plink to check sex in order to create the 3 files (one for males, one for females and one for the pseudoautosomal (PAR) region). This is the part of the code where we get the error.
After various tests we realized that if your initial array has a low number of variants in the pseudoautosomal region, the software is not able to classify it and gives the error. If the pseudoautosomal region is well covered, however, it does return results for that region. For example, with Illumina's Omni Quad array, which has about 10^6 variants, it was impossible to get imputed results for the pseudoautosomal region, but with the Omni 2.5, which has about 2.5 x 10^6 variants, we were able to get results for it.