Question: What can be causing my very odd data?
HumeMarx (United Kingdom) wrote, 3.6 years ago:

Hi all 

I am having a lot of difficulty with a set of case/control exome data (using Plink2 as my main analysis tool). 

I have a lot of heterozygous haploid genotypes and nonmale nonmissing Y chromosome markers. A large proportion of the samples appear not to be Caucasian in the PCA, even though the curators assure us they are all Caucasian. A very large number of samples also seem to be very closely related to each other (pi-hat estimates well above 0.125).

On top of this a few sex-fails have been detected (removed from analysis before population stratification and relatedness checks). 

Is it likely that all these issues are caused by missing SNPs? Over 2/3 of the available SNPs had to be removed from analysis as they had missingness values above 15%. 

My personal feeling is that I can't really trust this data set, given everything that is going wrong! There has to be a fundamental reason why every step in this analysis produces so many spurious results!


Any help is immensely appreciated! 


It's horrible! 

Let's start from the easiest thing: "I have a lot of heterozygous haploid genotypes and nonmale nonmissing Y chromosome markers". I imagine nonmale nonmissing Y markers means you have females with Y markers, right? So maybe your samples are not what you are thinking. Try to calculate how many males and how many females you have in your sample and see if the numbers make sense.
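One way to do that count, as a minimal sketch: it assumes the samples are described by a PLINK .fam-style file, where the fifth whitespace-separated column is the sex code (1 = male, 2 = female, anything else = unknown). The function name `count_sexes` and the example rows are made up for illustration.

```python
from collections import Counter

# PLINK .fam sex codes: 1 = male, 2 = female; other values are treated as unknown.
SEX_LABELS = {"1": "male", "2": "female"}


def count_sexes(fam_lines):
    """Tally the sex codes found in PLINK .fam-style lines."""
    counts = Counter()
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        counts[SEX_LABELS.get(fields[4], "unknown")] += 1
    return dict(counts)


# Hypothetical rows: FID IID PID MID SEX PHENOTYPE
example = [
    "FAM1 S1 0 0 1 2",
    "FAM2 S2 0 0 2 1",
    "FAM3 S3 0 0 2 2",
    "FAM4 S4 0 0 0 1",
]
print(count_sexes(example))
```

If the male/female split is far from what the study design predicts, that already points to a sample-sheet problem rather than a genotyping one.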

However, I wouldn't be happy with the missingness levels you are talking about. What happens if you slightly raise the threshold for removing a SNP, e.g. to 25% missingness? Does this rescue a lot of SNPs? Maybe some sequencing run was very bad?
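To see how sensitive the SNP count is to the cutoff, a small sketch like the following may help. It assumes you already have one missingness rate per SNP as a fraction between 0 and 1 (for instance, read from the F_MISS column of a per-variant missingness report); the function name `snps_retained` and the example rates are invented for illustration.

```python
def snps_retained(missing_rates, thresholds):
    """Count SNPs whose missingness rate is at or below each threshold."""
    rates = list(missing_rates)
    return {t: sum(1 for r in rates if r <= t) for t in thresholds}


# Hypothetical per-SNP missingness rates (fraction of samples with no call).
example_rates = [0.01, 0.05, 0.12, 0.18, 0.22, 0.40, 0.75]

for threshold, kept in snps_retained(example_rates, [0.10, 0.15, 0.20, 0.25]).items():
    print(f"threshold {threshold:.2f}: {kept} of {len(example_rates)} SNPs kept")
```

A sharp jump in retained SNPs between two neighbouring thresholds would suggest that many SNPs sit just above the cutoff, which is what you would expect if a subset of samples or one run failed.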

Reply written 3.6 years ago by Fabio Marroni


Yes, the data is awful. Exactly, I have females with Y markers. That suggests a genotyping issue to me, am I right?

Overall there are two males in the pedigree file that actually appear to be female based on the SNP data. Quite a few others are female but have Y markers present!
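For reference, this kind of pedigree-versus-SNP mismatch can be listed with a short script. This is only a sketch, assuming a PLINK 1.9-style sex check report with a header row containing IID, PEDSEX and SNPSEX columns; the function name `sex_discordant` and the example rows are hypothetical.

```python
def sex_discordant(sexcheck_lines):
    """Return (IID, PEDSEX, SNPSEX) for samples whose two sex codes differ."""
    rows = [line.split() for line in sexcheck_lines if line.strip()]
    header, body = rows[0], rows[1:]
    # Look columns up by name so extra or reordered columns do not matter.
    iid = header.index("IID")
    ped = header.index("PEDSEX")
    snp = header.index("SNPSEX")
    return [(r[iid], r[ped], r[snp]) for r in body if r[ped] != r[snp]]


# Hypothetical report: SNPSEX 0 means the sex could not be called from the SNPs.
example_report = [
    "FID IID PEDSEX SNPSEX STATUS F",
    "F1  S1  1      1      OK      0.98",
    "F2  S2  1      2      PROBLEM 0.02",
    "F3  S3  2      0      PROBLEM 0.45",
]
for sample, ped_sex, snp_sex in sex_discordant(example_report):
    print(f"{sample}: pedigree says {ped_sex}, SNPs say {snp_sex}")
```

Note that this flags ambiguous calls (SNPSEX 0) as well as outright male/female swaps, so the two groups may need to be looked at separately.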

By increasing the threshold to 20% I save a lot of SNPs. But the problem is I can't find a single person who would condone this increase. All the papers I have read specify ideally 10% but no one is keen to go above 15%! 

What I am also struggling to find out is whether the heterozygous haploid genotypes have any REAL biological/genetic meaning.



Reply written 3.5 years ago by HumeMarx