I have a question about the FREEMIX parameter. I have this explanation from you, Hyun Min Kang:
"The key idea of FREEMIX estimate is to use excessive heterozygosity to estimate the level of contamination. Especially for common SNPs, you will observe higher fraction of heterozygous alleles than 2p(1-p), and it turns out that you can quantify the contamination very well if you know the population allele frequency already. If you do not have accurate population allele frequency information, than it would be harder to estimate FREEMIX parameters using verifyBamID."
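To check that I am reading this correctly, here is a small Python sketch of how I understand the "excess heterozygosity" idea. This is my own toy arithmetic, not verifyBamID code. It assumes Hardy-Weinberg equilibrium, two unrelated samples from the same population, and enough depth that an allele carried by either sample always shows up in the reads. The function names are made up for illustration.

```python
def het_fraction_single(p):
    """Expected fraction of heterozygous sites for one clean sample (Hardy-Weinberg)."""
    return 2 * p * (1 - p)


def het_fraction_mixed(p):
    """Fraction of sites showing both alleles when two unrelated samples are mixed.

    Both alleles are missing only when the two samples are homozygous
    for the same allele: (1-p)^4 for ref/ref twice, p^4 for alt/alt twice.
    """
    return 1 - (1 - p) ** 4 - p ** 4


# Compare the clean and mixed expectations at a few allele frequencies.
for p in (0.05, 0.2, 0.5):
    print(p, het_fraction_single(p), het_fraction_mixed(p))
```

If this is right, the apparent heterozygosity at a common SNP is always above 2p(1-p) in a mixed sample, but computing either number requires knowing p.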
My questions are:
- If no input VCF file is given, how does verifyBamID estimate the population allele frequency it needs to measure heterozygosity?
1a. Does verifyBamID use only the BAM file to estimate contamination in the sequence-only method?
FREEMIX values can range from 0 to 0.5 because the model treats contamination as a mixture of two samples.
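To show why I think the range stops at 0.5, here is a toy two-sample mixture likelihood in Python. It is only my reading of the model, not verifyBamID's implementation: both genotypes get Hardy-Weinberg priors from the same allele frequency, the error rate `err` and the read counts in `sites` are invented for illustration, and the binomial coefficient is dropped because it does not depend on the mixing fraction.

```python
import math


def genotype_priors(p):
    """Hardy-Weinberg priors for 0, 1, 2 copies of the alt allele."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def site_likelihood(n_ref, n_alt, p, alpha, err=0.01):
    """Likelihood of the read counts at one site for mixing fraction alpha."""
    priors = genotype_priors(p)
    total = 0.0
    for g1 in range(3):          # genotype of the intended sample
        for g2 in range(3):      # genotype of the contaminating sample
            # Expected alt-allele fraction among reads, before sequencing error.
            f = (1 - alpha) * g1 / 2 + alpha * g2 / 2
            # Symmetric base error: an alt read may be a true alt or a miscalled ref.
            q = f * (1 - err) + (1 - f) * err
            total += priors[g1] * priors[g2] * q ** n_alt * (1 - q) ** n_ref
    return total


def log_likelihood(sites, alpha):
    """Sum of per-site log-likelihoods; sites are (n_ref, n_alt, p) tuples."""
    return sum(math.log(site_likelihood(r, a, p, alpha)) for r, a, p in sites)


# Invented read counts and allele frequencies, for illustration only.
sites = [(18, 2, 0.3), (10, 10, 0.5), (20, 0, 0.1), (3, 17, 0.4)]
ll_low = log_likelihood(sites, 0.1)
ll_high = log_likelihood(sites, 0.9)
print(ll_low, ll_high)
```

In this sketch, swapping the two genotypes while replacing alpha with 1 - alpha leaves every term unchanged, so a mixing fraction of 0.1 and one of 0.9 fit the data equally well. That is why I assume only values between 0 and 0.5 can be reported when neither sample's genotypes are known.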
Is there any way I can determine whether gender mixing happened during sequencing? I suspect the total number of SNPs on chrX and chrY is not enough to get a good estimate of the FREEMIX parameter.
My second question is about CHIPMIX.
My understanding is that CHIPMIX comes from the sequence+array method and that it uses the allele frequencies from the input VCF file. Is that true?
1. Let's say sample 1 is contaminated with 50% of sample 2. What would CHIPMIX look like?