I'm looking for a GWA algorithm for copy number variation (CNV) data.
My data is a reference-based collection of DNA segments (e.g. genes) that occur a different number of times across the samples in my dataset (A. thaliana). I'm relatively open to data formats, since I have all the information needed to convert it. In the end it looks something like this:
               seg1  seg2  seg3
    sample1       0     2     1
    sample2       5     1     3
I've done GWAS in the past, normally using GEMMA or EMMA. These algorithms are fast enough for my small sample size (~100-1000) and gave good results. GEMMA and other GWAS methods use the PLINK BED format, which only represents biallelic genotype information.
I'm aware that I could trick "normal" GWAS methods by just comparing the following:
    #seg == 0  vs  #seg > 0
    #seg < 2   vs  #seg >= 2
    OR
    #seg == 2  vs  #seg != 2
This would include comparing all possible combinations and I'm not sure if it would give me the right solution.
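To make the workaround concrete, here is a minimal sketch of the recoding I mean, applied to the toy matrix from above. The function and coding names (`binarize`, `present`, `at_least_2`, `exactly_2`) are just made up for illustration; this only produces the 0/1 matrices and does not run any association test.

```python
# Toy copy-number matrix from the question: rows are samples, columns are segments.
segments = ["seg1", "seg2", "seg3"]
counts = {
    "sample1": [0, 2, 1],
    "sample2": [5, 1, 3],
}


def binarize(counts, predicate):
    """Recode every copy number as 1 if `predicate` holds, else 0."""
    return {
        sample: [int(predicate(c)) for c in row]
        for sample, row in counts.items()
    }


# One binary matrix per split; the names are hypothetical labels.
codings = {
    "present": binarize(counts, lambda c: c > 0),       # 0 vs >0
    "at_least_2": binarize(counts, lambda c: c >= 2),   # <2 vs >=2
    "exactly_2": binarize(counts, lambda c: c == 2),    # ==2 vs !=2
}

for name, coded in codings.items():
    print(name, coded)
```

Each of these recoded matrices would then have to be converted and run as its own scan, which is where the number of comparisons starts to grow.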
What I'm looking for:
A GWAS algorithm that handles more than binary presence/absence. For example, I want to know whether having 2 copies of a segment has a significant effect on the phenotype. Does anyone know a suitable method for this problem?
Hand in hand with this question: Why do we "ignore" alleles with low frequencies? Are these not important?