
Hi everyone,

I'm looking for a GWA algorithm for copy number variation (CNV) data.

**1) Data**

A reference-based collection of DNA segments (e.g. genes) that occur in different copy numbers across my analyzed dataset (A. thaliana). I'm relatively open regarding data formats, since I have all the information needed to convert between them. In the end it's something like:

```
         seg1  seg2  seg3
sample1  0     2     1
sample2  5     1     3
```

**2) Knowledge**

I've done GWAS in the past, normally using GEMMA or EMMA. These tools are fast enough for my small sample size (~100-1000) and gave good results. GEMMA and other GWAS methods use the PLINK bed format, which encodes biallelic genotypes.

**3) Workaround**

I'm aware that I could trick "normal" GWAS methods by just comparing the following:

```
#seg1 == 0  vs  #seg1 > 0
#seg1 < 2   vs  #seg1 >= 2
OR
#seg1 == 2  vs  #seg1 != 2
```

This would mean testing every possible split for every segment, and I'm not sure it would give me the right answer.
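To make the workaround concrete, here is a minimal sketch of the ">= threshold" splits, using the toy matrix from section 1. The function name `threshold_encodings` and the marker naming scheme (`seg1_ge5`) are made up for illustration; nothing here is tied to GEMMA or EMMA.

```python
def threshold_encodings(copy_numbers):
    """Return {threshold: [0/1 per sample]} for one segment.

    A sample is coded 1 when its copy number is >= threshold.
    The smallest observed value is skipped as a threshold, because
    every sample would fall into the same group.
    """
    encodings = {}
    for t in sorted(set(copy_numbers))[1:]:
        encodings[t] = [1 if c >= t else 0 for c in copy_numbers]
    return encodings


# Toy data from the question: rows are samples, columns are segments.
segments = ["seg1", "seg2", "seg3"]
matrix = {"sample1": [0, 2, 1], "sample2": [5, 1, 3]}

binary_markers = {}
for j, seg in enumerate(segments):
    column = [row[j] for row in matrix.values()]
    for t, coded in threshold_encodings(column).items():
        # Hypothetical marker name, e.g. "seg1_ge5" for "#seg1 >= 5".
        binary_markers[f"{seg}_ge{t}"] = coded
```

A segment with k distinct copy numbers yields k-1 binary markers this way, so the number of tests (and the multiple-testing burden) grows with the copy-number range.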

**4) What I'm looking for:**

A GWAS algorithm that handles more than binary occurrence. For example, I want to know whether having 2 copies of a segment has a significant effect on the phenotype. Does anyone know a suitable method for this problem?

Hand in hand with this question: Why do we "ignore" alleles with low frequencies? Are these not important?

Thanks, Sebastian

written 5 months ago by GWASG


Are you using an LMM? If you're not taking the kinship matrix into account (or are willing to leave it out), you can just treat it as a linear model and fit it in R.
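As a rough sketch of the plain linear-model idea (shown in Python rather than R, with no kinship or population-structure correction), the copy number of one segment can be used directly as a quantitative predictor. The helper name `simple_regression` and all the numbers below are invented for illustration.

```python
import math


def simple_regression(x, y):
    """Ordinary least squares of y on x; returns (slope, t_statistic)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("segment has the same copy number in every sample")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and standard error of the slope.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.inf
    return slope, t_stat


# Made-up example: copy number of one segment and a phenotype in 5 samples.
copy_number = [0, 1, 2, 3, 5]
phenotype = [1.0, 2.1, 2.9, 4.2, 6.0]
slope, t_stat = simple_regression(copy_number, phenotype)
```

The t statistic has n-2 degrees of freedom, so a p-value can be looked up from the t distribution. This assumes a linear effect per additional copy, which may or may not be appropriate for CNV data.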

GEMMA can accept the BIMBAM file format, which is basically a flat text file in which you can specify 0/1/2 as the genotype. I'm not sure whether values above 2 are an option, but maybe it's enough for your needs.
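For reference, a sketch of writing the toy matrix as BIMBAM mean-genotype lines (marker id, two allele columns, then one value per individual, comma-separated). Rescaling each segment to the 0..2 range is my own assumption to stay inside the documented genotype range, not something GEMMA prescribes, and the `A`/`T` allele columns are placeholders since a CNV has no real alleles. The function name `to_bimbam_lines` is hypothetical.

```python
def to_bimbam_lines(segment_names, samples):
    """Build BIMBAM mean-genotype lines, one per segment.

    samples: list of per-sample copy-number lists, one value per segment.
    """
    lines = []
    for j, name in enumerate(segment_names):
        column = [row[j] for row in samples]
        top = max(column)
        # Assumption: rescale copy numbers linearly so the maximum maps to 2.
        scaled = [2 * c / top if top else 0.0 for c in column]
        # "A" and "T" are placeholder alleles, not real calls.
        fields = [name, "A", "T"] + [f"{v:.3f}" for v in scaled]
        lines.append(", ".join(fields))
    return lines


bimbam_lines = to_bimbam_lines(
    ["seg1", "seg2", "seg3"],
    [[0, 2, 1], [5, 1, 3]],  # sample1, sample2 from the question
)
```

Because the rescaling is linear within each segment, it changes the scale of the estimated effect but not the ordering of the samples.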

Hi, thank you for your reply! Normally I use an LMM, since population structure has an effect in A. thaliana, but I will try this to see if it fixes the problem for now. Unfortunately, in most cases I have more than 3 possible outcomes in my CNV dataset...
