Question

GWAS on CNV - looking for method

4.0 years ago

GWASG • 0

Hi everyone,

I'm looking for a GWA algorithm for copy number variation (CNV) data.

1) Data

A reference-based collection of DNA segments (e.g. genes) whose copy numbers vary across the samples in my dataset (A. thaliana). I'm fairly flexible about data formats, since I have all the information needed to convert between them. In the end it's something like:

         seg1 seg2 seg3 
sample1   0    2    1
sample2   5    1    3
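
A small sketch of getting a table like this into a usable shape, assuming it is stored as whitespace-separated text with samples as rows and segments as columns (the toy values are the ones above). GWAS genotype files usually list one variant per row, so the sketch also transposes the matrix to one list of per-sample copy numbers per segment. The function names are hypothetical helpers, not part of any tool.

```python
# Parse a whitespace-separated copy-number table (samples x segments)
# and transpose it to segment-major order (one row per segment).
TABLE = """\
         seg1 seg2 seg3
sample1   0    2    1
sample2   5    1    3
"""


def parse_cnv_table(text):
    """Return (segment names, {sample: [copy numbers]})."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    segments = lines[0]
    samples = {}
    for fields in lines[1:]:
        name, values = fields[0], fields[1:]
        if len(values) != len(segments):
            raise ValueError(f"{name}: expected {len(segments)} values, got {len(values)}")
        samples[name] = [int(v) for v in values]
    return segments, samples


def by_segment(segments, samples):
    """Transpose to {segment: [copy number per sample]}, keeping sample order."""
    names = list(samples)
    return {seg: [samples[s][j] for s in names] for j, seg in enumerate(segments)}


segments, samples = parse_cnv_table(TABLE)
print(by_segment(segments, samples))
```
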

2) Knowledge

I've done GWAS in the past, normally using GEMMA or EMMA. These tools are fast enough for my small sample sizes (~100-1000) and gave good results. GEMMA and other GWAS tools commonly use the PLINK .bed format, which stores biallelic genotypes (0, 1 or 2 copies of an allele per sample), so it cannot represent arbitrary copy numbers.

3) Workaround

I'm aware that I could trick "normal" GWAS methods by just comparing the following:

#seg==0  vs  #seg>0
#seg<2   vs  #seg>=2

OR

#seg==2  vs  #seg!=2

This would mean testing every possible split of the copy numbers, and I'm not sure it would give me the right answer.
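
For illustration, the three splits listed above can be written as 0/1 codings of a single segment's copy numbers, each of which an ordinary biallelic GWAS tool could then test as a separate "SNP". This is a minimal sketch with hypothetical copy numbers; it only builds the codings and does not run any association test.

```python
# Turn one segment's copy numbers into the 0/1 codings described above.
copies = [0, 2, 1, 5, 2, 3]  # hypothetical copy numbers, one per sample


def dichotomize(cn):
    """Return one 0/1 coding per threshold rule."""
    return {
        "present (>0) vs absent (0)": [int(c > 0) for c in cn],
        ">=2 vs <2": [int(c >= 2) for c in cn],
        "==2 vs !=2": [int(c == 2) for c in cn],
    }


for label, coded in dichotomize(copies).items():
    print(f"{label:28s} {coded}")
```

Each coding is a separate test, and the number of possible splits grows with the number of distinct copy-number levels, which adds to the multiple-testing burden.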

4) What I'm looking for:

A GWAS algorithm that handles more than binary presence/absence. I want to know whether, e.g., having 2 copies of a segment has a significant effect on the phenotype. Does anyone know a suitable method for this problem?

A related question: why do we "ignore" (filter out) alleles with low frequencies? Are they not important?

Thanks, Sebastian

Are you using an LMM? If you're not taking the kinship matrix into account (or are willing not to), you can simply treat it as a linear model and fit it in R.
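
As a rough sketch of the "plain linear model" route, here is a per-segment least-squares regression of the phenotype on raw copy number, the same fit R's `lm(phenotype ~ copy_number)` would give for the slope. Assumptions: copy number enters as a numeric covariate, there is no kinship or population-structure correction, and the p-value uses a normal approximation to the t distribution (so it differs from R's for small samples). The data are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def ols_test(x, y):
    """Fit y = a + b*x by least squares; return (b, t statistic, two-sided p).

    No kinship/population-structure correction. The p-value uses a normal
    approximation to the t distribution, adequate only for larger samples.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length x and y with at least 3 samples")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("copy number is constant; nothing to test")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        raise ValueError("residual variance is zero; t is undefined")
    t = slope / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return slope, t, p


# Hypothetical copy numbers and phenotypes for a single segment.
copies = [0, 1, 1, 2, 2, 3, 3, 4]
pheno = [1.0, 1.4, 1.1, 2.2, 1.9, 2.8, 3.1, 3.6]
slope, t, p = ols_test(copies, pheno)
print(f"slope={slope:.3f} t={t:.2f} p={p:.2g}")
```

In practice you would loop this over all segments and correct the resulting p-values for multiple testing.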

GEMMA can also accept the BIMBAM file format, which is basically a flat text file in which you can specify genotypes as 0/1/2. I'm not sure whether values above 2 are allowed, but maybe that's enough for your needs.
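
A sketch of writing segments as BIMBAM "mean genotype" lines. This assumes the layout described in the GEMMA manual (an id, two allele columns, then one value per sample, comma-separated) and values already in [0, 2]. CNVs have no real alleles, so "A"/"B" are hypothetical placeholders, and the segment values below are made up.

```python
import io


def write_bimbam(segment_values, out, allele1="A", allele2="B"):
    """Write one line per segment: id, allele1, allele2, value per sample.

    allele1/allele2 are placeholders, since CNV segments have no alleles.
    """
    for seg, values in segment_values.items():
        fields = [seg, allele1, allele2] + [f"{v:g}" for v in values]
        out.write(", ".join(fields) + "\n")


buf = io.StringIO()
write_bimbam({"seg1": [0, 2], "seg2": [2, 1], "seg3": [1, 1.5]}, buf)
print(buf.getvalue(), end="")
```
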

reply by Asaf 10k • 4.0 years ago

Hi, thank you for your reply! Normally I use an LMM, since population structure has an effect in A. thaliana, but I will try this for now to see whether it solves the problem. Unfortunately, in most cases my CNV dataset has more than 3 possible values...

reply by GWASG • 4.0 years ago

Answer (2020-05-06)

4.0 years ago

chrchang523 10k

Quite a few GWAS software packages support dosages (decimal values such as 0.1 and 1.8 are permitted in the genotype matrix, rather than just {0, 1, 2}). So one option is to linearly transform your copy numbers to [0..2]: e.g. if your actual copy numbers are in [0..60], set the dosage to [copy number]/30. You can also try other transforms (quantile-based, etc.) when a linear transformation doesn't look appropriate for your data.
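
Both ideas can be sketched in a few lines. `linear_dosage` with `max_copies=60` reproduces the [copy number]/30 rule above. `rank_dosage` is one simple rank-based option (tied copy numbers share their average rank), offered only as an example of the "quantile-based" family, not as the specific transform meant in the answer. Both function names are hypothetical.

```python
def linear_dosage(copies, max_copies=None):
    """Map copy numbers in [0, max_copies] linearly onto [0, 2]."""
    top = max(copies) if max_copies is None else max_copies
    if top <= 0:
        raise ValueError("need a positive maximum copy number")
    return [2 * c / top for c in copies]


def rank_dosage(copies):
    """Map copy numbers onto [0, 2] by rank; ties share their average rank."""
    n = len(copies)
    if n < 2:
        raise ValueError("need at least two samples")
    order = sorted(range(n), key=lambda i: copies[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the block of samples tied with order[i].
        j = i
        while j + 1 < n and copies[order[j + 1]] == copies[order[i]]:
            j += 1
        avg = (i + j) / 2  # 0-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return [2 * r / (n - 1) for r in ranks]


print(linear_dosage([0, 30, 60], max_copies=60))
print(rank_dosage([0, 5, 5, 1]))
```

The linear version keeps distances between copy numbers, so a few very high counts compress everyone else toward 0; the rank version is insensitive to such outliers but discards the spacing between levels.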

Thank you for your reply! This sounds like a great idea. Do you have any software packages in mind that support dosages in the genotype matrix and correct for population structure? I also like the idea of different transformations; it makes the approach more flexible :)