Question: GWAS on CNV - looking for method
5 months ago, GWASG wrote:

Hi everyone,

I'm looking for a GWA algorithm for copy number variation (CNV) data.

1) Data

A reference-based collection of several DNA segments (e.g. genes) that occur a different number of times in each sample of my dataset (A. thaliana). I'm relatively open regarding data formats, since I have all the information needed to convert between them. In the end it's something like:

         seg1 seg2 seg3 
sample1   0    2    1
sample2   5    1    3

2) Knowledge

I've done GWAS in the past, normally using GEMMA or EMMA. Both are fast enough for my small sample size (~100-1000) and gave good results. GEMMA and other GWAS methods use the PLINK bed format, which represents binary allele information.

3) Workaround

I'm aware that I could trick "normal" GWAS methods by just comparing the following:

#seg1==0  vs  #seg1>0
#seg1<2   vs  #seg1>=2
#seg1==2  vs  #seg1!=2

This would include comparing all possible combinations and I'm not sure if it would give me the right solution.
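
A minimal sketch of what this workaround amounts to, assuming each segment is split at every observed copy-number level into a "copies >= t" indicator. The function name and the example values are made up for illustration:

```python
def threshold_indicators(copy_numbers):
    """Return {t: [0/1 per sample]} for every cut 'copies >= t'."""
    levels = sorted(set(copy_numbers))
    # Skip the lowest level: 'copies >= min' is 1 for every sample
    # and therefore carries no signal.
    return {
        t: [1 if c >= t else 0 for c in copy_numbers]
        for t in levels[1:]
    }


# Made-up copy numbers for one segment across five samples.
seg1 = [0, 5, 2, 2, 0]
for t, column in threshold_indicators(seg1).items():
    print(f"seg1 >= {t}: {column}")
```

A segment with k distinct copy-number levels produces k-1 binary markers this way, so the number of tests (and the multiple-testing burden) grows with the number of levels.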

4) What I'm looking for:

A GWAS algorithm that handles more than binary occurrence. I want to know whether having, say, 2 copies of a segment has a significant effect on the phenotype. Does anyone know a suitable method for this problem?

Hand in hand with this question: Why do we "ignore" alleles with low frequencies? Are these not important?

Thanks, Sebastian

cnv gwas • 170 views
modified 5 months ago by chrchang523 • written 5 months ago by GWASG

Are you using an LMM? If you're not taking the kinship matrix into account (or are willing not to), you can just treat it as a linear model and solve it in R.
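
As a rough illustration of the plain linear-model route (no kinship correction), here is a sketch in pure Python rather than R. It regresses the phenotype on the copy number of one segment at a time and returns the slope and its t-statistic. The function name and the example data are hypothetical:

```python
import math


def ols_slope_test(x, y):
    """Simple linear regression y ~ x.

    Returns (slope, t_statistic), or None if the test is undefined
    (fewer than 3 samples, or no variation in x).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if n < 3 or sxx == 0:
        return None
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    t_stat = math.inf if std_err == 0 else slope / std_err
    return slope, t_stat


# Made-up data: copy numbers per segment and one phenotype value per sample.
copies = {"seg1": [0, 5, 2, 1, 3, 0], "seg2": [2, 1, 2, 1, 2, 1]}
phenotype = [1.0, 6.2, 2.9, 2.1, 4.2, 0.8]
for segment, x in copies.items():
    print(segment, ols_slope_test(x, phenotype))
```

A p-value would come from a t distribution with n - 2 degrees of freedom (for example via `scipy.stats.t.sf`, if SciPy is available). Without a kinship term, population structure is not corrected for.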

GEMMA also accepts the BIMBAM file format, which is basically a flat text file in which you can specify 0/1/2 as the genotype. I'm not sure whether values above 2 are allowed, but maybe it's enough for your needs.
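
A hedged sketch of writing such a flat file. It assumes the BIMBAM mean genotype layout as I understand it from the GEMMA manual: one line per marker with an ID, two allele labels, then one value between 0 and 2 per sample. CNV segments have no alleles, so the "A"/"T" labels are placeholders, and the function name is hypothetical. Check the manual before relying on this.

```python
def bimbam_mean_genotype_lines(dosages_by_segment, alleles=("A", "T")):
    """Format {segment_id: [value per sample]} as BIMBAM-style lines."""
    lines = []
    for segment_id, dosages in dosages_by_segment.items():
        # The format expects values between 0 and 2.
        if any(d < 0 or d > 2 for d in dosages):
            raise ValueError(f"{segment_id}: values must lie in [0, 2]")
        fields = [segment_id, *alleles, *(f"{d:g}" for d in dosages)]
        lines.append(", ".join(fields))
    return lines


# Made-up values for three samples.
example = {"seg1": [0, 2, 1], "seg2": [2, 1, 0.5]}
for line in bimbam_mean_genotype_lines(example):
    print(line)
```

Copy numbers above 2 would need to be rescaled or recoded before they fit this layout.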

written 5 months ago by Asaf

Hi, thank you for your reply! I normally use an LMM, since population structure has an effect in A. thaliana, but I will try this to see if it fixes the problem for now. Unfortunately, in most cases my CNV dataset has more than 3 possible values per segment...

written 5 months ago by GWASG

5 months ago, chrchang523 (United States) wrote:

Quite a few GWAS software packages support dosages (decimal values like 0.1 and 1.8 permitted in the genotype matrix, rather than just {0, 1, 2}). So one option is to linearly transform your copy numbers to [0..2]: e.g. if your actual copy numbers are in [0..60], set the dosage to [copy number]/30. And you can try other transforms (quantile-based, etc.) when a linear transformation doesn't look appropriate for your data.
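
A small sketch of both kinds of transform, in pure Python. `linear_dosage` follows the [copy number]/30 example above (2 * copies / maximum). `rank_dosage` is only one possible reading of "quantile-based" (average ranks rescaled to [0..2]). Both function names are made up:

```python
def linear_dosage(copy_numbers, max_copies=None):
    """Linearly rescale copy numbers to [0, 2]."""
    top = max(copy_numbers) if max_copies is None else max_copies
    if top == 0:
        return [0.0] * len(copy_numbers)
    return [2.0 * c / top for c in copy_numbers]


def rank_dosage(copy_numbers):
    """Rescale average ranks (ties share a rank) to [0, 2]."""
    n = len(copy_numbers)
    if n < 2:
        return [0.0] * n
    ordered = sorted(copy_numbers)
    avg_rank = {}
    for value in set(ordered):
        first = ordered.index(value) + 1  # 1-based rank of first occurrence
        count = ordered.count(value)
        avg_rank[value] = first + (count - 1) / 2
    return [2.0 * (avg_rank[c] - 1) / (n - 1) for c in copy_numbers]


# Made-up segment with one extreme sample.
copies = [0, 60, 2, 2]
print(linear_dosage(copies, max_copies=60))
print(rank_dosage(copies))
```

With a single outlier such as 60, the linear transform squeezes the remaining samples close to 0, while the rank-based variant spreads them across the range. That is the kind of case where a non-linear transform may be worth trying.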

modified 5 months ago • written 5 months ago by chrchang523

Thank you for your reply! This sounds like a great idea. Do you have any software packages in mind that support dosages in the genotype matrix and also correct for population structure? I also like the idea of the different transformations; it makes the approach more flexible :)

written 5 months ago by GWASG

Actually, GEMMA can accept this input.

written 5 months ago by Asaf