Question: GWAS on CNV - looking for method
GWASG wrote (5 months ago):

Hi everyone,

I'm looking for a GWA algorithm for copy number variation (CNV) data.

1) Data

A reference-based collection of several DNA segments (e.g. genes) that occur in different copy numbers across the samples in my dataset (A. thaliana). I'm relatively open to data formats, since I have all the information needed to convert between them. In the end it's something like:

         seg1 seg2 seg3 
sample1   0    2    1
sample2   5    1    3

2) Knowledge

I've done GWAS in the past, normally using GEMMA or EMMA. These algorithms are fast enough for my small sample size (~100-1000) and gave good results. GEMMA and other GWAS methods use the PLINK bed format, which represents binary allele information.

3) Workaround

I'm aware that I could trick "normal" GWAS methods by just comparing the following:

0        vs #seg1>0
#seg<2   vs #seg>=2

OR

#seg==2  vs  #seg!=2

This would include comparing all possible combinations and I'm not sure if it would give me the right solution.
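The binarization workaround described above can be sketched as follows. This is only an illustration in Python, assuming the copy numbers are held in a nested dict; the names `copy_numbers`, `binarize_at_thresholds` and `binarize_exact` are hypothetical and not part of any GWAS package.

```python
# Hypothetical copy-number matrix mirroring the example table in the question.
copy_numbers = {
    "sample1": {"seg1": 0, "seg2": 2, "seg3": 1},
    "sample2": {"seg1": 5, "seg2": 1, "seg3": 3},
}


def binarize_at_thresholds(copy_numbers, segment):
    """Return {threshold: {sample: 0/1}} for one segment.

    For every observed copy number t above the minimum, a sample is coded 1
    if its copy number is >= t and 0 otherwise (the "#seg<t vs #seg>=t" split).
    """
    values = {sample: row[segment] for sample, row in copy_numbers.items()}
    # Skip the smallest value: with t == min every sample would be coded 1.
    thresholds = sorted(set(values.values()))[1:]
    return {t: {s: int(v >= t) for s, v in values.items()} for t in thresholds}


def binarize_exact(copy_numbers, segment, k):
    """Code a sample 1 if its copy number equals k, else 0 ("#seg==k vs #seg!=k")."""
    return {sample: int(row[segment] == k) for sample, row in copy_numbers.items()}


splits = binarize_at_thresholds(copy_numbers, "seg1")
exactly_two = binarize_exact(copy_numbers, "seg2", 2)
```

A segment with k distinct copy numbers yields k-1 threshold splits, plus up to k exact-match contrasts, so the number of tests per segment grows quickly and the multiple-testing correction has to account for that.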

4) What I'm looking for:

A GWAS algorithm that handles more than binary occurrence. For example, I want to know whether having 2 copies of a segment has a significant effect on the phenotype. Does anyone know a suitable method for this problem?

Hand in hand with this question: Why do we "ignore" alleles with low frequencies? Are these not important?

Thanks, Sebastian

Tags: cnv, gwas

Are you using an LMM? If you're not taking the kinship matrix into account (or are willing not to), you can just treat it as a linear model and solve it in R.

GEMMA can accept the BIMBAM file format, which is basically a flat file. You can specify 0/1/2 as the genotype; I'm not sure if values above 2 are an option, but maybe it's enough for your needs.

written 5 months ago by Asaf
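For the no-kinship case, a per-segment linear model amounts to an ordinary least-squares regression of the phenotype on the raw copy number. The comment suggests R; the sketch below uses plain Python instead, with made-up data and a hypothetical helper name (`simple_linear_regression`). It ignores population structure entirely, so it is not a substitute for an LMM.

```python
import math


def simple_linear_regression(x, y):
    """Ordinary least squares of y on x.

    Returns (slope, intercept, t_statistic). The t statistic tests
    slope == 0 and has len(x) - 2 degrees of freedom.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("copy number is constant across samples")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and standard error of the slope.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, intercept, t_stat


# Made-up example: copy number of one segment and a quantitative phenotype.
copies = [0, 1, 2, 2, 3, 5]
phenotype = [1.1, 1.9, 3.2, 2.8, 4.1, 6.0]
slope, intercept, t_stat = simple_linear_regression(copies, phenotype)
```

Turning the t statistic into a p-value requires the t distribution with n-2 degrees of freedom, which a statistics library (or R's `lm`) provides. Note that this model assumes each additional copy shifts the phenotype by the same amount.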

Hi, thank you for your reply! Normally I use an LMM, since population structure has an effect in A. thaliana, but I will try this to see if it fixes the problem for now. Unfortunately, in most cases I have more than 3 possible copy-number values in my CNV dataset...

written 5 months ago by GWASG
chrchang523 wrote (5 months ago):

Quite a few GWAS software packages support dosages (decimal values like 0.1 and 1.8 permitted in the genotype matrix, rather than just {0, 1, 2}). So one option is to linearly transform your copy numbers to [0..2]: e.g. if your actual copy numbers are in [0..60], set the dosage to [copy number]/30. And you can try other transforms (quantile-based, etc.) when a linear transformation doesn't look appropriate for your data.
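The two kinds of transform mentioned in this answer can be sketched in Python as follows. The function names are hypothetical, and scaling by the observed maximum (rather than a known upper bound such as 60) is an assumption made here for illustration.

```python
def linear_dosage(copies):
    """Rescale copy numbers linearly so the observed maximum maps to 2.0."""
    top = max(copies)
    if top == 0:
        return [0.0 for _ in copies]
    return [2.0 * c / top for c in copies]


def rank_dosage(copies):
    """Quantile-style transform: ranks rescaled to [0, 2].

    Tied copy numbers share the mean of their ranks, so equal copy
    numbers always receive equal dosages.
    """
    n = len(copies)
    if n < 2:
        return [0.0 for _ in copies]
    order = sorted(range(n), key=lambda idx: copies[idx])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the run of tied values starting at position i.
        j = i
        while j + 1 < n and copies[order[j + 1]] == copies[order[i]]:
            j += 1
        mean_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return [2.0 * r / (n - 1) for r in ranks]


linear = linear_dosage([0, 30, 60])
ranked = rank_dosage([0, 60, 5, 5])
```

With copy numbers spanning 0 to 60, `linear_dosage` reproduces the [copy number]/30 rule from the answer. The rank-based version keeps only the ordering of the samples, which may be preferable when a few samples have very high copy numbers.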


Thank you for your reply! This sounds like a great idea. Do you have any good software packages in mind that support dosages in the genotype matrix and correct for population structure? I also like the idea of the different transformations; this makes it more flexible :)

written 5 months ago by GWASG

Actually GEMMA can accept this input

written 5 months ago by Asaf
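As a rough sketch of how rescaled copy numbers could be laid out for GEMMA, the snippet below formats them as BIMBAM-style mean genotype lines: one line per segment with an identifier, two allele columns, and then one dosage in [0, 2] per sample. The function name and the allele labels "A" and "T" are placeholders chosen here, since a CNV segment has no real alleles; whether GEMMA is indifferent to those labels should be checked against its manual.

```python
def bimbam_mean_genotype_lines(segments, samples, dosages):
    """Format dosages as BIMBAM-style mean genotype lines.

    dosages maps sample -> segment -> value in [0, 2]. Each output line is
    "segment_id, A, T, dosage_sample1, dosage_sample2, ...", where "A" and
    "T" are placeholder allele labels (an assumption, not real alleles).
    """
    lines = []
    for seg in segments:
        for s in samples:
            if not 0.0 <= dosages[s][seg] <= 2.0:
                raise ValueError(f"dosage out of range for {s}/{seg}")
        values = ", ".join(f"{dosages[s][seg]:.3f}" for s in samples)
        lines.append(f"{seg}, A, T, {values}")
    return lines


# Hypothetical dosages, e.g. produced by a linear rescaling of copy numbers.
samples = ["sample1", "sample2"]
segments = ["seg1", "seg2"]
dosages = {
    "sample1": {"seg1": 0.0, "seg2": 2.0},
    "sample2": {"seg1": 2.0, "seg2": 1.0},
}
lines = bimbam_mean_genotype_lines(segments, samples, dosages)
```

The sample order in each line has to match the row order of the phenotype file. To my understanding the resulting file is passed to GEMMA with `-g`, alongside `-p` for phenotypes and `-k` for a kinship matrix when fitting an LMM, but the GEMMA manual is the authority on the exact invocation.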