GWAS on CNV data: looking for a method
Asked by GWASG, 4.0 years ago

Hi everyone,

I'm looking for a GWA algorithm for copy number variation (CNV) data.

1) Data

A reference-based collection of DNA segments (e.g. genes) whose copy numbers vary across the samples in my dataset (A. thaliana). I'm flexible about data formats, since I have all the information needed to convert between them. In the end it's a matrix of copy numbers per sample and segment, something like:

         seg1 seg2 seg3 
sample1   0    2    1
sample2   5    1    3

2) Knowledge

I've done GWAS before, normally with GEMMA or EMMA. These tools are fast enough for my small sample sizes (~100-1000) and gave good results. GEMMA and other GWAS methods use the PLINK BED format, which stores biallelic genotypes (0, 1 or 2 copies of an allele) rather than arbitrary counts.

3) Workaround

I know I could trick "normal" GWAS methods by dichotomizing the copy numbers, e.g. comparing:

#seg == 0  vs  #seg > 0
#seg < 2   vs  #seg >= 2

OR

#seg == 2  vs  #seg != 2

This would mean testing every possible split of the copy numbers, and I'm not sure it would give me the right answer.
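
To make the concern concrete, here is a minimal Python/NumPy sketch of the dichotomizations above; the function names and the toy copy numbers are hypothetical. For one segment with m distinct copy numbers it produces m - 1 threshold splits ("#seg >= k") plus m equality splits ("#seg == k"), and each of them would be a separate association test.

```python
import numpy as np


def threshold_encodings(copy_numbers):
    """Map each threshold k to a 0/1 vector for 'copy number >= k'.

    Only thresholds that leave samples on both sides are kept, so the
    smallest observed value is skipped (cn >= min is always true).
    """
    cn = np.asarray(copy_numbers)
    return {int(k): (cn >= k).astype(int) for k in np.unique(cn)[1:]}


def equality_encodings(copy_numbers):
    """Map each observed copy number v to a 0/1 vector for 'copy number == v'."""
    cn = np.asarray(copy_numbers)
    return {int(v): (cn == v).astype(int) for v in np.unique(cn)}


# Hypothetical copy numbers of one segment in five samples.
segment = [0, 2, 1, 5, 2]
for k, marker in threshold_encodings(segment).items():
    print(f"#seg >= {k}:", marker.tolist())
for v, marker in equality_encodings(segment).items():
    print(f"#seg == {v}:", marker.tolist())
```

For this toy segment the sketch already yields 7 binary markers, which adds to the multiple-testing burden and spreads a possible dose-response signal over several separate tests.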

4) What I'm looking for:

A GWAS algorithm that uses the actual copy number rather than just binary presence/absence. For example, I want to know whether having 2 copies of a segment has a significant effect on the phenotype. Does anyone know a suitable method for this problem?

A related question: why do we "ignore" (filter out) alleles with low frequencies? Are they not important?

Thanks, Sebastian

Are you using an LMM? If you're not accounting for the kinship matrix (or are willing not to), you can treat each segment's copy number as the predictor in an ordinary linear model and fit it in R.
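
The same per-segment linear model (the equivalent of R's `lm(phenotype ~ copy_number)` run once per segment) can be sketched in Python with NumPy. This is a minimal sketch assuming a samples x segments copy-number matrix like the one in the question and one phenotype value per sample; `per_segment_ols` is a hypothetical helper, and it does not correct for kinship or population structure.

```python
import numpy as np


def per_segment_ols(cnv, phenotype):
    """Fit y = a + b * copy_number separately for each segment (column).

    No kinship / population-structure correction is applied.
    Returns arrays of slopes, standard errors and t-statistics.
    """
    X = np.asarray(cnv, dtype=float)       # rows = samples, columns = segments
    y = np.asarray(phenotype, dtype=float)
    n, m = X.shape
    betas, ses, tstats = np.empty(m), np.empty(m), np.empty(m)
    for j in range(m):
        x = X[:, j]
        if x.max() == x.min():             # constant copy number: slope not estimable
            betas[j] = ses[j] = tstats[j] = np.nan
            continue
        design = np.column_stack([np.ones(n), x])
        coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        sigma2 = resid @ resid / (n - 2)   # residual variance, n - 2 degrees of freedom
        cov = sigma2 * np.linalg.inv(design.T @ design)
        betas[j] = coef[1]
        ses[j] = np.sqrt(cov[1, 1])
        tstats[j] = betas[j] / ses[j]
    return betas, ses, tstats
```

A two-sided p-value can then be obtained by comparing each t-statistic with a t distribution on n - 2 degrees of freedom (for example `2 * scipy.stats.t.sf(abs(t), n - 2)`).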

GEMMA also accepts the BIMBAM file format, which is basically a flat text file in which you can specify genotypes as 0/1/2. I'm not sure whether values above 2 are allowed, but maybe that's enough for your needs.
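
As I understand the BIMBAM mean genotype format described in the GEMMA manual, each line holds one variant: an ID, two allele labels, then one genotype value per individual. Below is a minimal sketch that writes a copy-number table in that layout, assuming the samples x segments matrix from the question; `cnv_to_bimbam` is a hypothetical helper, and the allele columns are meaningless for CNVs, so "X"/"Y" are placeholders only.

```python
def cnv_to_bimbam(segment_ids, matrix, placeholder_alleles=("X", "Y")):
    """Convert a samples x segments copy-number matrix to BIMBAM-style lines.

    BIMBAM mean genotype files have one line per variant, so the matrix is
    transposed: each output line is one segment with one value per sample.
    Values are written with up to 6 significant digits.
    """
    a1, a2 = placeholder_alleles
    lines = []
    for j, seg in enumerate(segment_ids):
        column = [row[j] for row in matrix]   # this segment in every sample
        values = ", ".join(f"{float(v):g}" for v in column)
        lines.append(f"{seg}, {a1}, {a2}, {values}")
    return "\n".join(lines) + "\n"


# The toy table from the question: sample1 = [0, 2, 1], sample2 = [5, 1, 3].
print(cnv_to_bimbam(["seg1", "seg2", "seg3"], [[0, 2, 1], [5, 1, 3]]), end="")
```

The resulting file would be passed to GEMMA with `-g`, and the phenotype file (one value per line, same sample order) with `-p`. Raw copy numbers above 2, such as the 5 in this example, may need rescaling to [0, 2] first.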

Hi, thank you for your reply! I normally use an LMM, since population structure has an effect in A. thaliana, but I will try this to see if it solves the problem for now. Unfortunately, in most cases my CNV dataset has more than 3 possible copy-number values...

4.0 years ago

Quite a few GWAS software packages support dosages (decimal values like 0.1 and 1.8 are permitted in the genotype matrix, rather than just {0, 1, 2}). So one option is to linearly transform your copy numbers to [0..2]: e.g. if your actual copy numbers are in [0..60], set the dosage to [copy number]/30. You can also try other transforms (quantile-based, etc.) when a linear transformation doesn't look appropriate for your data.
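
Both transforms can be sketched with NumPy; `linear_dosage` and `quantile_dosage` are hypothetical helper names, and the rank-based scaling in `quantile_dosage` is just one possible "quantile-based" choice, not something prescribed above. The linear version reproduces the example: with a maximum copy number of 60, 2 * cn / 60 = cn / 30.

```python
import numpy as np


def linear_dosage(cn, max_cn=None):
    """Linearly rescale copy numbers from [0, max_cn] to dosages in [0, 2]."""
    cn = np.asarray(cn, dtype=float)
    if max_cn is None:
        max_cn = cn.max()
    if max_cn <= 0:                       # no copies anywhere: all dosages 0
        return np.zeros_like(cn)
    return 2.0 * cn / max_cn


def quantile_dosage(cn):
    """Rank-based transform: average ranks (ties share a rank) scaled to [0, 2]."""
    cn = np.asarray(cn, dtype=float)
    if cn.size < 2:
        raise ValueError("need at least two samples to rank")
    uniq, inverse = np.unique(cn, return_inverse=True)
    counts = np.bincount(inverse)         # samples per distinct copy number
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
    avg_rank = starts + (counts - 1) / 2.0  # mean 0-based rank within each tie group
    return 2.0 * avg_rank[inverse] / (cn.size - 1)


print(linear_dosage([0, 30, 60]))         # copy number / 30
print(quantile_dosage([0, 2, 1, 5, 2]))
```

The linear rescaling only changes units, so an effect estimated per dosage unit converts back to a per-copy effect by multiplying by 2 / max_cn. The rank version instead limits the leverage of a few samples with very high copy numbers, at the cost of treating copy numbers as merely ordered.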

Thank you for your reply! This sounds like a great idea. Do you have any software packages in mind that support dosages in the genotype matrix and correct for population structure? I also like the idea of trying different transformations; it makes the approach more flexible :)

Actually, GEMMA can accept this input (dosages in the BIMBAM format).
