Concordance analysis using statistics/graphs/figures
8.7 years ago
MAPK ★ 2.1k

Hi Guys,

I have sets of genotype data. The data are more than 99% (presumably) concordant. What would be the best way to represent them with proper statistics and figures? Here in the table below, I need to compare the concordance between the given genotypes (geno1 and geno2, geno3 and geno4, and geno5 and geno6). Here, geno1 and geno2 are 100% concordant, geno3 and geno4 have some genotypes that are not concordant and geno5 and geno6 have genotypes that are least concordant with each other. What would be the best way to analyse this data (assuming I have high school level of statistics knowledge)? Thank you!

geno1   geno2   geno3   geno4   geno5   geno6
0/1     0/1     0/0     0/0     1/1     1/0
0/0     0/0     0/1     0/1     0/0     0/1
0/1     0/1     0/0     0/0     0/1     0/1
0/0     0/0     0/0     0/0     1/1     1/1
0/0     0/0     0/1     0/1     0/0     0/0
0/1     0/1     0/1     0/0     0/1     0/0


What's your purpose? Are you trying to test whether gene1 and gene2 are significantly more concordant than, say, gene1 and gene5?

Or are you trying to get a concordance table? If so, you just need to represent it as an n-by-n table:

        Gene1   Gene2   Gene3   Gene4   Gene5
Gene1   1       1       0.33    0.167   0.667
Gene2           1       0.33    0.167   0.667
Gene3                   1       0.833   0.167
Gene4                           1       0
Gene5                                   1


Yes, I am trying to compare whether gene1 and gene2 are significantly more concordant than, say, gene5 and gene6. Basically, I want to show the concordance level of each pair in graphical form: geno1 and geno2, geno3 and geno4, and geno5 and geno6.


Do you consider 0/1 and 1/0 as concordant?


No, they are not concordant. Thank you!

8.7 years ago
Sam ★ 4.7k

The easiest way to do this is as follows.

Assuming your data are stored in a text file called test.txt, you can run:

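data = read.table("test.txt", header=TRUE, stringsAsFactors=FALSE)
# Square matrix with one row and one column per genotype column
concordant = matrix(NA, ncol(data), ncol(data))
colnames(concordant) = colnames(data)
rownames(concordant) = colnames(data)
for(i in 1:ncol(data)){
    for(j in i:ncol(data)){
        # Fraction of rows (sites) where the two columns carry the same genotype
        concordant[i,j] = sum(as.character(data[,i])==as.character(data[,j]))/nrow(data)
    }
}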


If you have a large table and only need the concordance of each adjacent pair of columns (geno1 with geno2, geno3 with geno4, and so on), you can skip the full matrix and use this loop instead:

# One row per adjacent column pair (assumes an even number of columns)
n_pairs = ncol(data)/2
result = data.frame(A=character(n_pairs), B=character(n_pairs), Concordance=numeric(n_pairs), stringsAsFactors=FALSE)
j = 1
for(i in seq(1, ncol(data), 2)){
    result[j,1] = colnames(data)[i]
    result[j,2] = colnames(data)[i+1]
    # Fraction of rows (sites) where the two columns of the pair agree
    result[j,3] = sum(as.character(data[,i])==as.character(data[,i+1]))/nrow(data)
    j = j+1
}
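Since the question asks for a graphical summary, here is a minimal sketch of one way to plot the per-pair concordance with base R. The data frame is typed in from the table in the question so the example runs on its own; the variable names `first` and `pair_concordance` are just illustrative, and a bar plot is only one reasonable choice of figure.

```r
# Toy data copied from the question; in practice this would come from read.table().
data <- data.frame(
  geno1 = c("0/1", "0/0", "0/1", "0/0", "0/0", "0/1"),
  geno2 = c("0/1", "0/0", "0/1", "0/0", "0/0", "0/1"),
  geno3 = c("0/0", "0/1", "0/0", "0/0", "0/1", "0/1"),
  geno4 = c("0/0", "0/1", "0/0", "0/0", "0/1", "0/0"),
  geno5 = c("1/1", "0/0", "0/1", "1/1", "0/0", "0/1"),
  geno6 = c("1/0", "0/1", "0/1", "1/1", "0/0", "0/0"),
  stringsAsFactors = FALSE
)

# Index of the first column of each adjacent pair (1, 3, 5, ...).
first <- seq(1, ncol(data), by = 2)

# Fraction of rows on which the two columns of each pair agree exactly.
pair_concordance <- sapply(first, function(i) mean(data[[i]] == data[[i + 1]]))
names(pair_concordance) <- paste(colnames(data)[first],
                                 colnames(data)[first + 1], sep = " vs ")

# One bar per pair, on a 0-1 scale.
barplot(pair_concordance, ylim = c(0, 1),
        ylab = "Fraction of concordant genotypes")
```

With real data that are more than 99% concordant, the bars will all sit near 1, so narrowing `ylim` or plotting the discordance (1 minus concordance) may make the differences easier to see.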


However, if you consider 0/1 and 1/0 to be the same, simply change every instance of 1/0 to 0/1 (or the other way round) before computing the concordance.
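A rough sketch of that recoding step, in case it helps. The helper name `normalise_het` is made up for this example, and it assumes genotypes are stored as strings exactly like "1/0".

```r
# Hypothetical helper (not part of the answer above): rewrite every "1/0" as "0/1"
# in all columns of a data frame, leaving other genotypes untouched.
normalise_het <- function(df) {
  df[] <- lapply(df, function(x) sub("^1/0$", "0/1", as.character(x)))
  df
}

# Small made-up example with one "1/0" entry.
example <- data.frame(geno5 = c("1/1", "0/1"),
                      geno6 = c("1/0", "0/1"),
                      stringsAsFactors = FALSE)
normalised <- normalise_het(example)

# After recoding, geno6 is "0/1", "0/1", so only the second row agrees with geno5.
mean(normalised$geno5 == normalised$geno6)
```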


Thank you so much, Sam!


One quick question: how do I skip a row for a given pair of columns if one of the two genotypes is empty (as shown in the table below)? I want to exclude those entries from the calculation. Thank you again for your help!

geno1   geno2   geno3   geno4   geno5   geno6
0/1     0/1     0/0     0/0     1/1     1/0
0/0     0/0     0/1             0/0     0/1
0/1     0/1     0/0     0/0     0/1     0/1
0/0             0/0     0/0     1/1
0/0     0/0     0/1     0/1     0/0     0/0
0/1     0/1     0/1     0/0     0/1     0/0
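One possible way to handle this, as a sketch only: it assumes the empty cells end up as `NA` in R (for example, if the file is tab-delimited and read with `read.table(..., sep="\t", na.strings="")`). The idea is to keep only rows where both columns of the pair have a genotype and to divide by that reduced count. The geno3/geno4 values below are typed in from the table above.

```r
# geno3 and geno4 from the table above, with the empty cell represented as NA.
data <- data.frame(
  geno3 = c("0/0", "0/1", "0/0", "0/0", "0/1", "0/1"),
  geno4 = c("0/0", NA,    "0/0", "0/0", "0/1", "0/0"),
  stringsAsFactors = FALSE
)

# TRUE for rows where both genotypes of the pair are present.
keep <- complete.cases(data$geno3, data$geno4)

# Concordance over the rows that could actually be compared.
concordance <- mean(data$geno3[keep] == data$geno4[keep])
n_compared  <- sum(keep)
```

Reporting `n_compared` alongside the concordance is useful, because pairs with many missing genotypes are compared on fewer sites.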