I have variant data for a group of people. Some of those people have a disease. I'm trying to create a ranked list of genes for the disease by variant density ratio, for an alternate GSEA. To do this, I need to find the variant densities for the two groups first. There are currently three methods available to me to find variant density:
1) Count the total number of variants called (one per entry in column 2, POS, of the VCF), divide by the gene length, and multiply by 1000.
2) "The average number of * variants per individual for each of the genes is computed by summing the number of * variants present in each individual for the gene and dividing by the total number of individuals in the population." Then I divide this by the length of the gene and multiply by 1000. The method is quoted from page 2 here.
3) Count the total number of alternate allele calls for a gene across every person. Separately, count the total number of allele calls across every person and divide by the number of different variants found; this gives the average number of alleles available to make a genotype for the gene. Divide the total alternate allele call count by this number to get the average number of variants per allele, then multiply by 2 to get the average number of variants per person for the gene. Finally, divide by the length of the gene and multiply by 1000. I tried to work out the units to confirm that this is valid, but it was very confusing. I'm getting variants^2/allele:
Say you have a gene, NEUROD1, with a length of 1,000bp. There are 3 samples and 3 variants were called.
Variant  Sample1  Sample2  Sample3  AF    AC  #A
A->C     1/1      0/1      ./.      0.75  3   4
C->G     0/0      0/1      1/1      0.5   3   6
A->AA    0/0      0/1      ./.      0.25  1   4
The total alternate allele call count (the sum of AC) for NEUROD1 is 3+3+1=7.
The total number of allele calls (the sum of #A) for NEUROD1 is 4+6+4=14.
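As a sanity check, the two totals can be re-derived from the genotype strings in the table. This is only a minimal sketch: the `neurod1` dictionary and the `allele_counts` helper are names made up for illustration, and it assumes that a missing call (`.`) counts toward neither total.

```python
def allele_counts(row):
    """Return (alt_count, called_count) for one variant's genotype strings."""
    alt = called = 0
    for gt in row:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # assumption: missing calls are skipped entirely
            called += 1
            if allele != "0":
                alt += 1
    return alt, called


# Genotypes copied from the NEUROD1 table (Sample1, Sample2, Sample3).
neurod1 = {
    "A->C": ["1/1", "0/1", "./."],
    "C->G": ["0/0", "0/1", "1/1"],
    "A->AA": ["0/0", "0/1", "./."],
}

per_variant = {name: allele_counts(row) for name, row in neurod1.items()}
total_ac = sum(ac for ac, _ in per_variant.values())
total_an = sum(an for _, an in per_variant.values())

print(per_variant)
print(total_ac, total_an)
```

The per-variant pairs correspond to the AC and #A columns, and the two sums are the 7 and 14 used in the rest of the calculation.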
14 allele calls for the gene/3 variants for the gene = 4.67 allele calls/variant
7 variant calls/gene / 4.67 allele calls/variant = 1.5 variants^2/allele of a gene
I have no idea what variant squared is, but this is the average number of variants you will find in one allele of NEUROD1.
7 variant allele calls/gene / 14 allele calls for the gene = 0.5 variants/allele for the gene^2
This is the average number of times you will find one of the variants in one allele for the gene.
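The two ratios above can be reproduced in a few lines, which also makes their relationship explicit: dividing by (allele calls / variants) is the same as multiplying the second ratio by the number of variants. This is just the arithmetic from the example restated, with variable names chosen for illustration; it does not by itself settle the unit question.

```python
total_ac = 7      # summed alternate allele calls (3 + 3 + 1)
total_an = 14     # summed allele calls (4 + 6 + 4)
n_variants = 3    # distinct variants called in the gene

calls_per_variant = total_an / n_variants   # allele calls per variant
ratio_a = total_ac / calls_per_variant      # the "variants^2/allele" figure
ratio_b = total_ac / total_an               # the "variants/allele" figure

print(round(calls_per_variant, 2), round(ratio_a, 2), ratio_b)

# Algebraically, ratio_a == n_variants * ratio_b (up to floating-point error).
print(abs(ratio_a - n_variants * ratio_b) < 1e-9)
```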
I'm completely lost on how to explain the squared units. I'll assume it's because we're dealing with a 2D table, the VCF.
1.5 variants^2/allele of a gene * 2 alleles for a person/gene = 3 variants^2 for a person/gene^2
3 variants^2 for a person/gene^2 / 1000bp/gene = 0.003 variants^2 for a person/bp for gene
0.003 variants^2 for a person/bp for gene * 1000bp/kb = 3 variants^2 for a person/kb of gene
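To compare the three methods on the same toy table, here is a rough sketch of each as a function. The function names and the `neurod1` dictionary are hypothetical. For method 2 it assumes that an individual "has" a variant when any called allele is non-reference and that a missing genotype counts as not carrying it, which the quoted description does not spell out.

```python
def alleles(gt):
    """Split a genotype string such as '0/1' or '1|1' into its allele codes."""
    return gt.replace("|", "/").split("/")


def method1(genotypes, gene_length_bp):
    """Method 1: distinct variant records per kb of gene."""
    return len(genotypes) / gene_length_bp * 1000


def method2(genotypes, gene_length_bp):
    """Method 2: mean number of variants carried per individual, per kb."""
    n_individuals = len(next(iter(genotypes.values())))
    carried = sum(
        any(a not in ("0", ".") for a in alleles(gt))  # assumption: see lead-in
        for row in genotypes.values()
        for gt in row
    )
    return carried / n_individuals / gene_length_bp * 1000


def method3(genotypes, gene_length_bp):
    """Method 3: the chain of divisions from the worked example, per kb."""
    called = [
        a for row in genotypes.values() for gt in row for a in alleles(gt) if a != "."
    ]
    total_an = len(called)                       # all non-missing allele calls
    total_ac = sum(a != "0" for a in called)     # alternate allele calls
    calls_per_variant = total_an / len(genotypes)
    per_allele = total_ac / calls_per_variant
    return per_allele * 2 / gene_length_bp * 1000


# Genotypes copied from the NEUROD1 table; gene length taken as 1,000 bp.
neurod1 = {
    "A->C": ["1/1", "0/1", "./."],
    "C->G": ["0/0", "0/1", "1/1"],
    "A->AA": ["0/0", "0/1", "./."],
}

results = {
    "method1": method1(neurod1, 1000),
    "method2": method2(neurod1, 1000),
    "method3": method3(neurod1, 1000),
}
print({name: round(value, 3) for name, value in results.items()})
```

For the real analysis, each function would be run once per gene for the disease group and once for the other group, and the ratio of the two densities would give the ranking score.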