Hi, I have data with three columns: SNPs, their gene (based on ANNOVAR), and a p-value for every SNP. I would like to aggregate the p-values for every gene.
snps <- data.frame(
  snp_id = c("rs1", "rs2", "rs3", "rs4", "rs5", "rs6", "rs7", "rs8"),
  Gene.refGene_ANNOVAR = c("gene1", "gene1", "gene1", "gene1", "gene2", "gene2", "gene2", "gene2"),
  p.value = c(0.7703884, 0.9648540, 0.9648540, 0.9648540, 0.54, 0.03, 0.03, 0.8)
)
Above is an example of the data. I read about the SKAT method (https://www.hsph.harvard.edu/skat/) and figured it might do the job. I read the package documentation here: https://rdrr.io/cran/SKAT/man/SKAT.html and tried to apply it to my data, but got lost as to how to do it correctly:
gene_pvals <- aggregate(p.value ~ Gene.refGene_ANNOVAR, data = df, FUN = function(x) SKAT::SKAT_Null(x)$p.value)
It doesn't work and returns errors. I would be grateful if you could share your knowledge on this. I don't have the correlations between the SNPs, but I know that they are correlated. Do I have enough data to use the SKAT method?
I have no sense of whether SKAT is the right tool for your analysis, but some insight into your errors: did you mean data = snps instead of data = df? Also, SKAT_Null() does not appear to be a valid function in SKAT v2.2.5. Did you mean to call
Would SAIGE-GENE+ be any help here?
acvill, can you suggest other tools or methods for this? (Also, for every SNP I have data on how many patients had the heterozygous/homozygous genotype, encoded as 0, 1, 2.) Thanks!
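If all you have per SNP is a p-value, one option that needs no genotype data is a Cauchy combination of the p-values within each gene. To be clear, this is not SKAT: as far as I understand, SKAT works on a genotype matrix plus a phenotype, not on per-SNP p-values. The sketch below is only an illustration in base R using the example data from the question. The helper name cauchy_combine and the equal weighting of SNPs are my assumptions, and although this kind of test is often described as fairly tolerant of correlation between the combined p-values, I can't say whether it is appropriate for your study.

```r
# Example data copied from the question
snps <- data.frame(
  snp_id = c("rs1", "rs2", "rs3", "rs4", "rs5", "rs6", "rs7", "rs8"),
  Gene.refGene_ANNOVAR = c("gene1", "gene1", "gene1", "gene1",
                           "gene2", "gene2", "gene2", "gene2"),
  p.value = c(0.7703884, 0.9648540, 0.9648540, 0.9648540,
              0.54, 0.03, 0.03, 0.8)
)

# Hypothetical helper (not from any package): equal-weight Cauchy
# combination of a vector of p-values.
cauchy_combine <- function(p) {
  # p-values of exactly 0 or 1 would give infinite tangents
  stopifnot(all(p > 0 & p < 1))
  # Map each p-value to a standard Cauchy variate and average them
  t_stat <- mean(tan((0.5 - p) * pi))
  # Upper tail of the standard Cauchy distribution gives the combined p-value
  pcauchy(t_stat, lower.tail = FALSE)
}

# Note data = snps (the data frame defined above), not data = df
gene_pvals <- aggregate(p.value ~ Gene.refGene_ANNOVAR,
                        data = snps,
                        FUN = cauchy_combine)
names(gene_pvals)[2] <- "combined.p"
print(gene_pvals)
```

Since you also have the 0/1/2 genotypes per patient, a genotype-based test (SKAT, or SAIGE-GENE+ as suggested above) may be the better route if you also have the phenotype for each patient.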