Question

Single rare variant approaches

0

Entering edit mode

6.1 years ago

Tash • 0

Hi all,

I am new to this forum - forgive me if this is a very basic question.

I am currently analysing a WES dataset, and have identified 4 rare variants of interest in the same gene (a total of 8 cases share mutations in this gene in a sample of 50 cases). Two of the variants have a MAF < 1%, while the other two are around 1.2%.

I have read that it may be appropriate to perform a burden association test in PLINK /SEQ for the rarer variants in this gene to obtain a score of 'mutational load' (I guess to increase power), but I am hoping to get some general guidance in what would be the best approach to analyse single rare variants?

Thanks!

next-gen • 3.8k views

ADD COMMENT • link updated 6.1 years ago by NB ▴ 960 • written 6.1 years ago by Tash • 0

5

Entering edit mode

6.1 years ago

WouterDeCoster 47k

SKAT and SKAT-O (in R) are two commonly used tools for rare variant association testing, using also more sophisticated models than simple burden test.

ADD COMMENT • link 6.1 years ago by WouterDeCoster 47k

0

Entering edit mode

Appreciate your reply.

Do you know of any good example data I could download to get a feel for how to perform SKAT in R? To me it is not completely straightforward what the input file should entail (again... newbie), but it might help to see something in front of me.

ADD REPLY • link 6.1 years ago by Tash • 0

2

Entering edit mode

I usually use plink to generate a .raw file:

plink --vcf yourvariants.vcf --out projectname
plink --hwe 0.001 --bfile projectname --max-maf 0.01 --recode A --out projectname

Make sure to use the right .fam file, fill in conditions (phenotype)

And then use the following Rscript, saved as e.g. SKAT.R:

Rscript SKAT.R projectname.raw

#!/usr/bin/Rscript

library('SKAT')

cat("WARNING: this script doesn't perform a sanity check on your data and therefore it's your own responsibility to provide correct input data.\n")
args = unlist(strsplit(commandArgs(trailingOnly = TRUE)," "))
basename = strsplit(basename(args), "\\.")[[1]][1]
dat <- read.table(args[1], header=TRUE, stringsAsFactors=FALSE) 
phen = ifelse(dat$PHENOTYPE == 2, 0, 1) #Convert affected = 2 to affected = 0
geno = as.matrix(dat[,-c(1:6)])
obj <- SKAT_Null_Model(phen~1, out_type="D")
skatT <- SKAT(geno, obj, method="optimal.adj")
cat(skatT$p.value)
cat("\n\n")

warnings = names(as.vector(warnings()))
wardf <- data.frame(V1=paste("warning", 1:length(warnings), ':', sep=""), V2=warnings)
output = rbind(
    data.frame(
        V1=c("individuals", "controls", "cases", paste("pval for rho=", skatT$param$rho, ":", sep=""), "test_p:"),
        V2=c(length(phen), table(phen)[[2]], table(phen)[[1]], skatT$param$p.val.each, skatT$p.value)),
    wardf)
write.table(output, paste(basename, "_SKAT_details.txt", sep=""), col.names=F, quote=F, row.names=F, sep="\t")

ADD REPLY • link 6.1 years ago by WouterDeCoster 47k

score 3 · Accepted Answer · 2018-02-28

3

Entering edit mode

6.1 years ago

NB ▴ 960

You can try Variant tools that includes SKAT as mentioned by @WouterDeCoster . These can take vcf files as input . There are tutorials on the website which can help find your way with these tests. otherwise if your comfortable to try R, have a look at SKAT and R

ADD COMMENT • link 6.1 years ago by NB ▴ 960

1

Entering edit mode

Oh yeah, that might be easier than going in R yourself and less painful than what I described to do manually.
But I like my analysis authentic, artisanal, GMO and gluten-free ;-)

ADD REPLY • link 6.1 years ago by WouterDeCoster 47k