Question: Permutation test with gene size and GC contents
gravatar for omicj
3.7 years ago by
San Diego
omicj60 wrote:


I am trying to do permutation test for comparing number of LoF mutations in cancer patients. 

Haven't done this before based on gene size and GC content. Do you have any statistical tutorial for this? or code?



ADD COMMENTlink modified 3.7 years ago by bernatgel1.9k • written 3.7 years ago by omicj60

Omicj, I can help you, however there some unclear things about what you want to do. What is your input? Mutations - are they SNPs (bed file with positions)? 

Also, see this answer on "how to do permutations" by Aaronquinlan - A: Picking Random Genomic Positions


ADD REPLYlink written 3.7 years ago by PoGibas4.8k
gravatar for bernatgel
3.7 years ago by
Barcelona, Spain
bernatgel1.9k wrote:

In R you can use the package regioneR available in Bioconductor. It takes care of the randomization process and the statistics and you can customize it to evaluate different features.

If I understand correctly your question, you want to test if the proportion of LoF variants you have in you sample is higher that what you would expect by chance, is'nt it? If so, you could try with something like this: create a number of random variant sets based on yours, evaluate how many LoF variant you detect, see if the original set has more LoF variants than expected.

I would do something like this:

1 - Implement a function counting the number of LoF variants given a GRanges with the reference and alternative alleles. You can use the predictCoding function in VariantAnnotation, for example, with this signature:

numberOfLoF <- function(A, ...) {
  varAllele <- DNAStringSet(as.character(A$alt))
  mcols(A) <- NULL 
  cod <- predictCoding(query=A, subject=txdb, seqSource=Hsapiens, varAllele=varAllele)  
  #Note1: the same variant can be counted more than once if it affects more than one transcript
  #Note2: What you consider a LoF can be different than that
  numLoF <- length(which(cod$CONSEQUENCE %in% c("nonsynonymous", "frameshift", "nonsense")))

2 - Implement a randomization that places the variants randomly over the genome and assigns the same nucleotide changes you see in your variants:

randomizeVariants <- function(A, ...) {
  randA <- randomizeRegions(A, ...)
  mcols(randA) <- mcols(A)

3 - Create a mask to tell the randomization function where the random variants can be placed. If this is from an exome sequencing experiment, I would mask out any part of the genome not targeted by the exome.

exome.targets <- toGRanges("path/to/exome.bed")
whole.genome <- filterChromosomes(getGenome(genome="hg19"), chr.type="canonical")
mask <- subtractRegions(whole.genome, exome.targets)

4 - And finally, perform a permutation test with all of this

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

#Define here the functions and mask above

#Permutation test

pt <- permTest(A=vars, randomize.function=randomizeVariants, evaluate.function=numberOfLoF,
         mask=mask, genome="hg19", ntimes=50)


This may take some time to run, depending on the number of variants you have and the ntimes, but it will perform the whole permutation test and give you a p-value, a z-score and a plot of the results of the permutation tests.

You can find more information about regioneR in the package vignette.


ADD COMMENTlink written 3.7 years ago by bernatgel1.9k
gravatar for F
3.7 years ago by
F3.4k wrote:



ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by F3.4k

Sarah, what does factorbook/jaspar have to do with permutations for mutation enrichement? 

ADD REPLYlink written 3.7 years ago by PoGibas4.8k

factorbook does with transcription factor dynamic

ADD REPLYlink written 3.7 years ago by F3.4k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1139 users visited in the last hour