Question: statistical test to see if DE genes cluster around a chromosomal location?
4.1 years ago, FLDal (10) wrote:


I have a gene expression data set that can be broken down into genes that are significantly differentially expressed (DE) after an experimental condition and genes that are not. I'm interested in knowing whether the significantly DE genes occur around a subset of chromosomal locations of interest more often than the non-significantly DE genes. I know I have to consider issues of gene length and inter-gene gap length. However, at the moment I want a first-pass test just to see whether significantly DE genes occur near these locations more frequently than the non-significantly DE genes.

I have spent some time trying to figure out how to do this via a chi-square permutation test. I've written R code that creates bins of, say, 100 kb upstream and downstream from the locations, and I have created a frequency table of the significantly DE and non-significantly DE genes sorted into these bins. Because there are many more non-significantly DE genes than significantly DE genes (about 1000 versus about 100), it was suggested that I randomly sample 100 non-significantly DE genes and run the chi-square test 1000 times, drawing a new random sample of non-significantly DE genes on each iteration. This deviates a bit from the chi-square permutation test I am used to, which randomly shuffles one variable of the contingency table to create a null distribution. Basically, I was told I should test only a subset of my data and build the null from random sampling of the complete set.
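For reference, the label-shuffling version of the test described above can be sketched as follows. This is a minimal illustration in Python rather than R, using only the standard library; it assumes each gene has already been classified as DE or not and as near a location of interest or not. The function names and the boolean-list inputs are illustrative assumptions, not taken from any existing code.

```python
import random

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table: no association can be measured
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

def permutation_chi_square(is_de, is_near, n_perm=1000, seed=0):
    """Shuffle the DE labels across all genes to build the null distribution.

    is_de, is_near: equal-length lists of booleans, one entry per gene
    (hypothetical inputs). Returns (observed statistic, empirical p-value).
    """
    rng = random.Random(seed)

    def stat(labels):
        a = sum(1 for de, near in zip(labels, is_near) if de and near)
        b = sum(1 for de, near in zip(labels, is_near) if de and not near)
        c = sum(1 for de, near in zip(labels, is_near) if not de and near)
        d = len(labels) - a - b - c
        return chi_square_2x2(a, b, c, d)

    observed = stat(is_de)
    shuffled = list(is_de)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return observed, (hits + 1) / (n_perm + 1)
```

Because the shuffle keeps all genes in every iteration, the unequal group sizes (about 100 versus about 1000) are carried into the null distribution rather than handled by subsampling.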

I have many questions (based on lots of failed R code), but they all stem from this one main question - is this approach an appropriate test for this aim? 

Forgive me if this is very basic. I am very, very new to genomics and have little background in the area.


modified 4.1 years ago by mikhail.shugay (3.3k) • written 4.1 years ago by FLDal (10)

How about this:

1. Generate a gene set from the target location that you are interested in.

2. Perform a hypergeometric test on that region, looking for enrichment of DE genes.

3. You have two choices here:

a) Permute the genes within this set (e.g. given N genes in the set, randomly sample N genes from your data and perform the hypergeometric test on each sample).

b) Permute the location (I am not sure whether this is the right way to do it, though).

4. You will then have a number of p-values, which you can compare with your original p-value to get an empirical p-value.
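Steps 1, 2, 3a and 4 can be sketched roughly like this. It is only an illustration in plain Python (standard library only), assuming you already have a list of all tested gene IDs, the DE subset, and the genes falling in the region of interest. The names `hypergeom_sf` and `empirical_enrichment` are made up for this sketch.

```python
import random
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes total, K of them DE, n drawn)."""
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / total

def empirical_enrichment(genes, de_genes, region_genes, n_perm=1000, seed=0):
    """Hypergeometric enrichment of DE genes in a region, plus an empirical
    p-value from random gene sets of the same size (option 3a).

    All three inputs are hypothetical collections of gene IDs.
    Returns (hypergeometric p-value, empirical p-value).
    """
    rng = random.Random(seed)
    genes = list(genes)
    de = set(de_genes)
    N, K, n = len(genes), len(de), len(region_genes)

    k_obs = sum(1 for g in region_genes if g in de)
    p_obs = hypergeom_sf(k_obs, N, K, n)

    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, n)  # random gene set of the same size
        k = sum(1 for g in sample if g in de)
        if hypergeom_sf(k, N, K, n) <= p_obs:
            hits += 1
    # Fraction of random sets at least as extreme as the observed one.
    return p_obs, (hits + 1) / (n_perm + 1)
```

Note that sampling genes uniformly at random ignores gene density and gene length, so this only addresses the first-pass question.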

written 4.1 years ago by Sam (2.2k)

I'd just put together a gene neighbour network in Cytoscape and use jActiveModules if I wanted a rough-and-ready answer.

Are you sure your candidate gene sets aren't correlated through some other technical artifact (shared probes / sequence identity)?

written 4.1 years ago by russhh (4.2k)

4.1 years ago, mark.ziemann (1.1k) wrote:

GSEA lets you use custom pathways or gene sets to test for trends in gene expression. You can make the custom gene sets with a combination of bedtools makewindows and intersectBed with the GTF. That way you can see the statistical significance of your region of interest relative to all other regions in the genome.
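To show what the window-based gene sets look like, here is a small Python sketch that stands in for the bedtools step. It is a simplification, not a replacement: it assumes 0-based coordinates and assigns each gene to a single window by its start coordinate, whereas an interval intersection would also catch genes spanning a window boundary. The function names and the "na" description field are assumptions made for the example. The output lines follow the tab-delimited GMT layout (set name, description, then gene IDs).

```python
from collections import defaultdict

def window_gene_sets(genes, window_size=100_000):
    """Group genes into fixed-size genomic windows.

    genes: iterable of (gene_id, chrom, start) tuples (hypothetical input).
    Returns {(chrom, window_start): [gene_id, ...]}.
    """
    sets = defaultdict(list)
    for gene_id, chrom, start in genes:
        window_start = (start // window_size) * window_size
        sets[(chrom, window_start)].append(gene_id)
    return dict(sets)

def to_gmt_lines(gene_sets):
    """Format each window as one GMT line: name, description, gene IDs."""
    lines = []
    for (chrom, window_start), ids in sorted(gene_sets.items()):
        name = f"{chrom}_{window_start}"
        lines.append("\t".join([name, "na", *ids]))
    return lines
```

Each window then acts as one custom gene set, so the window containing your region of interest is scored against every other window in the genome.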

Cheers, Mark from GenomeSpot


4.1 years ago, mikhail.shugay (3.3k), Czech Republic, Brno, CEITEC, wrote:

If you have a set of chromosomal positions of interest, you can compute, for the DE gene set and for the background gene set, the distribution of distances from each gene to the nearest position of interest, and then use something like a Kolmogorov-Smirnov test to see whether the two distributions differ significantly.
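A rough sketch of that idea, in plain Python with no external libraries. It assumes a single chromosome, one representative coordinate per gene (e.g. the TSS), and a non-empty sorted list of positions of interest. Instead of the analytic Kolmogorov-Smirnov p-value, it estimates the p-value by permuting group labels, which is a swapped-in choice for this sketch. All function names are illustrative.

```python
import bisect
import random

def nearest_distance(pos, sorted_sites):
    """Distance from pos to the closest entry of a non-empty sorted list."""
    i = bisect.bisect_left(sorted_sites, pos)
    candidates = sorted_sites[max(i - 1, 0):i + 1]
    return min(abs(pos - s) for s in candidates)

def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d

def ks_permutation_test(x, y, n_perm=1000, seed=0):
    """Empirical p-value for the KS statistic via label permutation."""
    rng = random.Random(seed)
    observed = ks_statistic(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        d = ks_statistic(pooled[:len(x)], pooled[len(x):])
        if d >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

In use, you would build `x` from `nearest_distance` applied to the DE genes and `y` from the same function applied to the background genes, then pass both to `ks_permutation_test`.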
