Question: statistical tests to show the specificity of a phenomenon (e.g. an increase in the H3K27me3 mark)
Bogdan (Palo Alto, CA, USA) wrote, 2.7 years ago:

Dear all,

I have a general question, anchored in genomics and related to ChIP-seq, about which statistical tests can show that a phenomenon is specific:

As an example: someone ran a ChIP-seq experiment for H3K27me3 and wants to show that, after cell treatment, the H3K27me3 mark increases only on the genes involved in autophagy.

What type of analysis would you recommend to show that the phenomenon (i.e. the increase in H3K27me3) is specific to a set of genes (i.e. the autophagy genes):

A -- taking random sets of non-autophagy genes (in practice, drawn from the rest of the genes in the genome) and using parametric and non-parametric tests to compare SET 1 (autophagy genes) with SET 2 (non-autophagy genes)

or

B -- using hypergeometric / Fisher tests on a 2x2 contingency table (autophagy / non-autophagy genes vs. increase / no increase in H3K27me3)?

Thanks a lot, and happy weekend ;)

bogdan

chip-seq • 1.0k views
Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote, 2.7 years ago:

This looks like a standard enrichment analysis: are autophagy genes enriched in the set of genes showing an increase in H3K27me3? So go for option B. I am not sure what option A would do; it looks like a bagging approach to achieve the same thing.
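As an illustration of option B, here is a minimal sketch of a one-sided Fisher's exact test (equivalently, a hypergeometric tail probability) written with the Python standard library only. The function name `fisher_enrichment_p` and the counts in the usage lines are made up for illustration; they are not data from this thread.

```python
from math import comb  # Python 3.8+


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

    a: autophagy genes with an H3K27me3 increase
    b: autophagy genes without an increase
    c: other genes with an increase
    d: other genes without an increase
    Returns P(X >= a) under the hypergeometric null (no association).
    """
    set_size = a + b            # size of the autophagy gene set
    increased = a + c           # all genes with an increase
    total = a + b + c + d       # all genes considered
    upper = min(set_size, increased)
    tail = sum(
        comb(increased, x) * comb(total - increased, set_size - x)
        for x in range(a, upper + 1)
    )
    return tail / comb(total, set_size)


# Hypothetical counts, for illustration only:
# 40 of 200 autophagy genes and 1000 of 19800 other genes show an increase.
p_value = fisher_enrichment_p(40, 160, 1000, 18800)
print(f"one-sided p-value: {p_value:.3g}")
```

A small p-value here says that the autophagy set contains more genes with an increase than expected by chance, given how many genes show an increase genome-wide.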


Thank you, Jean, for your comments. On a side note, I was wondering whether, as an alternative to enrichment tests, one could use the following procedure:

A -- take the SET of genes with the specific effect (in this case, the autophagy genes with an H3K27me3 increase)

B -- take a few random SETS of genes

C -- make boxplots of A vs. B and test the difference with a t-test (or a Wilcoxon test, wilcox.test in R)

If the difference is statistically significant, would this support the hypothesis that "the H3K27me3 increase on autophagy genes" is not random?
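One way to make the "random sets of genes" idea concrete is a resampling test on the quantitative signal: compare the mean change in the gene set of interest with the means of many random gene sets of the same size. The sketch below is one possible reading of the procedure, not the poster's actual method; the function name `random_set_p_value`, the `changes` dictionary (gene name to change in H3K27me3 signal) and the toy data are assumptions for illustration.

```python
import random


def random_set_p_value(changes, gene_set, n_sets=10000, seed=0):
    """Empirical one-sided p-value for a gene set against random gene sets.

    changes:  dict mapping gene name -> change in H3K27me3 signal
              (for example a log2 fold change; hypothetical input format)
    gene_set: gene names of interest (e.g. autophagy genes)
    Returns the fraction of random sets of the same size whose mean change
    is at least as large as the observed mean (with an add-one correction).
    """
    rng = random.Random(seed)
    genes = list(changes)
    members = [g for g in gene_set if g in changes]
    if not members:
        raise ValueError("gene_set shares no genes with changes")
    k = len(members)
    observed = sum(changes[g] for g in members) / k
    hits = 0
    for _ in range(n_sets):
        sampled = rng.sample(genes, k)  # random set, drawn without replacement
        if sum(changes[g] for g in sampled) / k >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sets + 1)


# Toy data, for illustration only: 20 "autophagy" genes shifted upward.
toy_rng = random.Random(1)
toy_changes = {f"gene{i}": toy_rng.gauss(0.0, 1.0) for i in range(2000)}
toy_set = [f"gene{i}" for i in range(20)]
for g in toy_set:
    toy_changes[g] += 1.0
print(random_set_p_value(toy_changes, toy_set, n_sets=2000))
```

Using many random sets rather than "a few" matters: the p-value cannot be smaller than 1 / (n_sets + 1), so a handful of sets cannot support a strong claim.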

(reply written 2.7 years ago by Bogdan)

You're dealing with counts here and the question is about enrichment, so the standard way to answer it is with Fisher's exact test (or the Chi-squared test). If you're looking for a parametric alternative, you could formulate the question in terms of a difference between proportions to make it more obvious: is the fraction of methylated genes in the autophagy set different from the fraction of methylated genes among the other genes? This can be tested parametrically using a two-proportion Z-test. However, this is equivalent to the Chi-squared test (the squared Z statistic equals the Chi-squared statistic without continuity correction) and, for large counts, gives results close to Fisher's test, with the added assumption that the binomial distribution can be approximated by a normal distribution. Note that the null hypothesis of all these tests is equality of proportions. What I don't get is why you would take random samples of the genes.
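The two-proportion Z-test described here can be sketched in a few lines of standard-library Python. The function name `two_proportion_z_test` and the counts in the usage lines are hypothetical; the pooled standard error is used because the null hypothesis is equality of the two proportions.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion Z-test with a pooled standard error.

    x1 of n1 genes in the autophagy set show an increase,
    x2 of n2 genes among the remaining genes show an increase.
    Returns (z, two-sided p-value) under the normal approximation.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under the null
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only.
z, p = two_proportion_z_test(40, 200, 1000, 19800)
print(f"z = {z:.2f}, two-sided p = {p:.3g}")
```

Squaring `z` gives the Pearson Chi-squared statistic of the corresponding 2x2 table without continuity correction, which is why the two tests lead to the same two-sided p-value.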

(reply written 2.7 years ago by Jean-Karim Heriche)
zjhzwang wrote, 2.7 years ago:

Maybe you can use a Chi-square test, if you can build a table like this:

                       number of matching genes   number of non-matching genes
autophagy cells                   n                            m
non-autophagy cells               q                            k

The Chi-square test then returns a P-value, which you can use to check whether your hypothesis holds.
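For a 2x2 table laid out as n, m, q, k, the Pearson Chi-square statistic and its p-value can be computed by hand. This is a minimal sketch assuming all row and column totals are non-zero and no continuity correction is applied (R's `chisq.test` applies Yates' correction to 2x2 tables by default, so its numbers will differ slightly). The function name `chi_square_2x2` and the example counts are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(n, m, q, k):
    """Pearson Chi-square test on the table [[n, m], [q, k]].

    Returns (statistic, p-value) with 1 degree of freedom and
    no continuity correction. Assumes no row or column total is zero.
    """
    observed = ((n, m), (q, k))
    row_totals = (n + m, q + k)
    col_totals = (n + q, m + k)
    total = n + m + q + k
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected
    # Survival function of a Chi-square variable with 1 df.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts, for illustration only.
stat, p = chi_square_2x2(40, 160, 1000, 18800)
print(f"chi-square = {stat:.2f}, p = {p:.3g}")
```

When some expected counts are small (a common rule of thumb is below 5), Fisher's exact test is usually the safer choice.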

Bogdan (Palo Alto, CA, USA) wrote, 2.7 years ago:

Dear Jean, thank you for your comments and suggestions. Could you please explain in a bit more detail how one would apply bagging/bootstrapping to this type of problem? (The question comes from a non-expert in machine learning ;) Thanks!


Please use the 'add comment' button to reply to an answer. This keeps things organized.

I don't see why you would need any bagging or bootstrapping method here. Bagging is typically used in supervised learning to reduce the variance of predictions: it consists of generating multiple training sets by sampling with replacement from the training data, training a model on each training set, then combining the models. Bootstrapping consists of sampling the data with replacement to estimate some statistic.
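To make the definition of bootstrapping concrete, here is a small sketch that resamples a list of values with replacement to get a percentile confidence interval for the mean. It only illustrates the general technique and is not a recommendation for the enrichment question; the function name `bootstrap_mean_ci` and the toy values are assumptions.

```python
import random


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean.

    Draws n_boot resamples of len(values) items with replacement,
    computes the mean of each, and returns the empirical
    (alpha/2, 1 - alpha/2) quantiles of those means.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n  # one resample, with replacement
        for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy values, for illustration only.
toy_rng = random.Random(42)
toy_values = [toy_rng.gauss(0.5, 1.0) for _ in range(100)]
print(bootstrap_mean_ci(toy_values))
```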

(reply written 2.7 years ago by Jean-Karim Heriche)

Bogdan (Palo Alto, CA, USA) wrote, 2.7 years ago:

Yes, thank you, I have been using Fisher's exact tests and Chi-square tests.

In addition, to get a quantitative estimate, we have been making boxplots of the histone mark levels in each of these categories.

Sometimes, although there is enrichment (based on the Chi-square test), the boxplots do not show very strong differences, so I was wondering:

Should I add more to the analysis, e.g. randomization tests, permutation tests, taking random sets of genes, or anything else? Thanks!


Maybe you can run a randomization test: keep the total numbers of autophagy cells' genes and non-autophagy cells' genes fixed, let n, m, q and k vary at random (for example 1000 times), and compare the P-value of the original table with the results from the randomized tables.
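One way to read this suggestion is as a label permutation test: shuffle the "increase / no increase" labels across all genes, which keeps both the gene-set sizes and the number of genes with an increase fixed, and count how often the shuffled table is at least as extreme as the observed one. The sketch below is an interpretation rather than the poster's exact procedure, and the function name `permutation_enrichment_p` is hypothetical.

```python
import random


def permutation_enrichment_p(n, m, q, k, n_perm=1000, seed=0):
    """Permutation p-value for enrichment in the table [[n, m], [q, k]].

    n, m: genes in the set of interest with / without an increase
    q, k: remaining genes with / without an increase
    Row and column totals are held fixed in every permutation.
    """
    rng = random.Random(seed)
    set_size = n + m
    # One label per gene: 1 = increase, 0 = no increase.
    labels = [1] * (n + q) + [0] * (m + k)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # The first set_size positions stand in for the gene set of interest.
        if sum(labels[:set_size]) >= n:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical counts, for illustration only.
print(permutation_enrichment_p(12, 38, 150, 1800, n_perm=1000))
```

Because both margins stay fixed, the shuffled counts follow a hypergeometric distribution, so with enough permutations this estimate approaches the one-sided Fisher's exact p-value. In other words, this particular randomization adds little beyond Fisher's test itself.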

(reply written 2.7 years ago by zjhzwang, modified 2.7 years ago)

Thanks. Could you please let me know whether there is an R package that implements this randomization procedure?

(reply written 2.7 years ago by Bogdan)

If the total number of autophagy cells' genes is A and the total number of non-autophagy cells' genes is B, then you can set n <- sample(seq_len(A), 1) and m <- A - n.

(reply written 2.7 years ago by zjhzwang, modified 2.7 years ago)

But I think the P-value returned by the Chi-square test is enough. If you can post your boxplot, maybe someone can explain why the difference does not look strong.

(reply written 2.7 years ago by zjhzwang)