Question

Statistical tests to show the specificity of a phenomenon (e.g. an increase in the H3K27me3 mark)

8.4 years ago

Bogdan ★ 1.4k

Dear all,

I have a fairly general question (anchored in genomics and related to ChIP-seq) about the statistical tests that can show the specificity of a phenomenon:

Let's consider an example: someone did ChIP-seq for H3K27me3 and wants to show that, after cell treatment, the H3K27me3 mark increases only on genes involved in autophagy.

What type of analysis would you recommend to show that the phenomenon (i.e. the increase in H3K27me3) is specific to a set of genes (i.e. autophagy genes)?

A: take random sets of non-autophagy genes (in practice, the rest of the genes in the genome) and use parametric and non-parametric tests to compare SET 1 (autophagy genes) with SET 2 (non-autophagy genes)

or

B: use hypergeometric / Fisher's exact tests on a 2×2 contingency table (autophagy / non-autophagy genes vs. increase / no increase in H3K27me3)?

Thanks a lot, and happy weekend ;)

Bogdan

8.4 years ago

Bogdan ★ 1.4k

Dear Jean, thank you for your comments and suggestions. Could you explain in a bit more detail how one would apply bagging/bootstrapping to this type of problem? (The question comes from a non-expert in machine learning ;) Thanks!

Please use the 'add comment' button to reply to an answer. This keeps things organized.

I don't see why you would need any bagging or bootstrapping method here. Bagging is typically used in supervised learning to reduce the variance of predictions: it generates multiple training sets by sampling with replacement from the training data, trains a model on each training set, and then combines the models. Bootstrapping consists of sampling the data with replacement to estimate some statistic (e.g. its standard error or a confidence interval).

Reply by Jean-Karim Heriche, 8.4 years ago
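
To make the definition of bootstrapping above concrete, here is a minimal Python sketch (standard library only) that bootstraps a percentile confidence interval for the fraction of autophagy genes showing an H3K27me3 increase. The gene flags are hypothetical and the function name `bootstrap_fraction_ci` is illustrative; it only shows what "sampling with replacement to estimate a statistic" means, not a recommended analysis for this question.

```python
import random


def bootstrap_fraction_ci(flags, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the fraction of True values in `flags`.

    flags: one boolean per gene (True = H3K27me3 increased); hypothetical input.
    Returns (point estimate, lower bound, upper bound).
    """
    rng = random.Random(seed)
    n = len(flags)
    estimates = []
    for _ in range(n_boot):
        # Resample genes with replacement, keeping the original set size
        resample = [flags[rng.randrange(n)] for _ in range(n)]
        estimates.append(sum(resample) / n)
    estimates.sort()
    k = int(round(n_boot * alpha / 2))  # number of estimates cut from each tail
    return sum(flags) / n, estimates[k], estimates[n_boot - 1 - k]


# Hypothetical data: 30 of 120 autophagy genes gain H3K27me3
flags = [True] * 30 + [False] * 90
est, lo, hi = bootstrap_fraction_ci(flags)
print(f"fraction = {est:.3f}, 95% bootstrap CI ~ [{lo:.3f}, {hi:.3f}]")
```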

8.4 years ago

Bogdan ★ 1.4k

Yes, thank you. I have been using Fisher's exact tests and chi-squared tests.

In addition, to get a quantitative estimate, we usually make boxplots of the histone mark levels in each of these categories.

Sometimes, although there is enrichment (based on the chi-squared test), the boxplots do not show very strong differences, so I was wondering:

Should I add more to the analysis, e.g. randomization tests, permutation tests, random sets of genes, or anything else? Thanks!
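
One way to attach a p-value to the boxplot comparison is a permutation test on the per-gene H3K27me3 levels. The minimal Python sketch below (standard library only) shuffles the autophagy / non-autophagy labels while keeping the group sizes fixed and counts how often a random labelling gives a mean difference at least as large as the observed one. The levels are made-up numbers and the function name `permutation_test` is hypothetical.

```python
import random


def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in means.

    group_a: levels for the gene set of interest (e.g. autophagy genes)
    group_b: levels for the other genes
    Returns (observed mean difference, permutation p-value).
    """
    rng = random.Random(seed)
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling, group sizes unchanged
        a, b = pooled[:n_a], pooled[n_a:]
        diff = sum(a) / len(a) - sum(b) / len(b)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical H3K27me3 levels (e.g. normalized ChIP signal per gene)
autophagy = [2.1, 2.5, 2.8, 3.0, 2.7, 2.9]
others = [1.9, 2.0, 2.2, 1.8, 2.1, 2.0, 2.3, 1.7]
diff, p = permutation_test(autophagy, others)
print(f"mean difference = {diff:.3f}, permutation p ~ {p:.4f}")
```

This compares mean levels, which is a different question from the enrichment tested with Fisher's or the chi-squared test, so a clear enrichment can coexist with modest differences in the boxplots.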

Maybe you can do some randomization tests: keep the total numbers of autophagy genes and non-autophagy genes fixed, let n, m, q and k change randomly (for example 1,000 times), and compare the initial P-value with the randomized results.

Reply by zjhzwang, 8.4 years ago

Thanks. Could you let me know whether there is an R package that implements this randomization procedure?

Reply by Bogdan, 8.4 years ago

If the total number of autophagy genes is A and the total number of non-autophagy genes is B, then you can set n <- sample(seq(1, A), 1) and m <- A - n (and draw q and k from B in the same way).

Reply by zjhzwang, 8.4 years ago
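
The randomization idea above can be sketched in Python as follows (standard library only). Rather than drawing n uniformly from 1..A, this hypothetical `permutation_enrichment` function randomly chooses which genes form the autophagy set, so the number of autophagy genes (n + m) and the number of genes with an increase (n + q) both stay fixed, and it counts how often a random set contains at least n genes with an increase. The example counts are invented.

```python
import random


def permutation_enrichment(n, m, q, k, n_perm=10000, seed=0):
    """One-sided randomization test for enrichment in a 2x2 table.

    n: autophagy genes with an H3K27me3 increase
    m: autophagy genes without an increase
    q: non-autophagy genes with an increase
    k: non-autophagy genes without an increase
    """
    rng = random.Random(seed)
    total = n + m + q + k
    n_autophagy = n + m  # size of the gene set, kept fixed
    n_increased = n + q  # genes with an increase, kept fixed
    # Gene IDs 0 .. n_increased - 1 are the genes with an increase
    hits = 0
    for _ in range(n_perm):
        # Draw a random "autophagy" set of the same size
        chosen = rng.sample(range(total), n_autophagy)
        if sum(1 for g in chosen if g < n_increased) >= n:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical counts: 12 of 40 autophagy genes vs 40 of 460 other genes
print(permutation_enrichment(12, 28, 40, 420))
```

Because both margins are held fixed, this is a Monte Carlo approximation of the one-sided Fisher's exact test, so its p-value should be close to the exact one.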

But I think the P-value returned by the chi-squared test is enough. If you can post your boxplot, maybe someone can explain why the difference is not stronger.

Reply by zjhzwang, 8.4 years ago

score 4 · Accepted Answer · 2017-02-26

8.4 years ago

Jean-Karim Heriche 27k

This looks like a standard enrichment analysis: are autophagy genes enriched in the set of genes showing an increase in H3K27me3? So go for option B. I am not sure what option A would do; it looks like a bagging approach to achieve the same thing.
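
Option B can be sketched with the Python standard library alone: the hypergeometric distribution gives the probability of seeing a given number of autophagy genes among the genes with an H3K27me3 increase, and summing its tail gives Fisher's exact test. The function names and counts below are hypothetical; in R, the built-in `fisher.test` does the same job.

```python
from math import comb


def hypergeom_pmf(x, N, K, n):
    """P(X = x): x of the n autophagy genes are among the K genes with an
    increase, out of N genes in total (sampling without replacement)."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def fisher_2x2(a, b, c, d):
    """Fisher's exact test for the table

                 increase  no increase
    autophagy        a          b
    other genes      c          d

    Returns (one-sided p for enrichment, two-sided p).
    """
    N = a + b + c + d
    K = a + c  # genes with an increase
    n = a + b  # autophagy genes
    support = range(max(0, n - (N - K)), min(n, K) + 1)
    probs = {x: hypergeom_pmf(x, N, K, n) for x in support}
    p_greater = sum(p for x, p in probs.items() if x >= a)
    # Two-sided: add up all tables no more likely than the observed one
    # (small relative tolerance to absorb floating-point error)
    p_two = sum(p for p in probs.values() if p <= probs[a] * (1 + 1e-7))
    return min(1.0, p_greater), min(1.0, p_two)


# Hypothetical counts: 12 of 40 autophagy genes vs 40 of 460 other genes
print(fisher_2x2(12, 28, 40, 420))
```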

Thank you, Jean, for your comments. On a side note, as an alternative to enrichment tests, could someone simply use the following procedure:

A: take the set of genes with a specific effect (in this case, the H3K27me3 increase on autophagy genes);

B: take a few random sets of genes;

C: make boxplots of A vs. B, and if a t-test (or Wilcoxon test) shows that the difference is statistically significant,

would this support the hypothesis that "the H3K27me3 increase on autophagy genes" is not random?

Reply by Bogdan, 8.4 years ago

You're dealing with counts here, and the question is about enrichment, so the standard way to answer it is with Fisher's exact test (or the chi-squared test). If you're looking for a parametric alternative, you could formulate the question as a difference between proportions to make it more obvious: is the fraction of methylated genes in the autophagy set different from the fraction of methylated genes among the other genes? This can be tested parametrically with a two-proportion z-test. However, this is equivalent to the chi-squared or Fisher's test, with the added assumption that the binomial distribution can be approximated by a normal distribution. Note that these are actually tests for equality of proportions (i.e. equality is the null hypothesis). What I don't get is why you would take random samples of the genes.

Reply by Jean-Karim Heriche, 8.4 years ago
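
The equivalence with the chi-squared test can be checked numerically. The minimal Python sketch below (standard library only; function names and counts are hypothetical) runs a pooled two-proportion z-test and a Pearson chi-squared test without continuity correction on the same 2×2 table. For a 2×2 table the squared z statistic equals the chi-squared statistic, so both give the same two-sided p-value.

```python
from math import erfc, sqrt


def two_proportion_z(n, m, q, k):
    """Pooled two-proportion z-test.

    Compares n / (n + m) (autophagy genes with an increase) with
    q / (q + k) (other genes with an increase).
    """
    n1, n2 = n + m, q + k
    p1, p2 = n / n1, q / n2
    pooled = (n + q) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, erfc(abs(z) / sqrt(2))  # two-sided normal p-value


def pearson_chi2(n, m, q, k):
    """Pearson chi-squared statistic (no continuity correction), 1 df."""
    observed = [[n, m], [q, k]]
    total = n + m + q + k
    rows, cols = [n + m, q + k], [n + q, m + k]
    chi2 = sum(
        (observed[i][j] - rows[i] * cols[j] / total) ** 2 / (rows[i] * cols[j] / total)
        for i in range(2)
        for j in range(2)
    )
    return chi2, erfc(sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df


z, p_z = two_proportion_z(12, 28, 40, 420)
chi2, p_chi2 = pearson_chi2(12, 28, 40, 420)
print(f"z^2 = {z * z:.4f}, chi2 = {chi2:.4f}, p = {p_z:.2e} vs {p_chi2:.2e}")
```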

score 3 · Accepted Answer · 2017-02-27

Maybe you can use a chi-squared test, if you can get a table like this:

	genes with an H3K27me3 increase	genes without an increase
autophagy genes	n	m
non-autophagy genes	q	k

Then the chi-squared test returns a P-value, and you can check whether your hypothesis is supported.
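
For a table like the one above, a chi-squared test can be sketched in Python as below (standard library only; the function name and counts are hypothetical). It applies Yates' continuity correction by default, which is also what R's `chisq.test` does for 2×2 tables, and it reports the smallest expected count: a common rule of thumb is that when any expected count falls below 5, the chi-squared approximation is unreliable and Fisher's exact test is preferable.

```python
from math import erfc, sqrt


def chi2_2x2(n, m, q, k, yates=True):
    """Chi-squared test of independence for the 2x2 table

                          increase   no increase
    autophagy genes          n            m
    non-autophagy genes      q            k

    Returns (statistic, p-value, smallest expected count).
    """
    observed = [[n, m], [q, k]]
    total = n + m + q + k
    rows, cols = [n + m, q + k], [n + q, m + k]
    chi2 = 0.0
    min_expected = float("inf")
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            min_expected = min(min_expected, expected)
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(0.0, diff - 0.5)  # continuity correction
            chi2 += diff ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-squared, 1 df
    return chi2, p_value, min_expected


stat, p, min_exp = chi2_2x2(12, 28, 40, 420)
print(f"chi2 = {stat:.3f}, p = {p:.2e}, smallest expected count = {min_exp:.2f}")
if min_exp < 5:
    print("Some expected counts are below 5: consider Fisher's exact test.")
```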