
Hi everyone, there are many methods to test whether two sets of data are different. For example, if we have three replicates of one gene in sample A and sample B, we can run a t-test to get a p-value; if the p-value is below 0.01, we say the two groups are significantly different. However, if the p-value is above 0.05, to what extent can we say the two groups are similar?

For example (in R):

```
data<-data.frame(x=c(178,0,0),y=c(1680,500,1400))
t.test(data$x,data$y)
```

We get a p-value of 0.082. Can we say x and y are similar? Another example (in R):

```
data<-data.frame(x=c(1320,1200,1320),y=c(1280,1250,1300))
t.test(data$x,data$y)
```

This time, we get a p-value of 0.943. Can we say x and y are similar?

Specifically, when we evaluate the effect of a gene on rice yield, we have yield data from three over-expression lines and three wild-type plants (same genome). We run a t-test or some other test; when the p-value is above 0.05, can we say this gene does not affect grain yield? Or in Arabidopsis, if we get a large p-value for tiller number between mutant and wild type, can we say this gene has no effect on the tiller number phenotype? Taking the first example into consideration, we get a large p-value, but it seems hard to say that x and y are similar; we just need more replicates to decide. But in the second example, we can be confident that x and y are similar, right?

So can we set a threshold of 0.7, so that if the p-value is larger than 0.7, we say the two samples are similar? Or are there other methods to evaluate the similarity of two samples? How similar are two samples?

Thank you very much for your attention!

Aifu.

This sounds like p-hacking...

If your p-value is 0.7, you can't conclude anything other than that they don't meet your significance level for whatever test you're interested in.

OK, so a p-value cannot be used to tell whether two samples are similar?

Well yes, it can, but not in the way you're describing.

You're looking more for correlation, by the sounds of it. Statistical testing and fudging the p-value aren't really the same thing.

Nice to hear that. So for the two R examples I show in the question, how can I test whether they are similar or not? Could you please show how in detail? Thank you!

In the example you show, there are too few data points to statistically do anything. AFAIK, t-tests are statistically powerful with at least ~10 data points; 25-30 is much better.

Yes, the data points are too few for a t-test, but if we have 25-30 data points and still get a p-value as large as 0.94, what conclusion can we draw? It seems to me now that we could not draw any conclusion. Only if the p-value is smaller than 0.05 can we conclude that the gene is very likely to be different between the two samples.

You are correct. p-value ~ 0.1 is no different from p-value 0.9 in terms of how it helps your hypothesis.
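To put rough numbers on the small-sample point, here is a minimal sketch using `power.t.test` from base R's `stats` package. The effect size (a true difference of one standard deviation) and the group sizes are assumptions chosen for illustration, not values from this thread.

```r
# Power of a two-sample, two-sided t-test at alpha = 0.05.
# Assumption: the true difference between groups equals 1 standard deviation.
sizes <- c(3, 10, 30)  # replicates per group (hypothetical)
power <- sapply(sizes, function(n) {
  power.t.test(n = n, delta = 1, sd = 1, sig.level = 0.05)$power
})

# Probability of detecting the assumed difference at each sample size
data.frame(n_per_group = sizes, power = round(power, 2))
```

Under these assumptions the test usually misses a real difference with three replicates per group, so a non-significant result there says very little either way.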

A significance threshold of 0.05 simply means that, when the null hypothesis is actually true, there is a 1 in 20 chance of wrongly rejecting it.

There is nothing theoretically stopping you from saying that you want to use p < 0.1, but what that means is you are happy with there being a 10% chance of wrongly rejecting the null hypothesis.

p < 0.05 is arbitrary. Many think it should be at least p < 0.01 (a 1 in 100 chance of being wrong). But if you tried to publish p < 0.1, let alone p < 0.7, you would probably be in for a rough ride.

If you were a physicist, you'd be operating on 5-sigma, which is a 1 in 3.5 million chance of being wrong.
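As a rough check of the numbers in this comment, the sketch below simulates t-tests on two groups drawn from the same distribution and counts how often p < 0.05 occurs, then computes the one-sided normal tail beyond 5 sigma. The seed, the number of simulations, and the group size of 10 are arbitrary assumptions.

```r
set.seed(1)    # arbitrary seed, only for reproducibility
n_sim <- 2000  # number of simulated experiments (assumption)

# Both groups come from the same normal distribution, so the null is true
p_values <- replicate(n_sim, t.test(rnorm(10), rnorm(10))$p.value)

# Fraction of experiments that would wrongly be called significant
false_positive_rate <- mean(p_values < 0.05)
false_positive_rate

# One-sided tail probability beyond 5 standard deviations
five_sigma_tail <- pnorm(5, lower.tail = FALSE)
one_in <- 1 / five_sigma_tail
one_in
```

The simulated rate should land near 0.05, and `one_in` comes out at roughly 3.5 million, matching the figure quoted above.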

Sent me down a rabbit hole where I discovered a nice article: https://blogs.scientificamerican.com/observations/five-sigmawhats-that/

Think I read that one back when the Higgs was announced.

Well, thank you Ram and healey for your kind replies. I think we could set aside the discussion of p-values and come back to the main point of my question: how to evaluate the similarity of genes between two samples.

For example, suppose you sow many rice seeds of one variety in two fields. Since the seeds are from one variety, you know the two samples are the same. Now, if I don't know they are the same variety, and I measure the height of 100 mature rice plants per sample (you can set the number to be larger), how can I conclude from the data that the two samples come from the same variety?

Now the p-value is not suitable for this situation.

Unless you have prior sample data for the heights of various rice varieties in the two farm plots (assuming the growing conditions were roughly equal when all the data was collected), this would be an impossible task if the only measurement you have is plant height.

OK, let's look at it more deeply. Suppose you've already sown 20 varieties (A, B, C, D, ...) of rice in these two farm plots before (that is 190 times, which would take many, many years, but never mind, it's just an assumption). This year, you sow variety A in both plots. I measure height, heading date, tiller number, and so on. Would it then be possible to conclude that they are from the same variety?

Because the fact is that what you sowed in the two plots is the same, I think there should be some method to detect this, right?

If your *a priori* knowledge is that Rice A grows to a mean of 100 cm (let's say), and your test-case rice grows to a distribution of 100 ± 12, then that *is* a task for a p-value/t-test or whatever, because you would be trying to test the null hypothesis that the before and after populations originate from the same distribution.

If you don’t have any ‘control’ data to compare to, the best you can do with your 2 sample data is correlate them I think - but I’m no statistician. E.g. if observations A and observations B give a correlation coefficient of >0.7 (arbitrarily), you’d have some justification for saying they’re similar.
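The first case above, where a reference mean is known beforehand, can be sketched as a one-sample t-test. The heights here are simulated stand-ins (100 plants, mean 100 cm, standard deviation 12 cm), and the seed and variable names are hypothetical.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

reference_mean <- 100  # assumed known from earlier seasons (cm)

# Simulated stand-in for 100 measured plant heights; not real data
heights <- rnorm(100, mean = 100, sd = 12)

# Null hypothesis: the sampled plants come from a population with mean 100 cm
result <- t.test(heights, mu = reference_mean)

result$p.value   # large values mean no detectable departure from 100 cm
result$conf.int  # range of population means compatible with the data
```

A large p-value here still only says the data are compatible with the reference mean; the confidence interval shows how wide the range of compatible means is.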

I agree with the correlation coefficient method. But for the p-value/t-test, even if we accept the null hypothesis, we can only say that we could not detect a difference between these two rice samples; we cannot say they are from the same variety, right?

Yes. You would phrase it as there being no significant difference between the populations (accounting for variance etc). In practice most people would then accept that this is an *indication* that they may be of the same variety when you interpret the result.

What you would then do is go and test something else (like yield, or colouration or something), and repeat the process for a new observation. If, after many different observations, you consistently find no significant differences between your reference and the sample, you can become increasingly confident that they are the same variety. It is never 100% conclusive, only increasingly indicative.

OK, I see, thank you healey!

I get a feeling you're confusing p-values with similarity scores. Just so we're clear, the p-value for a single non-repeated test is the probability that you're seeing what you're seeing just by chance. As in, the data you're using to prove your hypothesis would show similar results even if your hypothesis weren't true.

Also, remember that the null hypothesis is not the opposite of the alternate hypothesis - the null hypothesis is the absence of an explanation, hence the name. So if your evidence for the alternate hypothesis is insufficient (AKA statistically insignificant p-value AKA p-value > 0.05), all you can conclude is that your hypothesis doesn't hold up.

Thank you Ram. But I still have some confusion, maybe beyond p-values. Take RNA-seq for example: we cannot say one gene is similar between two samples; we can only say a gene is differentially expressed between two samples with a p-value below 0.05, taking a risk of less than 5% of being wrong. For the genes that have a p-value larger than 0.05 (taking 0.05 as the threshold), we cannot say they are truly similarly expressed between the two samples, right?

It seems to be an easy question for you, but I still get a bit confused. I'm still waiting for an easy and detailed explanation.

You can't say their expression is similar.

All you can say is that it is *not statistically significantly different*. But those 2 things are not the same at all.

The logical/statistical "opposite" of different is not similar, it is "not different", because you're not looking at extremes, you're looking at set complements.
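One way to see the gap between "not significantly different" and "similar" in the two examples from the question is to look at the confidence interval that `t.test` already reports for the difference in means. This is only a sketch using the question's own numbers; the variable names are made up here.

```r
# The two examples from the question
x1 <- c(178, 0, 0);        y1 <- c(1680, 500, 1400)
x2 <- c(1320, 1200, 1320); y2 <- c(1280, 1250, 1300)

# 95% confidence intervals for the difference in means (Welch t-test)
ci1 <- as.numeric(t.test(x1, y1)$conf.int)
ci2 <- as.numeric(t.test(x2, y2)$conf.int)

# Both intervals include zero, so neither test is significant at 0.05
ci1
ci2

# The interval widths show how much uncertainty remains in each case
width1 <- diff(ci1)
width2 <- diff(ci2)
c(example1 = width1, example2 = width2)
```

Both results are non-significant, yet the first interval is far wider than the second. Whether an interval of a given width counts as "similar" depends on what size of difference matters biologically, which the data alone cannot decide.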
