Does a Larger Number of Replicates Result in Insignificant P-Values?
3
1
10.7 years ago
RT ▴ 360

Hi All,

I am analyzing Affymetrix data for stress vs. control. I have 4 replicates for control and 5 replicates for stress. Normally, I use the stringency criteria fold change > log(1.5) and adjusted p-value < 0.01. For this analysis, I can see genes with a significant fold change, but their p-values are not significant. And yes, some preliminary analyses done earlier showed differential expression of a few genes. Do I have to relax my stringency criteria for the adjusted p-value (and what would be preferable)? Is there any chance that the larger number of replicates has increased the p-values?

Many thanks, Ritu

microarray p-value replicates statistics • 4.0k views
0

If the adjusted p value is actually a False Discovery Rate, then requiring 0.01 is overly stringent.

0

It is certainly possible that you have real differences which are not reaching significance level because of your small sample size. Increasing replicates may help. But, also try some pre-filtering which I suggest in my answer below.

2
10.7 years ago
John ★ 1.5k

Statistically, provided that you have control over the sources of error, a higher number of replicates gives higher power and fewer false discoveries, meaning that whatever decision comes out is more likely to be true. There are problems with data that have few degrees of freedom (genomic designs usually do, since we cannot afford, for example, 30 or more treatments in total) and with a high number of treatments (the chance of getting false positives is high), so you need a good statistical consultation.
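
To make the point about power concrete, here is a minimal sketch using base R's `power.t.test`. The effect size (`delta = 1` on the log2 scale), the standard deviation (`sd = 0.5`) and the threshold (`sig.level = 0.01`) are made-up values for illustration, not estimates from the data in this thread, and an ordinary two-sample t-test is only a rough stand-in for the moderated statistics usually used on microarrays.

```r
# Power of a two-sample t-test as the number of replicates per group grows.
# delta, sd and sig.level are illustrative assumptions, not values from this dataset.
n_per_group <- c(2, 3, 4, 5, 8, 12)

power <- sapply(n_per_group, function(n) {
  power.t.test(n = n, delta = 1, sd = 0.5, sig.level = 0.01)$power
})

# One row per design: replicates per group and the resulting power
data.frame(n_per_group, power = round(power, 3))
```

With everything else held fixed, power only goes up as replicates are added, so extra replicates by themselves should not make real differences less significant.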

0
10.7 years ago
RT ▴ 360

Thanks Sean and John for your help.

Sean - I tried a relaxed stringency criterion of adjusted p-values < 0.05. Still no genes satisfy this criterion.

But if I just look at my unadjusted p-values (unadjusted p-values < 0.01), they look good, and I have 400 candidate genes that are differentially expressed. Can I use unadjusted p-values as the criterion? I am not a statistician, so I am not clear on when to use adjusted p-values and when to use unadjusted ones. Can anyone please explain?

Below I am copying a few lines of my top results. Why is there so much difference between the adjusted and unadjusted p-values? Any help/suggestions would be much appreciated.

Many thanks, R.

ID           logFC   AveExpr          t       P.Value  adj.P.Val            B
xx1_at  -2.0682486  8.298777  -5.671953  4.754428e-05  0.4441129   1.16804549
xx3_at  -1.1045124  7.838776  -5.446288  7.206374e-05  0.4441129   0.90168939
xx9_at  -0.9933025  5.900082  -5.378236  8.180199e-05  0.4441129   0.81951230
xx5_at   0.5784688  5.979741   5.211694  1.118385e-04  0.4441129   0.61478979
xx2_at  -1.1998221  8.423590  -5.174542  1.199786e-04  0.4441129   0.56842423
xx8_at  -1.7810264  7.939280  -5.071211  1.460053e-04  0.4441129   0.43814027
xx4_at  -0.4775558  2.965913  -5.026975  1.588722e-04  0.4441129   0.38177252
xx3_at   0.4783773  6.547145   4.917570  1.959800e-04  0.4441129   0.24084966
xx2_at  -0.6317992  2.953137  -4.795892  2.479321e-04  0.4441129   0.08161861
xx7_at  -0.5901271  3.606651  -4.743114  2.747013e-04  0.4441129   0.01174557
xx2_at  -0.6615366  5.228286  -4.708753  2.937133e-04  0.4441129  -0.03400289
xx4_at  -1.5438633  5.824030  -4.674656  3.139213e-04  0.4441129  -0.07960106
xx2_at   0.5362903  4.838442   4.613063  3.541284e-04  0.4441129  -0.16246898

0

Why are your adj.P.Val all the same number? And what is the last column? I think you've pasted slightly wrong data or done the correction wrong. If you're doing multiple testing, then you should always perform a multiple hypothesis testing correction and use the adjusted p-values.
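
On why unadjusted p-values alone are risky when thousands of genes are tested: a rough sketch with simulated data, assuming no gene is truly changed. The number of tests `m` is hypothetical (the actual number of probesets is not given in this thread), and the p-values are drawn from a uniform distribution, which is what a well-behaved test produces under the null.

```r
set.seed(1)

m <- 20000           # hypothetical number of probesets tested (an assumption)
p_null <- runif(m)   # under the null, p-values are uniform on [0, 1]

# Genes passing an unadjusted cutoff by chance alone: about m * 0.01
n_raw <- sum(p_null < 0.01)

# Genes passing after Benjamini-Hochberg (FDR) adjustment
n_bh <- sum(p.adjust(p_null, method = "BH") < 0.05)

c(unadjusted = n_raw, BH_adjusted = n_bh)
```

So a list of a few hundred genes at unadjusted p < 0.01 can arise with no real signal at all, depending on how many tests were run; the adjusted values are what tell you whether the list is more than expected by chance.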

0

It's not uncommon for adjusted p-values to have many of the same value (depending on which correction method you use).
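
A small worked sketch of how the ties come about with the Benjamini-Hochberg method, which is one common choice (the thread does not say which method was actually used). The p-values below are toy numbers chosen for illustration. Each sorted p-value is scaled by m / rank, and then a running minimum is taken from the largest p-value downwards, so several top genes can end up sharing the value set by a gene further down the list.

```r
# Toy p-values (made up for illustration), already sorted
p <- c(0.010, 0.011, 0.012, 0.013, 0.5, 0.6, 0.7, 0.8)

# Built-in Benjamini-Hochberg adjustment
adj <- p.adjust(p, method = "BH")

# The same calculation written out by hand
m <- length(p)
o <- order(p)
scaled <- p[o] * m / seq_len(m)            # p * m / rank
manual <- pmin(1, rev(cummin(rev(scaled))))  # running minimum from the bottom up
manual <- manual[order(o)]                 # back to the input order

data.frame(p, scaled = scaled[order(o)], adj, manual)
```

Here the first four genes all get the adjusted value of the fourth one (0.013 * 8 / 4), even though their raw p-values differ.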

0
10.7 years ago

Have you tried a pre-filtering step to reduce the total number of tests? This should be unbiased with regard to your comparison (i.e., do not use FC or p-value). It is common with Affymetrix expression datasets to filter out genes with very low (or extremely high) variance (or coefficient of variation) across all samples. It is also common to filter out genes which are not considered "present" or "expressed above background" in at least some minimum percentage of samples. If you have a matrix of normalized log2 expression values (e.g., from rma or gcrma) you can use something like:

library(genefilter)
# Preliminary gene filtering
# X is a matrix of normalized log2 expression values (genes in rows, samples in columns)
X <- data
# Un-log2 the values, then filter out genes according to the following criteria
# (recommended in the multtest/MTP documentation):
#  - at least 20% of samples should have raw intensity greater than 100
#  - the coefficient of variation (sd/mean) is between 0.7 and 10
ffun <- filterfun(pOverA(p = 0.2, A = 100), cv(a = 0.7, b = 10))
filt <- genefilter(2^X, ffun)
# Keep the log2 values of the genes that pass both filters
filt_Data <- X[filt, ]

0

Thanks Obi. But I have already done the filtering on the normalized data, and it has not helped either. Does this have to do with the quality of my arrays? I have analyzed a few more arrays with just two replicates, and the results were fine. Any other ideas are welcome.

0

It could be a problem with your arrays. If you do quality checks and look at the overall distributions do you see anything unusual? The other possibility is simply that there aren't significant differences between your treatments. Or, that there were sample mix-ups. As an aside, I wouldn't trust p-values obtained from a test with only 2 replicates. So, that's probably not a good baseline for comparison.
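
For the quality checks, something along these lines can be a starting point. This is only a sketch on simulated data: the matrix `expr`, the sample names and the 4 control + 5 stress layout are stand-ins for your own normalized log2 matrix, and the simulated values carry no real group effect.

```r
set.seed(42)

# Simulated stand-in for a normalized log2 expression matrix (assumption):
# 1000 genes, 4 control and 5 stress arrays
n_genes <- 1000
groups <- rep(c("control", "stress"), times = c(4, 5))
gene_mean <- rnorm(n_genes, mean = 7, sd = 2)
expr <- gene_mean + matrix(rnorm(n_genes * 9, sd = 0.3), nrow = n_genes)
colnames(expr) <- paste0(groups, "_", c(1:4, 1:5))

# Overall distributions: quartiles per array (an odd array stands out here)
qc_quantiles <- apply(expr, 2, quantile, probs = c(0.25, 0.5, 0.75))

# Sample-to-sample correlation and clustering: mix-ups or outlier arrays
# tend to show up as samples grouping with the wrong condition
sample_cor <- cor(expr)
hc <- hclust(as.dist(1 - sample_cor), method = "average")

round(qc_quantiles, 2)
# boxplot(expr, las = 2)  # uncomment to plot the distributions
# plot(hc)                # uncomment to plot the dendrogram
```

If one array has a clearly shifted distribution, or the dendrogram does not separate control from stress at all, that points to an array quality problem, a mix-up, or simply a weak treatment effect.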