I am trying to replicate the results of the paper "Epithelial–mesenchymal status renders differential responses to cisplatin in ovarian cancer", roughly following the workflow "An end to end workflow for differential gene expression using Affymetrix microarrays".
Basically, the goal is to detect differential expression between mesenchymal-like and epithelial-like cell lines under cisplatin treatment vs. control. However, I am running into a problem when calculating the p-value for the Mann-Whitney U-test.
This is the part of the paper I am referring to:
Robust Multichip Average (RMA) normalization was performed at the transcript level on the results from the Affymetrix Human Gene ST1.0 arrays using Affymetrix Power Tool 188.8.131.52 for all 46 sham- or cisplatin-treated ovarian cancer cell lines. The normalized data were subsequently standardized using ComBat71 to remove the batch effect. In this experiment, the cisplatin treatment assay was performed in triplicate on 20 cell lines, while single assays (without replicate) were performed on the remaining 26 cell lines. Taking advantage of the triplicate data, potentially fragile probes with strong variations (an s.d. of >0.2) within the triplicates were removed, decreasing the probe number from 33 297 to 21 329. To perform a fair comparison, the triplicate data were then log-averaged into one value so that one result for each cell line could be used in the following analyses. To detect differential responses to cisplatin between epithelial- and mesenchymal-like cell lines, the transcriptomic responses to cisplatin were computed by subtracting the gene expression value of control (cisplatin untreated) cells from that of cisplatin-treated cells. Mann–Whitney U-test (P<0.01 as a cut-off value) was subsequently used to detect the differential transcriptomic responses between the expression changes by cisplatin treatment in epithelial-like cell lines with those in mesenchymal-like cell lines (Supplementary Table 3).
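For concreteness, the last two steps of the quoted passage (the per-cell-line response, then the U-test on those responses) can be sketched on toy data as below. This is only one reading of the text, not the authors' code. The `treatment` and `cell_line` columns, the probe name and all numbers are made up for illustration, and the RMA, ComBat and triplicate filtering/averaging steps are skipped.

```r
# Toy data only: column names, group sizes and numbers are invented.
pheno <- data.frame(
  cell_line      = rep(LETTERS[1:8], each = 2),
  treatment      = rep(c("control", "cisplatin"), times = 8),
  classification = rep(c("Epithelial-like", "Mesenchymal-like"), each = 8),
  stringsAsFactors = FALSE
)
rownames(pheno) <- paste(pheno$cell_line, pheno$treatment, sep = "_")

# Made-up log2 expression for one probe: a baseline per cell line plus a
# cisplatin response that is larger in the mesenchymal-like lines (E to H).
baseline <- c(7.0, 7.5, 8.0, 6.5, 7.2, 7.9, 6.9, 8.1)
response <- c(0.1, 0.2, 0.05, 0.3, 1.1, 1.4, 0.9, 1.3)
expr <- matrix(as.vector(rbind(baseline, baseline + response)), nrow = 1,
               dimnames = list("probe_1", rownames(pheno)))

# Step 1: response per cell line = cisplatin-treated minus control.
ctrl    <- pheno[pheno$treatment == "control", ]
treated <- pheno[pheno$treatment == "cisplatin", ]
treated <- treated[match(ctrl$cell_line, treated$cell_line), ]  # pair by cell line
delta <- expr["probe_1", rownames(treated)] - expr["probe_1", rownames(ctrl)]
names(delta) <- ctrl$cell_line

# Step 2: Mann-Whitney U-test on the responses, one value per cell line.
is_epi <- ctrl$classification == "Epithelial-like"
res_delta <- wilcox.test(delta[!is_epi], delta[is_epi], exact = TRUE)

# For contrast: the same test on raw expression values, where control and
# treated arrays of each class are pooled together.
epi_samples <- rownames(pheno)[pheno$classification == "Epithelial-like"]
mes_samples <- rownames(pheno)[pheno$classification == "Mesenchymal-like"]
res_raw <- wilcox.test(expr["probe_1", mes_samples],
                       expr["probe_1", epi_samples], exact = TRUE)

print(res_delta$p.value)
print(res_raw$p.value)
```

In this toy set the two groups of responses are completely separated, so the test on `delta` returns 2/70 (about 0.029), the smallest two-sided exact p-value possible with 4 vs. 4 values. The test on raw values addresses a different question (overall expression level per class rather than the change induced by cisplatin), so the two p-values need not agree.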
My p-values for a given gene differ enormously from those reported in the paper.
Here is my code:
expr_f is the expression matrix from the ExpressionSet (log-transformed).
SDRF_f is the pData from the same ExpressionSet.
I am passing wilcox.test a vector of expression values from the epithelial-like samples (epi_values) and one from the mesenchymal-like samples (mes_values) for the same probe (8143663).
    genelist = rownames(expr_f)
    epi_samples = rownames(subset(SDRF_f, SDRF_f[,"Classification"]=="Epithelial-like"))
    mes_samples = rownames(subset(SDRF_f, SDRF_f[,"Classification"]=="Mesenchymal-like"))
    epi_values = expr_f["8143663", epi_samples]
    mes_values = expr_f["8143663", mes_samples]
    wilcox.test(mes_values, epi_values, exact=TRUE)
In the paper, the p-value for this probe is 0.000372259258165627, while mine is 0.8785. The difference makes no sense at all to me.
I don't know whether I am misunderstanding the U-test, using the function incorrectly, or whether it's something else. Any input is appreciated.