Many of you will be aware of the Golub et al. data, one of the first high-throughput gene expression datasets and one that is used for teaching in many places. I recently noticed that for one of the genes, the expression pattern in their famous figure does not seem to match the underlying data (as available via the R package and described in Wim Krijnen's wonderful book "Applied Statistics for Bioinformatics using R"). But maybe I'm just missing or misunderstanding something.
In the figure of the original publication, the expression values of Cyclin D3 for AML are mostly below 0, cf. the fourth row.
However, when I inspect the data, the expression values of Cyclin D3 for AML are above 0. This agrees with, e.g., Figure 2.4 of the book, "Boxplot of ALL and AML expression values of gene CCND3 Cyclin D3".
I checked that there is only one Cyclin D3 in the dataset. The data behind the figure may have been normalized differently, but I checked some other genes and found no such discrepancy for any of them, so I'm not sure that it's a normalization issue.
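
For anyone who wants to reproduce the check, here is a minimal sketch. It assumes that the R package in question is the Bioconductor package `multtest`, which provides the objects `golub`, `golub.cl` and `golub.gnames`, and that class label 0 means ALL and 1 means AML, as in Krijnen's book. The names `gol.fac` and `idx` are only illustrative.

```r
# Assumption: the Bioconductor package 'multtest' is installed.
library(multtest)
data(golub)  # loads golub (expression matrix), golub.cl, golub.gnames

# Assumption: class labels are coded 0 = ALL, 1 = AML.
gol.fac <- factor(golub.cl, levels = 0:1, labels = c("ALL", "AML"))

# Find every row whose gene description mentions Cyclin D3.
idx <- grep("Cyclin D3", golub.gnames[, 2], ignore.case = TRUE)
print(golub.gnames[idx, , drop = FALSE])

# Summarize the expression values per class for each matching row.
for (i in idx) {
  print(tapply(golub[i, ], gol.fac, summary))
}

# Boxplot of the first match, split by class.
boxplot(golub[idx[1], ] ~ gol.fac,
        ylab = "Expression value",
        main = golub.gnames[idx[1], 2])
```

If `idx` has length one, there is only a single Cyclin D3 entry, and the per-class summaries show directly whether the AML values lie above or below 0.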
Can anyone help and shed some light on this issue?