Question: Question about processed microarray data from ArrayExpress
newbie wrote (8 months ago):

I have downloaded some processed microarray data from ArrayExpress (Affymetrix GeneChip Human Genome U133 Plus 2.0). The data are already normalised and are stored in a data frame df that looks like this:

[Image: screenshot of the data frame df]

This is the output of dput(df):

structure(list(Samples = structure(1:9, .Label = c("H_106.CD.act", 
    "H_106.CD.nact", "H_107.CD.act", "H_107.CD.nact", "H_340.normal", 
    "H_404.CD.act", "H_404.CD.nact", "H_738.normal", "H_755.normal"
    ), class = "factor"), Type = structure(c(1L, 2L, 1L, 2L, 3L, 
    1L, 2L, 3L, 3L), .Label = c("Active CD", "Non-Active CD", "Normal"
    ), class = "factor"), PGAM5 = structure(c(H_106.CD.act = 6L, 
    H_106.CD.nact = 4L, H_107.CD.act = 8L, H_107.CD.nact = 1L, H_340.normal = 3L, 
    H_404.CD.act = 7L, H_404.CD.nact = 9L, H_738.normal = 5L, H_755.normal = 2L
    ), .Label = c("4.571231311", "4.755115729", "4.887622107", "4.891329464", 
    "4.912189399", "5.46180878", "5.49774779", "5.612888254", "5.880677067"
    ), class = "factor"), NME1 = structure(c(H_106.CD.act = 1L, H_106.CD.nact = 9L, 
    H_107.CD.act = 3L, H_107.CD.nact = 7L, H_340.normal = 5L, H_404.CD.act = 2L, 
    H_404.CD.nact = 4L, H_738.normal = 6L, H_755.normal = 8L), .Label = c("10.02692043", 
    "10.04369937", "10.57609398", "10.65706982", "8.221264698", "8.906353951", 
    "9.395091983", "9.533567976", "9.676355234"), class = "factor"), 
        LHPP = structure(c(H_106.CD.act = 4L, H_106.CD.nact = 5L, 
        H_107.CD.act = 1L, H_107.CD.nact = 6L, H_340.normal = 7L, 
        H_404.CD.act = 2L, H_404.CD.nact = 3L, H_738.normal = 9L, 
        H_755.normal = 8L), .Label = c("6.344182108", "6.48823957", 
        "6.514741929", "6.562740787", "6.831723902", "7.071119084", 
        "7.188415855", "7.243049713", "7.290671656"), class = "factor"), 
        PHPT1 = structure(c(H_106.CD.act = 5L, H_106.CD.nact = 2L, 
        H_107.CD.act = 7L, H_107.CD.nact = 8L, H_340.normal = 4L, 
        H_404.CD.act = 6L, H_404.CD.nact = 3L, H_738.normal = 1L, 
        H_755.normal = 9L), .Label = c("10.04890824", "10.08906847", 
        "10.215382", "10.30426286", "9.59467692", "9.610542319", 
        "9.787960611", "9.821975201", "9.893869572"), class = "factor")), row.names = c(NA, 
    -9L), class = "data.frame")

I tried to make a box plot from the above data and wanted to check the significance of the differences between the Types.

library(reshape2)
library(ggplot2)
library(ggsignif)
library(EnvStats)
library(ggpubr)
library(forcats)

df.n <- melt(final6, c("Samples", "Type"))

positions <- c("Normal", "Active CD", "Non-Active CD")

r <- ggplot(data = df.n, aes(x=fct_reorder(Type, value), y=value)) + 
  geom_boxplot() + facet_wrap(~variable) +
  geom_signif(comparisons = list(c("Normal","Active CD"),
                                 c("Normal","Non-Active CD"), c("Active CD","Non-Active CD")),
              map_signif_level = TRUE, y_position = c(8,9,10)) + 
  theme_bw(base_size = 14) + xlab("")+
  theme(axis.text=element_text(size=15, face = "bold", color = "black"),
        axis.title=element_text(size=15, face = "bold", color = "black"),
        strip.text = element_text(size=15, face = "bold", color = "black"))
r + stat_n_text(size = 4) + scale_x_discrete(limits = positions) + ylab("Normalized Expression")

This gave me an output like below:

[Image: box plot produced by the code above]

May I know why the data looks like that in the box plot? Do I need to normalise this data again? Any solution to make the box plot look better?

ATpoint wrote (8 months ago):

I am not sure what you are plotting there, but you can see that your y-axes are not properly scaled at all. Maybe the values are interpreted as characters; in the dput output, the expression columns are stored as factors rather than numbers. I quickly put together some code; it is not pretty, but it should plot these data properly. It starts from the top-level data, here named df. By the way, it is standard to have expression data with genes as rows and samples as columns:

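library(reshape2)
library(ggplot2)

## say you have these data as variable named df:
## transpose to genes (rows) x samples (columns); the factor columns become character
expr <- t(df[, 3:ncol(df)])
## convert the character matrix to numeric while keeping its dimensions
storage.mode(expr) <- "numeric"
colnames(expr) <- as.character(df$Samples)

## one label per sample, repeated once for each gene, because melt() on a matrix
## returns the values column by column (all genes of sample 1, then sample 2, ...)
factors <- rep(c("act", "nact", "act", "nact", "normal", "act", "nact", "normal", "normal"),
               each = nrow(expr))

melted <- melt(expr)
melted$factors <- factors

p2 <- ggplot(melted, aes(x = factors, y = value, fill = Var1)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(4, 12)) +
  facet_wrap(~Var1, scales = "free")
p2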

[Image: box plots produced by the code above]
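
Whether the values really are non-numeric can be checked directly. The following is a minimal sketch, not part of the original answer, using a small made-up data frame (the name toy and its values are hypothetical). It shows why a factor column has to go through as.character() before as.numeric():

```r
# Toy data frame (hypothetical values) with an expression column stored as a
# factor, similar to the columns in the dput() output of the question
toy <- data.frame(
  Samples = c("s1", "s2", "s3"),
  GENE1   = factor(c("10.5", "8.2", "9.7")),
  stringsAsFactors = FALSE
)

# Check how each column is stored
sapply(toy, class)

# Pitfall: as.numeric() on a factor returns the integer level codes, not the values
codes <- as.numeric(toy$GENE1)

# Going through character recovers the actual numbers
values <- as.numeric(as.character(toy$GENE1))

# Apply the same conversion to every expression column (all columns except Samples)
gene_cols <- setdiff(names(toy), "Samples")
toy[gene_cols] <- lapply(toy[gene_cols], function(x) as.numeric(as.character(x)))
str(toy)
```

For the data in the question, the same conversion would presumably be applied to columns 3 onwards of df before melting.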


newbie replied (8 months ago):

Sorry, I think you made a mistake with the factors. In your melted data, some of the labels are mixed up: some Non-Active CD samples are also labelled as Normal. Could you please check that and let me know? Thank you.
ATpoint replied (7 months ago):

Sorry, my bad. I forgot to replicate the factors once for each gene. I have edited the answer.
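
A mismatch like this can be avoided by looking up each group label from the sample name instead of typing the labels by hand and replicating them. A minimal sketch, assuming the reshape2 package is installed and using a small made-up data set (info, expr and their values are hypothetical):

```r
library(reshape2)

# Hypothetical sample annotation: one row per sample
info <- data.frame(
  Samples = c("s1", "s2", "s3"),
  Type    = c("Active CD", "Non-Active CD", "Normal"),
  stringsAsFactors = FALSE
)

# Hypothetical expression matrix with genes as rows and samples as columns;
# matrix() fills column by column, so each line below is one sample
expr <- matrix(
  c(5.1, 9.8,
    4.7, 10.2,
    4.9, 8.9),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), info$Samples)
)

# melt() on a matrix returns one row per gene/sample pair:
# Var1 holds the row names (genes), Var2 the column names (samples)
melted <- melt(expr)

# Look up the Type of each row by its sample name, independent of row order
melted$Type <- info$Type[match(as.character(melted$Var2), info$Samples)]
melted
```

With the data from the question, the lookup table would be the Samples and Type columns of df, so the labels stay correct however many genes are added.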

newbie replied (7 months ago):

Thanks a lot, @ATpoint.