Question: How to compare DGE between different groups and create a volcano plot in R based on a GEO2R dataset?
1
gravatar for ishackm
20 months ago by
ishackm100
ishackm100 wrote:

Hi all,

I have the following GEO2R dataset which has 4 different groups. I would like to perform DGE on these groups please on R and create volcano plots to compare the DGE. For example, G0 vs G1.

This is my code so far in R:

> # Version info: R 3.2.3, Biobase 2.30.0, GEOquery 2.40.0, limma 3.26.8
> # R scripts generated  Tue Mar 26 09:31:37 EDT 2019
> 
> ################################################################
> #   Differential expression analysis with limma library(Biobase) library(GEOquery) library(limma)
> 
> # load series and platform data from GEO
> 
> gset <- getGEO("GSE40595", GSEMatrix =TRUE, AnnotGPL=TRUE) if
> (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx
> <- 1 gset <- gset[[idx]]
> 
> # make proper column names to match toptable  fvarLabels(gset) <- make.names(fvarLabels(gset))
> 
> # group names for all samples gsms <- paste0("00000000111111111111111111111111111111122222233333",
>                "333333333333333333333333333") sml <- c() for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
> 
> # log2 transform ex <- exprs(gset) qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T)) LogC <- (qx[5] > 100) ||   (qx[6]-qx[1] > 50 && qx[2] > 0) ||   (qx[2] > 0 && qx[2] < 1 && qx[4]
> > 1 && qx[4] < 2) if (LogC) { ex[which(ex <= 0)] <- NaN exprs(gset) <- log2(ex) }
> 
> # set up the data and proceed with analysis sml <- paste("G", sml, sep="")    # set group names fl <- as.factor(sml) gset$description <-
> fl design <- model.matrix(~ description + 0, gset) colnames(design) <-
> levels(fl) fit <- lmFit(gset, design) cont.matrix <-
> makeContrasts(G3-G0, G1-G0, G2-G1, G3-G2, levels=design) fit2 <-
> contrasts.fit(fit, cont.matrix) fit2 <- eBayes(fit2, 0.01) tT <-
> topTable(fit2, adjust="fdr", sort.by="B", number=250)
> 
> tT <- subset(tT,
> select=c("ID","adj.P.Val","P.Value","F","Gene.symbol","Gene.title"))
> write.table(tT, file=stdout(), row.names=F, sep="\t")

If you could guide me on how I could edit this code, that would be really helpful.

Many Thanks,

Ishack

dge volcano plot geo2r R • 672 views
ADD COMMENTlink modified 20 months ago • written 20 months ago by ishackm100
0
gravatar for Kevin Blighe
20 months ago by
Kevin Blighe67k
Republic of Ireland
Kevin Blighe67k wrote:

Edit: scroll down for answer

-----------------------------

Hey, can you please tidy your code so that people can re-run it?

Generally, what you are doing seems fine. However, when you run topTable(), it will likely be only taking the results from one comparison. You appear to want to generate multiple results tables for G3-G0, G1-G0, G2-G1, and G3-G2? You can select the results for individual comparisons via the coef parameter that is passed to topTable()

For a volcano plot, you can use the in-built limma function, but it is basic: https://rdrr.io/bioc/limma/man/volcanoplot.html

You could instead use my EnhancedVolcano: Publication-ready volcano plots with enhanced colouring and labeling

ADD COMMENTlink modified 20 months ago • written 20 months ago by Kevin Blighe67k

Hi Kevin,

Thanks for your answer.

Can you give a example with the coef parameter that is passed to topTable() please? This is my first time doing this and it would be helpful if you could give me an example?

Many Thanks

Ishack

ADD REPLYlink written 20 months ago by ishackm100

What are the colnames of your cont.matrix object? - those should be the coefficient names.

The layout of the 1s and 0s in your design (and 1 and -1 in contrast matrix) will also help you to recognise the comparison to which each column in the design / contrast matrix relates.

To then specify an individual comparison using your code, you would do, for example:

# choose first column from design and generate stats based on the comparison implied by it
topTable(fit2, coef = 1, adjust="fdr", sort.by="B", number=250)

# choose column labeled *G3vsG0* from design
topTable(fit2, coef = 'G3vsG0', adjust="fdr", sort.by="B", number=250)
ADD REPLYlink modified 20 months ago • written 20 months ago by Kevin Blighe67k

Hi Kevin,

I implemented your code but I get the following error:

 > Warning message: In contrasts.fit(fit, cont.matrix) :   row names of
    > contrasts don't match col names of coefficients

    > head (design)
               G0 G1 G2 G3
    GSM997591  1  0  0  0
    GSM997592  1  0  0  0
    GSM997593  1  0  0  0
    GSM997594  1  0  0  0
    GSM997595  1  0  0  0
    GSM997596  1  0  0  0


    > cont.matrix
          Contrasts
    Levels G3 - G0 G1 - G0 G2 - G1 G3 - G2
        G0      -1      -1       0       0
        G1       0       1      -1       0
        G2       0       0       1      -1
        G3       1       0       0       1


    # set up the data and proceed with analysis
    sml <- paste("G", sml, sep="")    # set group names
    fl <- as.factor(sml)
    gset$description <- fl
    design <- model.matrix(~ description + 0, gset)
    colnames(design) <- levels(fl)
    fit <- lmFit(gset, design)
    cont.matrix <- makeContrasts(G3-G0, G1-G0, G2-G1, G3-G2, levels=design)
    fit2 <- contrasts.fit(fit, cont.matrix)
    fit2 <- eBayes(fit2, 0.01)
    tT <- topTable(fit2, coef = 1, adjust="fdr", sort.by="B", number=250)
    tT <- topTable(fit2, coef = 'G3 - G0', adjust="fdr", sort.by="B", number=250)




    tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","F","Gene.symbol","Gene.title"))
    write.table(tT, file=stdout(), row.names=F, sep="\t")

What I am doing wrong here and how can I fix the error please?

Many Thanks,

Ishack

ADD REPLYlink modified 20 months ago • written 20 months ago by ishackm100

This is how you can do it:

gset <- getGEO("GSE40595", GSEMatrix =TRUE, getGPL=FALSE)

if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]
fvarLabels(gset) <- make.names(fvarLabels(gset))

gsms <- paste0("00000000111111111111111111111111111111122222233333","333333333333333333333333333") 
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
ex <- exprs(gset); qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T)); LogC <- (qx[5] > 100) ||   (qx[6]-qx[1] > 50 && qx[2] > 0) ||   (qx[2] > 0 && qx[2] < 1 && qx[4]> 1 && qx[4] < 2)

if (LogC) { ex[which(ex <= 0)] <- NaN; exprs(gset) <- log2(ex) }

# set up the data and proceed with analysis
sml <- paste("G", sml, sep="")    # set group names
fl <- as.factor(sml)
gset$description <- fl
design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(fl)
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(G3-G0, G1-G0, G2-G1, G3-G2, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)

cont.matrix
      Contrasts
Levels G3 - G0 G1 - G0 G2 - G1 G3 - G2
    G0      -1      -1       0       0
    G1       0       1      -1       0
    G2       0       0       1      -1
    G3       1       0       0       1

head(design)
          G0 G1 G2 G3
GSM997591  1  0  0  0
GSM997592  1  0  0  0
GSM997593  1  0  0  0
GSM997594  1  0  0  0
GSM997595  1  0  0  0
GSM997596  1  0  0  0

G3 vs G0

topTable(fit2, coef = 'G3 - G0', adjust="fdr", sort.by="B", number=250)
topTable(fit2, coef = 1, adjust="fdr", sort.by="B", number=250)

G1 vs G0

topTable(fit2, coef = 'G1 - G0', adjust="fdr", sort.by="B", number=250)
topTable(fit2, coef = 2, adjust="fdr", sort.by="B", number=250)

G2 vs G1

topTable(fit2, coef = 'G2 - G1', adjust="fdr", sort.by="B", number=250)
topTable(fit2, coef = 3, adjust="fdr", sort.by="B", number=250)

G3 vs G2

topTable(fit2, coef = 'G3 - G2', adjust="fdr", sort.by="B", number=250)
topTable(fit2, coef = 4, adjust="fdr", sort.by="B", number=250)

For each comparison, each topTable() command will produce the same result. We are just accessing the model coefficients in different ways. When we specify a number, e.g., 1,2,3,4, it relates to the column number in your design and contrast matrix. You can also reference by the column name in the contrast matrix.

Let's check that (here, we check each value to its corresponding values in the output):

table(topTable(fit2, coef = 'G3 - G0', adjust="fdr", sort.by="B", number=250) == topTable(fit2, coef = 1, adjust="fdr", sort.by="B", number=250))
TRUE 
1500 

table(topTable(fit2, coef = 'G1 - G0', adjust="fdr", sort.by="B", number=250) == topTable(fit2, coef = 2, adjust="fdr", sort.by="B", number=250))
TRUE 
1500 

table(topTable(fit2, coef = 'G2 - G1', adjust="fdr", sort.by="B", number=250) == topTable(fit2, coef = 3, adjust="fdr", sort.by="B", number=250))
TRUE 
1500 

table(topTable(fit2, coef = 'G3 - G2', adjust="fdr", sort.by="B", number=250) ==  topTable(fit2, coef = 4, adjust="fdr", sort.by="B", number=250))
TRUE 
1500
ADD REPLYlink modified 20 months ago • written 20 months ago by Kevin Blighe67k

Be very sure that this command is doing exactly what you believe. I would check multiple times:

gsms <- paste0("00000000111111111111111111111111111111122222233333","333333333333333333333333333")
ADD REPLYlink written 20 months ago by Kevin Blighe67k

Yes! It finally works.

Thanks very much Kevin for your quick responses. Much appreciated.

ADD REPLYlink modified 20 months ago • written 20 months ago by ishackm100

Great work, ishackm

ADD REPLYlink written 20 months ago by Kevin Blighe67k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1561 users visited in the last hour