Question: How to compare DGE between different groups and create a volcano plot in R based on a GEO2R dataset?
1
gravatar for ishackm
16 months ago by
ishackm90
ishackm90 wrote:

Hi all,

I have the following GEO2R dataset which has 4 different groups. I would like to perform DGE on these groups please on R and create volcano plots to compare the DGE. For example, G0 vs G1.

This is my code so far in R:

> # Version info: R 3.2.3, Biobase 2.30.0, GEOquery 2.40.0, limma 3.26.8
> # R scripts generated  Tue Mar 26 09:31:37 EDT 2019
> 
> ################################################################
> #   Differential expression analysis with limma library(Biobase) library(GEOquery) library(limma)
> 
> # load series and platform data from GEO
> 
> gset <- getGEO("GSE40595", GSEMatrix =TRUE, AnnotGPL=TRUE) if
> (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx
> <- 1 gset <- gset[[idx]]
> 
> # make proper column names to match toptable  fvarLabels(gset) <- make.names(fvarLabels(gset))
> 
> # group names for all samples gsms <- paste0("00000000111111111111111111111111111111122222233333",
>                "333333333333333333333333333") sml <- c() for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
> 
> # log2 transform ex <- exprs(gset) qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T)) LogC <- (qx[5] > 100) ||   (qx[6]-qx[1] > 50 && qx[2] > 0) ||   (qx[2] > 0 && qx[2] < 1 && qx[4]
> > 1 && qx[4] < 2) if (LogC) { ex[which(ex <= 0)] <- NaN exprs(gset) <- log2(ex) }
> 
> # set up the data and proceed with analysis sml <- paste("G", sml, sep="")    # set group names fl <- as.factor(sml) gset$description <-
> fl design <- model.matrix(~ description + 0, gset) colnames(design) <-
> levels(fl) fit <- lmFit(gset, design) cont.matrix <-
> makeContrasts(G3-G0, G1-G0, G2-G1, G3-G2, levels=design) fit2 <-
> contrasts.fit(fit, cont.matrix) fit2 <- eBayes(fit2, 0.01) tT <-
> topTable(fit2, adjust="fdr", sort.by="B", number=250)
> 
> tT <- subset(tT,
> select=c("ID","adj.P.Val","P.Value","F","Gene.symbol","Gene.title"))
> write.table(tT, file=stdout(), row.names=F, sep="\t")

If you could guide me on how I could edit this code, that would be really helpful.

Many Thanks,

Ishack

dge volcano plot geo2r R • 565 views
ADD COMMENTlink modified 16 months ago • written 16 months ago by ishackm90
0
gravatar for Kevin Blighe
16 months ago by
Kevin Blighe63k
Kevin Blighe63k wrote:

Edit: scroll down for answer

-----------------------------

Hey, can you please tidy your code so that people can re-run it?

Generally, what you are doing seems fine. However, when you run topTable(), it will likely be only taking the results from one comparison. You appear to want to generate multiple results tables for G3-G0, G1-G0, G2-G1, and G3-G2? You can select the results for individual comparisons via the coef parameter that is passed to topTable()

For a volcano plot, you can use the in-built limma function, but it is basic: https://rdrr.io/bioc/limma/man/volcanoplot.html

You could instead use my EnhancedVolcano: Publication-ready volcano plots with enhanced colouring and labeling

ADD COMMENTlink modified 16 months ago • written 16 months ago by Kevin Blighe63k

Hi Kevin,

Thanks for your answer.

Can you give a example with the coef parameter that is passed to topTable() please? This is my first time doing this and it would be helpful if you could give me an example?

Many Thanks

Ishack

ADD REPLYlink written 16 months ago by ishackm90

What are the colnames of your cont.matrix object? - those should be the coefficient names.

The layout of the 1s and 0s in your design (and 1 and -1 in contrast matrix) will also help you to recognise the comparison to which each column in the design / contrast matrix relates.

To then specify an individual comparison using your code, you would do, for example:

# choose first column from design and generate stats based on the comparison implied by it
topTable(fit2, coef = 1, adjust="fdr", sort.by="B", number=250)

# choose column labeled *G3vsG0* from design
topTable(fit2, coef = 'G3vsG0', adjust="fdr", sort.by="B", number=250)
ADD REPLYlink modified 16 months ago • written 16 months ago by Kevin Blighe63k

Hi Kevin,

I implemented your code but I get the following error:

 > Warning message: In contrasts.fit(fit, cont.matrix) :   row names of
    > contrasts don't match col names of coefficients

    > head (design)
               G0 G1 G2 G3
    GSM997591  1  0  0  0
    GSM997592  1  0  0  0
    GSM997593  1  0  0  0
    GSM997594  1  0  0  0
    GSM997595  1  0  0  0
    GSM997596  1  0  0  0


    > cont.matrix
          Contrasts
    Levels G3 - G0 G1 - G0 G2 - G1 G3 - G2
        G0      -1      -1       0       0
        G1       0       1      -1       0
        G2       0       0       1      -1
        G3       1       0       0       1


    # set up the data and proceed with analysis
    sml <- paste("G", sml, sep="")    # set group names
    fl <- as.factor(sml)
    gset$description <- fl
    design <- model.matrix(~ description + 0, gset)
    colnames(design) <- levels(fl)
    fit <- lmFit(gset, design)
    cont.matrix <- makeContrasts(G3-G0, G1-G0, G2-G1, G3-G2, levels=design)
    fit2 <- contrasts.fit(fit, cont.matrix)
    fit2 <- eBayes(fit2, 0.01)
    tT <- topTable(fit2, coef = 1, adjust="fdr", sort.by="B", number=250)
    tT <- topTable(fit2, coef = 'G3 - G0', adjust="fdr", sort.by="B", number=250)




    tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","F","Gene.symbol","Gene.title"))
    write.table(tT, file=stdout(), row.names=F, sep="\t")

What I am doing wrong here and how can I fix the error please?

Many Thanks,

Ishack

ADD REPLYlink modified 16 months ago • written 16 months ago by ishackm90

This is how you can do it:

gset <- getGEO("GSE40595", GSEMatrix =TRUE, getGPL=FALSE)

if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]
fvarLabels(gset) <- make.names(fvarLabels(gset))

gsms <- paste0("00000000111111111111111111111111111111122222233333","333333333333333333333333333") 
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
ex <- exprs(gset); qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T)); LogC <- (qx[5] > 100) ||   (qx[6]-qx[1] > 50 && qx[2] > 0) ||   (qx[2] > 0 && qx[2] < 1 && qx[4]> 1 && qx[4] < 2)

if (LogC) { ex[which(ex <= 0)] <- NaN; exprs(gset) <- log2(ex) }

# set up the data and proceed with analysis
sml <- paste("G", sml, sep="")    # set group names
fl <- as.factor(sml)
gset$description <- fl
design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(fl)
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(G3-G0, G1-G0, G2-G1, G3-G2, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)

cont.matrix
      Contrasts
Levels G3 - G0 G1 - G0 G2 - G1 G3 - G2
    G0      -1      -1       0       0
    G1       0       1      -1       0
    G2       0       0       1      -1
    G3       1       0       0       1

head(design)
          G0 G1 G2 G3
GSM997591  1  0  0  0
GSM997592  1  0  0  0
GSM997593  1  0  0  0
GSM997594  1  0  0  0
GSM997595  1  0  0  0
GSM997596  1  0  0  0

G3 vs G0

topTable(fit2, coef = 'G3 - G0', adjust="fdr", sort.by="B", number=250)
topTable(fit2, coef = 1, adjust="fdr", sort.by="B", number=250)

G1 vs G0

topTable(fit2, coef = 'G1 - G0', adjust="fdr", sort.by="B", number=250)
topTable(fit2, coef = 2, adjust="fdr", sort.by="B", number=250)

G2 vs G1

topTable(fit2, coef = 'G2 - G1', adjust="fdr", sort.by="B", number=250)
topTable(fit2, coef = 3, adjust="fdr", sort.by="B", number=250)

G3 vs G2

topTable(fit2, coef = 'G3 - G2', adjust="fdr", sort.by="B", number=250)
topTable(fit2, coef = 4, adjust="fdr", sort.by="B", number=250)

For each comparison, each topTable() command will produce the same result. We are just accessing the model coefficients in different ways. When we specify a number, e.g., 1,2,3,4, it relates to the column number in your design and contrast matrix. You can also reference by the column name in the contrast matrix.

Let's check that (here, we check each value to its corresponding values in the output):

table(topTable(fit2, coef = 'G3 - G0', adjust="fdr", sort.by="B", number=250) == topTable(fit2, coef = 1, adjust="fdr", sort.by="B", number=250))
TRUE 
1500 

table(topTable(fit2, coef = 'G1 - G0', adjust="fdr", sort.by="B", number=250) == topTable(fit2, coef = 2, adjust="fdr", sort.by="B", number=250))
TRUE 
1500 

table(topTable(fit2, coef = 'G2 - G1', adjust="fdr", sort.by="B", number=250) == topTable(fit2, coef = 3, adjust="fdr", sort.by="B", number=250))
TRUE 
1500 

table(topTable(fit2, coef = 'G3 - G2', adjust="fdr", sort.by="B", number=250) ==  topTable(fit2, coef = 4, adjust="fdr", sort.by="B", number=250))
TRUE 
1500
ADD REPLYlink modified 16 months ago • written 16 months ago by Kevin Blighe63k

Be very sure that this command is doing exactly what you believe. I would check multiple times:

gsms <- paste0("00000000111111111111111111111111111111122222233333","333333333333333333333333333")
ADD REPLYlink written 16 months ago by Kevin Blighe63k

Yes! It finally works.

Thanks very much Kevin for your quick responses. Much appreciated.

ADD REPLYlink modified 16 months ago • written 16 months ago by ishackm90

Great work, ishackm

ADD REPLYlink written 16 months ago by Kevin Blighe63k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1395 users visited in the last hour