How to compare DGE between different groups and create a volcano plot in R based on a GEO2R dataset?
2
1
Entering edit mode
5.7 years ago
ishackm ▴ 110

Hi all,

I have the following GEO2R dataset which has 4 different groups. I would like to perform DGE on these groups please on R and create volcano plots to compare the DGE. For example, G0 vs G1.

This is my code so far in R:

> # Version info: R 3.2.3, Biobase 2.30.0, GEOquery 2.40.0, limma 3.26.8
> # R scripts generated  Tue Mar 26 09:31:37 EDT 2019
> 
> ################################################################
> #   Differential expression analysis with limma library(Biobase) library(GEOquery) library(limma)
> 
> # load series and platform data from GEO
> 
> gset <- getGEO("GSE40595", GSEMatrix =TRUE, AnnotGPL=TRUE) if
> (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx
> <- 1 gset <- gset[[idx]]
> 
> # make proper column names to match toptable  fvarLabels(gset) <- make.names(fvarLabels(gset))
> 
> # group names for all samples gsms <- paste0("00000000111111111111111111111111111111122222233333",
>                "333333333333333333333333333") sml <- c() for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
> 
> # log2 transform ex <- exprs(gset) qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T)) LogC <- (qx[5] > 100) ||   (qx[6]-qx[1] > 50 && qx[2] > 0) ||   (qx[2] > 0 && qx[2] < 1 && qx[4]
> > 1 && qx[4] < 2) if (LogC) { ex[which(ex <= 0)] <- NaN exprs(gset) <- log2(ex) }
> 
> # set up the data and proceed with analysis sml <- paste("G", sml, sep="")    # set group names fl <- as.factor(sml) gset$description <-
> fl design <- model.matrix(~ description + 0, gset) colnames(design) <-
> levels(fl) fit <- lmFit(gset, design) cont.matrix <-
> makeContrasts(G3-G0, G1-G0, G2-G1, G3-G2, levels=design) fit2 <-
> contrasts.fit(fit, cont.matrix) fit2 <- eBayes(fit2, 0.01) tT <-
> topTable(fit2, adjust="fdr", sort.by="B", number=250)
> 
> tT <- subset(tT,
> select=c("ID","adj.P.Val","P.Value","F","Gene.symbol","Gene.title"))
> write.table(tT, file=stdout(), row.names=F, sep="\t")

If you could guide me on how I could edit this code, that would be really helpful.

Many Thanks,

Ishack

DGE r GEO2R volcano plot • 2.8k views
ADD COMMENT
0
Entering edit mode
5.7 years ago

Edit: scroll down for answer

-----------------------------

Hey, can you please tidy your code so that people can re-run it?

Generally, what you are doing seems fine. However, when you run topTable(), it will likely be only taking the results from one comparison. You appear to want to generate multiple results tables for G3-G0, G1-G0, G2-G1, and G3-G2? You can select the results for individual comparisons via the coef parameter that is passed to topTable()

For a volcano plot, you can use the in-built limma function, but it is basic: https://rdrr.io/bioc/limma/man/volcanoplot.html

You could instead use my EnhancedVolcano: Publication-ready volcano plots with enhanced colouring and labeling

ADD COMMENT
0
Entering edit mode

Hi Kevin,

Thanks for your answer.

Can you give a example with the coef parameter that is passed to topTable() please? This is my first time doing this and it would be helpful if you could give me an example?

Many Thanks

Ishack

ADD REPLY
0
Entering edit mode

What are the colnames of your cont.matrix object? - those should be the coefficient names.

The layout of the 1s and 0s in your design (and 1 and -1 in contrast matrix) will also help you to recognise the comparison to which each column in the design / contrast matrix relates.

To then specify an individual comparison using your code, you would do, for example:

# choose first column from design and generate stats based on the comparison implied by it
topTable(fit2, coef = 1, adjust="fdr", sort.by="B", number=250)

# choose column labeled *G3vsG0* from design
topTable(fit2, coef = 'G3vsG0', adjust="fdr", sort.by="B", number=250)
ADD REPLY
0
Entering edit mode

Hi Kevin,

I implemented your code but I get the following error:

 > Warning message: In contrasts.fit(fit, cont.matrix) :   row names of
    > contrasts don't match col names of coefficients

    > head (design)
               G0 G1 G2 G3
    GSM997591  1  0  0  0
    GSM997592  1  0  0  0
    GSM997593  1  0  0  0
    GSM997594  1  0  0  0
    GSM997595  1  0  0  0
    GSM997596  1  0  0  0


    > cont.matrix
          Contrasts
    Levels G3 - G0 G1 - G0 G2 - G1 G3 - G2
        G0      -1      -1       0       0
        G1       0       1      -1       0
        G2       0       0       1      -1
        G3       1       0       0       1


    # set up the data and proceed with analysis
    sml <- paste("G", sml, sep="")    # set group names
    fl <- as.factor(sml)
    gset$description <- fl
    design <- model.matrix(~ description + 0, gset)
    colnames(design) <- levels(fl)
    fit <- lmFit(gset, design)
    cont.matrix <- makeContrasts(G3-G0, G1-G0, G2-G1, G3-G2, levels=design)
    fit2 <- contrasts.fit(fit, cont.matrix)
    fit2 <- eBayes(fit2, 0.01)
    tT <- topTable(fit2, coef = 1, adjust="fdr", sort.by="B", number=250)
    tT <- topTable(fit2, coef = 'G3 - G0', adjust="fdr", sort.by="B", number=250)




    tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","F","Gene.symbol","Gene.title"))
    write.table(tT, file=stdout(), row.names=F, sep="\t")

What I am doing wrong here and how can I fix the error please?

Many Thanks,

Ishack

ADD REPLY
0
Entering edit mode

This is how you can do it:

gset <- getGEO("GSE40595", GSEMatrix =TRUE, getGPL=FALSE)

if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]
fvarLabels(gset) <- make.names(fvarLabels(gset))

gsms <- paste0("00000000111111111111111111111111111111122222233333","333333333333333333333333333") 
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
ex <- exprs(gset); qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T)); LogC <- (qx[5] > 100) ||   (qx[6]-qx[1] > 50 && qx[2] > 0) ||   (qx[2] > 0 && qx[2] < 1 && qx[4]> 1 && qx[4] < 2)

if (LogC) { ex[which(ex <= 0)] <- NaN; exprs(gset) <- log2(ex) }

# set up the data and proceed with analysis
sml <- paste("G", sml, sep="")    # set group names
fl <- as.factor(sml)
gset$description <- fl
design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(fl)
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(G3-G0, G1-G0, G2-G1, G3-G2, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)

cont.matrix
      Contrasts
Levels G3 - G0 G1 - G0 G2 - G1 G3 - G2
    G0      -1      -1       0       0
    G1       0       1      -1       0
    G2       0       0       1      -1
    G3       1       0       0       1

head(design)
          G0 G1 G2 G3
GSM997591  1  0  0  0
GSM997592  1  0  0  0
GSM997593  1  0  0  0
GSM997594  1  0  0  0
GSM997595  1  0  0  0
GSM997596  1  0  0  0

G3 vs G0

topTable(fit2, coef = 'G3 - G0', adjust="fdr", sort.by="B", number=250)
topTable(fit2, coef = 1, adjust="fdr", sort.by="B", number=250)

G1 vs G0

topTable(fit2, coef = 'G1 - G0', adjust="fdr", sort.by="B", number=250)
topTable(fit2, coef = 2, adjust="fdr", sort.by="B", number=250)

G2 vs G1

topTable(fit2, coef = 'G2 - G1', adjust="fdr", sort.by="B", number=250)
topTable(fit2, coef = 3, adjust="fdr", sort.by="B", number=250)

G3 vs G2

topTable(fit2, coef = 'G3 - G2', adjust="fdr", sort.by="B", number=250)
topTable(fit2, coef = 4, adjust="fdr", sort.by="B", number=250)

For each comparison, each topTable() command will produce the same result. We are just accessing the model coefficients in different ways. When we specify a number, e.g., 1,2,3,4, it relates to the column number in your design and contrast matrix. You can also reference by the column name in the contrast matrix.

Let's check that (here, we check each value to its corresponding values in the output):

table(topTable(fit2, coef = 'G3 - G0', adjust="fdr", sort.by="B", number=250) == topTable(fit2, coef = 1, adjust="fdr", sort.by="B", number=250))
TRUE 
1500 

table(topTable(fit2, coef = 'G1 - G0', adjust="fdr", sort.by="B", number=250) == topTable(fit2, coef = 2, adjust="fdr", sort.by="B", number=250))
TRUE 
1500 

table(topTable(fit2, coef = 'G2 - G1', adjust="fdr", sort.by="B", number=250) == topTable(fit2, coef = 3, adjust="fdr", sort.by="B", number=250))
TRUE 
1500 

table(topTable(fit2, coef = 'G3 - G2', adjust="fdr", sort.by="B", number=250) ==  topTable(fit2, coef = 4, adjust="fdr", sort.by="B", number=250))
TRUE 
1500
ADD REPLY
0
Entering edit mode

Be very sure that this command is doing exactly what you believe. I would check multiple times:

gsms <- paste0("00000000111111111111111111111111111111122222233333","333333333333333333333333333")
ADD REPLY
0
Entering edit mode

Yes! It finally works.

Thanks very much Kevin for your quick responses. Much appreciated.

ADD REPLY
0
Entering edit mode

Great work, ishackm

ADD REPLY

Login before adding your answer.

Traffic: 1309 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6