Tutorial: CorLevelPlot - Visualise correlation results, e.g., clinical parameter correlations
5
gravatar for Kevin Blighe
3 months ago by
Kevin Blighe21k
University College London Cancer Institute
Kevin Blighe21k wrote:

Similar to how corrplot does it, but also different.

Cor_Level_Plot1_1
imageupload

Cor_Level_Plot2_1
upload picture

#Function:  CorLevelPlot
#Purpose:   Accepts a data matrix/frame and vectors of xy correlates, and generates/plots a correlation matrix with significances for these 
#Parameters:    dfClinParameters, Data-frame/matrix with correlates as column names
#       x, vector of correlates (column names) contained within dfClinParameters
#       y, ||   ||  ||
#       Size of the labels inside the plot area
#       strCorMethod, one of "pearson", "kendall", "spearman"
#       strPalette, one of RColorBrewer's palettes (see http://svitsrv25.epfl.ch/R-doc/library/RColorBrewer/html/ColorBrewer.html)
#       iNumColours, number of colours to use in the palette (if too many chosen, the max will automatically be chosen)
#       boolReverse, Reverse the palette?
#       boolPlotRSquared, plot R^2 values?
#       strTitle, plot title
#Requires:  lattice, latticeExtra, RColorBrewer
#Written by:    Kevin Blighe, finalised 10th October 2016
CorLevelPlot <- function(df, x, y, labcex, strCorMethod, strPalette, iNumColours, boolReverse, boolPlotRSquared, strTitle)
{
    #Required packages
    require(lattice)
    require(latticeExtra)
    require(RColorBrewer)

    #Check to see if everything is numeric; if not, print a warning message
    for (i in 1:length(x))
    {
        if(!is.numeric(df[,x[i]]))
        {
            print(paste("Warning: ", x[i], " is not numeric - please check the source data as everything will be converted to a matrix", sep=""))
        }
    }
    for (i in 1:length(y))
    {
        if(!is.numeric(df[,y[i]]))
        {
            print(paste("Warning: ", y[i], " is not numeric - please check the source data as everything will be converted to a matrix", sep=""))
        }
    }

    #Convert the data for x and y to data matrix
    #   NAs are left NA
    #   Character (A-Z a-z) are converted to NA
    #   Character numbers are converted to integers
    #   Factors re converted to numbers based on level ordering
    xvals <- data.matrix(df[,which(colnames(df) %in% x)])
    yvals <- data.matrix(df[,which(colnames(df) %in% y)])
    corvals <- cor(xvals, yvals, use="pairwise.complete.obs", method=strCorMethod)

    #Create a new df with same dimensions as corvals and fill with P values
    pvals <- corvals
    for (i in 1:ncol(xvals))
    {
        for (j in 1:ncol(yvals))
        {
            pvals[i,j] <- cor.test(xvals[,i], yvals[,j], use="pairwise.complete.obs", method=strCorMethod)$p.value
            colnames(pvals)[j] <- colnames(yvals)[j]
        }

        rownames(pvals)[i] <- colnames(xvals)[i]
    }

    #Are we plotting R^2 values?
    if (boolPlotRSquared==TRUE)
    {
        corvals <- corvals^2
    }

    #Determine max and min correlation values in order to define the range
    #If all values are low, the range will be smaller
    #This can be visually improved by also adjusting the colour scaling in leveplot() with 'at=seq(iLowerRange,iUpperRange,0.01)'
    max <- max(corvals)
    min <- min(corvals)
    if(abs(max)>abs(min))
    {
        iUpperRange <- max+0.05
        iLowerRange <- (max*(-1))-0.05
    }
    else
    {
        iUpperRange <- abs(min)+0.05
        iLowerRange <- min-0.05
    }
    if (boolPlotRSquared==TRUE)
    {
        iUpperRange <- max+0.1
        iLowerRange <- 0
    }

    #Define the colour scheme/palette
    if (boolReverse==TRUE)
    {
        cols <- colorRampPalette(rev(brewer.pal(iNumColours, strPalette)))
    }
    else
    {
        cols <- colorRampPalette(brewer.pal(iNumColours, strPalette))
    }

    #Create a new df with same dimensions as corvals and fill with significances encoded with asterisks
    signif <- corvals
    for (i in 1:ncol(pvals))
    {
        signif[,i] <- c(symnum(pvals[,i], corr=FALSE, na=FALSE, cutpoints=c(0, 0.001, 0.01, 0.05, 1), symbols=c("***", "**", "*", "")))
    }

    #Create a new df with same dimensions as corvals and fill with r values merged with the encoded significances
    plotLabels <- corvals
    for (i in 1:nrow(corvals))
    {
        for(j in 1:ncol(corvals))
        {
            plotLabels[i,j] <- paste(round(corvals[i,j],2), signif[i,j], sep="")
            colnames(plotLabels)[j] <- colnames(corvals)[j]
        }

        rownames(plotLabels)[i] <- rownames(corvals)[i]
    }

    #Define a panel function for adding labels
    #Labels are passed with z as a third dimension
    labels=function(x,y,z,...)
    {
        panel.levelplot(x,y,z,...)
        ltext(x, y, labels=plotLabels, cex=labcex, font=1)
        #panel.text(x, y, plotLabels[x,y], cex=labcex)
    }

    levelplot(data.matrix(corvals), panel=labels, xlab="", ylab="", pretty=TRUE, scales=list(x=list(rot=45)), aspect="fill", col.regions=cols, main=strTitle, cuts=100, at=seq(iLowerRange,iUpperRange,0.01))
}

Requires

  • lattice
  • latticeExtra
  • RColorBrewer

Execution

CorLevelPlot(df, x, y, labcex, strCorMethod, strPalette, iNumColours, boolReverse, boolPlotRSquared, strTitle)

Parameters

  • df, Data-frame/matrix with correlates as column names
  • x, vector of correlates (column names) contained within dfClinParameters
  • y, as above
  • labCex, Size of the labels inside the plot area
  • strCorMethod, one of "pearson", "kendall", "spearman"
  • strPalette, one of RColorBrewer's palettes (see http://svitsrv25.epfl.ch/R-doc/library/RColorBrewer/html/ColorBrewer.html)
  • iNumColours, number of colours to use in the palette (if too many chosen, the max will automatically be chosen)
  • boolReverse, Reverse the palette?
  • boolPlotRSquared, plot R^2 values?
  • strTitle, plot title
ADD COMMENTlink modified 3 months ago by svlachavas430 • written 3 months ago by Kevin Blighe21k

Dear Kevin,

it looks a great and very useful function for visualizing and testing important putative correlations among multilayered data. Three important questions on the implementation (and please excuse me for any naive questions on this matter):

1) Regarding the type of variables that one can include in the df object, and in the x and y vectors respectively:

For example,

data("iris")
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

if i have a relative data frame, with specific gene expressions in the columns as variables-like the Sepal Length from iris,regarding samples belonging in a specific cluster as you illustrated in the figure above-as also other clinical non-continuous variables, such as Tumor Grade, etc-like Species above:

i can still implement your function to compute putative associations of these continuous variables with any of the categorical ones? or my notion/approach is incorrect ?

2) If my approach is feasible, for a correct implementation, i should provide the names of the continuous variables in the x vector, and the names of the categorical ones in the y vector ?

3) Finally, a question concerning NA values: because in some of these categorical variables, like the following:

plyr::count(coad.exp$subtype_expression_subtype)
         x freq
1      CIN   55
2 Invasive   36
3 MSI/CIMP   58
4       NA   49
5     <NA>  258

which in this case, has both NA values, as also "NA" character strings in this categorical variable-i could still use the function, and it will utilize only the above levels like CIN, Invasive and MSI/CIMP ?

Thank you in advance,

Efstathios

ADD REPLYlink written 3 months ago by svlachavas430
1

Hi Efstathios,

1) Yes, categorical variables will be converted into ordered numerical factors (1, 2, 3, 4, etc), with the number 1 as the base / reference level. Thus, if you have factors, you may want to 'relevel' them prior to running this function.

2) Yes, that is the correct form to do it

3) If there is a NA vale, then the entire record will be removed due to the presence of the following parameter passed to cor.test in my function: cor.test(..., use="pairwise.complete.obs", ...)

Kind regards,

Kevin

ADD REPLYlink written 3 months ago by Kevin Blighe21k

Thanks Kevin for the confirmation and your suggestions !!! Just two extra important comments on this matter :

1) Conserning my first question of the categorical variables: because in some cases, for instance regardless of Tumor Stage which can be releveled (T1,T2 etc), some variables, like the subtype_expression_subtype i mentioned above, does not have a reference level-this is also due to the fact that in my analysis i have only cancer and not normal samples-thus your opinion about this ? i would just factorize the categorical variables i will include ?

2) Finally, concerning my third question about NAs: if you noticed, except actual NA values, i have also in some variables, "NA" character strings, like the below:

plyr::count(coad.exp$subtype_expression_subtype)
         x freq
1      CIN   55
2 Invasive   36
3 MSI/CIMP   58
4       **NA   49**
5     <NA>  258

thus, i should take care of them, or proceed as you suggested ?

ADD REPLYlink written 3 months ago by svlachavas430
1

Hi again,

For this, correlation, you do not require a true reference. You just need to ensure that you know how your categorical variables will be numerically encoded numerically. This encoding will b based on how you order them.

For example,

factor(tumourStage, levels=c("T1","T2","T3"))

This will be transformed into 1, 2, 3 for correlation.

factor(tumourStage, levels=c("T3","T2","T1"))

This will also be transformed into 1, 2, 3 for correlation.

----------------

On the other point, i recommend that you remove all extraneous and un-needed characters, such as **NA (convert to NA), leading and trailing spaces, etc.

ADD REPLYlink modified 3 months ago • written 3 months ago by Kevin Blighe21k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 628 users visited in the last hour