ggscatter, gene expression, correlation
Entering edit mode
17 months ago
Rob ▴ 160

Hey friends, The following code gives me a scatter plot of the correlation between gene expression (y-axis) & my continuous variable (x-axis). It gives me the R coefficient and p-value. I have two problems: 1) I need also to have a q-value (corrected p-value). what should I add to my code to have this? 2) I also want to rank the genes based on R and q-value? I mean I want the plots to be in the order of descending/ascending R & q-values. how is it possible?

ggscatter(ex, x = "variable", y = "Expression",
          add = "reg.line",                         
          add.params = list(color = "blue", fill = "lightgray"),
          color = "black", palette = "jco", fill = "lightgray",          
          #shape = "cyl",                            
          fullrange = TRUE,                       
          rug = TRUE, = "gene", cor.coef = T,
          title = "Correlation Plot",
 = TRUE, 
          cor.coeff.args = list(),
          cor.method = "spearman",
          cor.coef.coord = c(NULL, NULL),
          cor.coef.size = 4,                               

  geom_vline(xintercept = 157, colour="red", linetype = "longdash")

here is the plot I get (with R & p-value for each gene) :


RNA-Seq plot scatter • 1.1k views
Entering edit mode

I think you're reaching the limits of what ggscatter can do. I'd recommend calculating your own q vals, using pivot_longer, factor your gene names in whatever order you want, then switching to ggplot+geom_scatter. Maybe start with one gene until you optimize your plot then add in the facet_wrap.

Entering edit mode

Thanks @Trivas
What I understadn, I should calculate q-values first, then find R coefficients, then order the genes based on these two, then make the correlation scatter plots. right? as firs step, I tried to get q-value first using the following code:

data <- read.csv ("mydata.csv", = FALSE, stringsAsFactors = FALSE, header = T)

data_qvalue <- pivot_longer(data, cols = everything(-1)) ###all columns (all genes) except for column 1 that is my continuous variable

pvalues <- data_qvalue$p
qobj <- qvalue(p = pvalues)

my data looks like this: the first column (SMA) is my continuous variable.


I didn't get any result.

Entering edit mode

Two things:

  1. You are pivoting your data too early. I generally do this as the last step before plotting.
  2. You are better off transposing your data using t() then calculating your qvalues rowwise, eg
data %>% 
    t() %>% 
    tibble() %>%
    rowwise() %>% 
    mutate(pval = t.test(1:nrow(data))) %>%
    mutate(qval = qvalue(pval))

You might have to do some optimization after transposing (your colnames might become their own column so use column_to_rownames as an example) so I'd recommend running each line individually so you understand what each is doing.

You're shooting for a dataframe that has 6 columns after pivoting:

  1. gene name
  2. x coord
  3. y coord
  4. R value
  5. p-val
  6. q-val

The last 3 are going to be redundant within each gene, but ggplot knows how to handle that.

Entering edit mode

Thank you @Trivas I tried doing this but I got this error:

Error in mutate(): ! Problem while computing pval = t.test(1:nrow(data)). x pval must be a vector, not a htest object. i Did you mean: pval = list(t.test(1:nrow(data))) ? i The error occurred in row 1. Run rlang::last_error() to see where the error occurred.

then I change the line 5 to :

 mutate(pval = list(t.test(1:nrow(data)))) %>%

and I got this error:

Error in `mutate()`:
! Problem while computing `qval = qvalue(pval)`.
i The error occurred in row 1.
Caused by error in `min()`:
! invalid 'type' (list) of argument
Run `rlang::last_error()` to see where the error occurred.

is there any way to get rid of this error?

Entering edit mode

This is where I'm returning the reigns to you. At this point you have your data in the correct orientation to perform rowwise calculations. Once finished, you have the framework to alter your data into a format that ggplot2 likes (via pivot_longer) and you know how to factor your genes into the order you like. I've also given you a suggestion to optimize your code for a single gene then expand it to work for all of your genes.

The error you're getting now is that the results of t.test and qvalue aren't returning a singular numeric value and mutate cannot add a list of values into a cell of a dataframe. You can get around that by doing something like t.test() %>% .[["p.value"]]. The . acts as a placeholder for the list and the double brackets are similar to using $ to access an object of a list. In other words, this is equivalent to doing the following:

pval_res <- t.test()
pval <- pval_res$p.value

I implore you to think about what you're trying to do because the default function of t.test is a one sample t test which might not be super meaningful for your analysis.


Login before adding your answer.

Traffic: 2044 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6