Question

Calculating the expression level of genes

1

Entering edit mode

5.9 years ago

Za ▴ 140

Hi,

I have seq data in some time points. How I can plot the expression level of genes (individually or together) so that the gene is expression more, less or equal to mean of expression level??? I though about something like heat map that colours before and after mean are lighter and darker respectively. But I don't know how to deal with this

Gene expression RNA-Seq R • 2.7k views

ADD COMMENT • link 5.9 years ago by Za ▴ 140

0

Entering edit mode

Some example data would help.

ADD REPLY • link 5.9 years ago by cpad0112 21k

0

Entering edit mode

head(a[,1:10])

Actually, I want to map the level of genes expression in each time point around the mean or central tendencies

ADD REPLY • link 5.9 years ago by Za ▴ 140

2

Entering edit mode

some thing like this?

ADD REPLY • link 5.9 years ago by cpad0112 21k

0

Entering edit mode

Oh thank you. I think yes.

ADD REPLY • link 5.9 years ago by Za ▴ 140

1

Entering edit mode

In above image:

Each box represents time point (time 1, 2 and 3 consecutively)
Each box has expression value on Y-axis and Samples on X-axis.
Each dot represents an expression value. Expression value above mean (for that sample) are represented by red and below mean are cyan.
The black line in each box touches all the means of samples within the time point. It is not over all mean, for all the samples in the time point. I am not sure of %CV here. I used mean as cut off for color.

In OP data, most of them are zeros or near zero. I create a data frame. If you could post following information that would be great:

Is each cell and sample are same? for eg there are about 200 cells. Is each cell is the sample it self or in each cell, there are multiple samples?
Are these cells same across time points?
If possible, post non-zero values for few of the samples and few of the time points to get an idea.

ADD REPLY • link 5.9 years ago by cpad0112 21k

0

Entering edit mode

data$mpg_type <- ifelse(data$mpg_z < mean "below", "above")  # above / below avg flag
data <- data[order(data$mpg_z), ]  # sort
data$`Gene` <- factor(data$`Gene`, levels = data$`Gene`)  # convert to fa

ADD REPLY • link 5.9 years ago by Za ▴ 140

1

Entering edit mode

I simulated data frame with gene names DDB_G0267178 for 10 genes, time points for h10-16 and 12 samples (s1-12). Following are graphs using ggplot. Since you mentioned violin polots, I tried recreating violin plots. Unfortunately, simulated data set is small. Hence When I faceted, the violins are out of shape. Hence I included image without faceting. I included both mean and mean+se. Points are colored based on the mean. The colors can be customized. Change the code as you wish. Images are at the end:

Data preparation and per sample faceting code

## Simulate data frame with 10 gene names and create a matrix of 10rx 12 c expression data
df1=data.frame(genes = paste0("DDB_G0267",sample(c(178:200), 10, replace = F)), matrix(sample(round(rnorm( 120, 4, 4), 2)), 10, 12))
## Change the names of the columns
colnames(df1)[-1] = paste0("sample", seq(1, 12))

## create a data frame for 12 sample names and 4 time points
pheno= data.frame(samples = paste0("sample", seq(1, 12)),time = rep(c("h10","h12","h14","h16"), each = 3))

## Append expression means of each sample to the pheno data
samples=data.frame(samples=colnames(df1[-1]), sample_means=colMeans(df1[-1]), stringsAsFactors = F, time=pheno$time)
geno=data.frame(genes=df1[1],gene_means=rowMeans(df1[-1]), stringsAsFactors = F)

## Load library tidyr
library(tidyr)

## convert wide format to long format
df2=gather(df1,"samples","expression",-"genes")

## Merge the metadata with data
df3=merge(df2,samples, by="samples")
df4=merge(df3,geno, by="genes")
df5=df4 %>% group_by(genes,time) %>% mutate(gene_time_mean=mean(expression))

## Plot using ggplot2. There are 3 graphs below. First one is with faceting for each  time point
## Faceting per sample with mean

s1=ggplot(df3, aes(samples, expression))+
    geom_violin()+
    geom_jitter(aes(color=expression>sample_means), width = 0.2)+
    scale_colour_manual(values=c("darkgreen", "red"))+
    stat_summary(fun.y=mean, geom="point", aes(group=1), size=1)+
    stat_summary(color="black",  geom = "line", aes(group=1))+
    facet_grid(~time, scales = "free")+
    theme_bw()+
    theme(
        legend.position = "none",
        strip.text.x = element_text(size = 24),
        strip.background = element_blank(),
        axis.text.x = element_text(size = 14, angle=90),
        axis.text.y = element_text(size = 24),
        axis.title.x = element_blank(),
        axis.title.y = element_blank()
        )

## without faceting per sample with mean
s2=ggplot(df3, aes(samples, expression))+
    geom_violin()+
    geom_jitter(aes(color=expression>sample_means), width = 0.2)+
    scale_colour_manual(values=c("darkgreen", "red"))+
    stat_summary(fun.y=mean, geom="point", aes(group=1), size=1)+
    stat_summary(color="black",  geom = "line", aes(group=1))+
    theme_bw()+
    theme(
        legend.position = "none",
        strip.text.x = element_text(size = 24),
        strip.background = element_blank(),
        axis.text.x = element_text(size = 14, angle=90),
        axis.text.y = element_text(size = 24),
        axis.title.x = element_blank(),
        axis.title.y = element_blank()
    )

## without faceting per sample with mean and se
s3=ggplot(df3, aes(samples, expression))+
    geom_violin()+
    geom_jitter(aes(color=expression>sample_means), width = 0.2)+
    scale_colour_manual(values=c("darkgreen", "red"))+
    stat_summary(fun.y=mean, geom="point", aes(group=1), size=1)+
    stat_summary(color="black",  geom = "smooth", aes(group=1))+
    theme_bw()+
    theme(
        legend.position = "none",
        strip.text.x = element_text(size = 24),
        strip.background = element_blank(),
        axis.text.x = element_text(size = 14, angle=90),
        axis.text.y = element_text(size = 24),
        axis.title.x = element_blank(),
        axis.title.y = element_blank()
    )

ADD REPLY • link 5.9 years ago by cpad0112 21k

1

Entering edit mode

...continuation of data preparation and sample faceting code post...

## with faceting, gene expression data with mean
g1=ggplot(df5, aes(genes, expression))+
    geom_violin()+
    geom_jitter(width = 0.2, aes(color=expression>gene_time_mean))+
    scale_colour_manual(values=c("darkgreen", "red"))+
    stat_summary(fun.y=mean, geom="point", aes(group=1), size=1)+
    stat_summary(color="steelblue",  geom = "line", aes(group=1), size=1.2)+
    theme_bw()+
        theme(
        legend.position = "none",
        strip.text.x = element_text(size = 24),
        strip.background = element_blank(),
        axis.text.x = element_text(size = 14, angle=90),
        axis.text.y = element_text(size = 24),
        axis.title.x = element_blank(),
        axis.title.y = element_blank()
    )+
    facet_grid(~time)

## without faceting, gene expression data with mean only
g2=ggplot(df5, aes(genes, expression))+
    geom_violin()+
    geom_jitter(width = 0.2, aes(color=expression>gene_means))+
    scale_colour_manual(values=c("darkgreen", "red"))+
    stat_summary(fun.y=mean, geom="point", aes(group=1), size=1)+
    stat_summary(color="steelblue",  geom = "line", aes(group=1), size=1.2)+
    theme_bw()+
    theme(
        legend.position = "none",
        strip.text.x = element_text(size = 24),
        strip.background = element_blank(),
        axis.text.x = element_text(size = 14, angle=90),
        axis.text.y = element_text(size = 24),
        axis.title.x = element_blank(),
        axis.title.y = element_blank()
    )

## without faceting, gene expression data with mean + se

g3=ggplot(df5, aes(genes, expression))+
    geom_violin()+
    geom_jitter(width = 0.2, aes(color=expression>gene_means))+
    scale_colour_manual(values=c("darkgreen", "red"))+
    stat_summary(fun.y=mean, geom="point", aes(group=1), size=1)+
    stat_summary(color="steelblue",  geom = "smooth", aes(group=1), size=1.2)+
    theme_bw()+
    theme(
        legend.position = "none",
        strip.text.x = element_text(size = 24),
        strip.background = element_blank(),
        axis.text.x = element_text(size = 14, angle=90),
        axis.text.y = element_text(size = 24),
        axis.title.x = element_blank(),
        axis.title.y = element_blank()
    )

library(gridExtra)
grid.arrange(s1,s2,s3,g1,g2,g3, nrow = 2, ncol=3)

ADD REPLY • link 5.9 years ago by cpad0112 21k

0

Entering edit mode

Thanks a lot

This is a link of this data

209, 153, 179 and 205 samples for each time point respectively.

ADD REPLY • link 5.9 years ago by Za ▴ 140

0

Entering edit mode

Sorry, your last comment about heat map was vanished. I plotted for two genes and one time point as picture. So, only few cells (x axis) are expressing these two genes less than mean in this time point

ADD REPLY • link 5.9 years ago by Za ▴ 140

1

Entering edit mode

I replaced previous heatmap pic with new one with code. There was no other text other than image. Image you have attached seems to be too busy and mean for many of them is incorrect. For mean to be in the middle of the graph, it should almost equal number of points above and below it. Please look into your code and data again.

ADD REPLY • link 5.9 years ago by cpad0112 21k

0

Entering edit mode

Sorry, you are plotting 10 genes for each sample. I just plotted one gene per sample (samples are my cells, so that I have 209 samples (cells) for time point h16). If I plot more than a gene, I am not able to see which of them in a sample is expressiong above or below the mean.

ADD REPLY • link 5.9 years ago by Za ▴ 140

1

Entering edit mode

I agree with you that simulated data is small and uniform. In general, real life big data (such as *omics, BFSI) is complicated and one needs several tricks to make sense of data. I can only help you with simulated data as my english is poor and cannot understand the description of data. I rather would like to see the data. I have updated the code with per gene, per sample tiling for time points. Please go through the code. Do not emulate it, rather see how a final dataframe is created for plotting. Then start plotting.

For other two points that are raised (gene expression and sample number), first issue is rather a statistical issue and second one is a non-issue.

ADD REPLY • link 5.9 years ago by cpad0112 21k

0

Entering edit mode

Excuse me,

I have cells for one time point and the expression of one gene for these cells ,

ADD REPLY • link 5.9 years ago by Za ▴ 140