Question: Graphing specific gene expression data from DGEList
8 months ago by
MarjoryMollusc40 wrote:

I have a large set of RNA-seq expression data in a DGEList object, and I want to plot the expression of specific genes over time, compared between two factors.

I started out by subsetting the data into a smaller matrix and then realised that was silly, and that I should be able to plot it directly from the DGEList object that the data is stored in. Each timepoint has three replicates, so I would also be looking to take the mean of those replicates before plotting. Would subsetting the data first still be the best option, or am I missing a far quicker and easier option?

DGEList Count:

Gene Symbol   Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8 ...
Gene1           54     55       53      78      79      74      81      82
Gene2           23     21       22      45      44      47      61      62     
Gene3           74     75       73      81      82      80      83      88
Gene4            2      3        1      10       9       8      12      11


       Sample Name ...    Day
[1,]    Sample1            D0
[2,]    Sample2            D0
[3,]    Sample3            D0
[4,]    Sample4            D3
[5,]    Sample5            D3
[6,]    Sample6            D3
[7,]    Sample7            D7
[8,]    Sample8            D7

Using the examples above, what I am trying to do is draw an expression line plot for Gene 2 and Gene 3, averaging the expression levels of the replicates on each day. As the samples come from two factors, the two would need to be plotted as separate lines.

dgelist rna-seq edger ggplot2 • 500 views
modified 8 months ago • written 8 months ago by MarjoryMollusc40

You may take inspiration from this previous question: Boxplot in ggplot2

Alternatively, if you have your data in this format:

             Group   BRCA1   TP53    ATM   CCND1
    Sample1  FactorX -       -       -     -
    Sample2  FactorX -       -       -     -
    Sample3  FactorY -       -       -     -
    Sample4  FactorX -       -       -     -
    Sample5  FactorY -       -       -     -
    ...      ...     ...     ...     ...   ...

... then, you can plot these with:

boxplot(BRCA1 ~ Group, data=MyData)
written 8 months ago by Kevin Blighe33k

Thanks for the answer, but it's not quite what I'm after. I'll update the question above with example data.

written 8 months ago by MarjoryMollusc40
8 months ago by
Kevin Blighe33k
Republic of Ireland
Kevin Blighe33k wrote:

Okay, I get the feeling that this does not have to be anything special for now (in terms of a 'polished' plot). So, with your data in a data frame (df) arranged like this, you could try the following:

        time gene1 gene2 gene3
sample1 day1     1     2     3
sample2 day1     4    10     3
sample3 day2     1     2     3
sample4 day2     1     2     3
sample5 day3     1     2     3
sample6 day3     1     2     3

Summarise by mean:

df <- aggregate(df[,2:ncol(df)], df[1], mean)
  time gene1 gene2 gene3
1 day1   2.5     6     3
2 day2   1.0     2     3
3 day3   1.0     2     3


plot(1, type="n", ylab="Expression", xlab="Day (1, 2, 3)", xlim=c(1,3), ylim=c(0,10))
lines(gene1 ~ time, data=df, lwd=2, col="royalblue")
lines(gene2 ~ time, data=df, lwd=2, col="red4")
lines(gene3 ~ time, data=df, lwd=2, col="forestgreen")


written 8 months ago by Kevin Blighe33k
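The answer above assumes a data frame with samples in rows, whereas a DGEList keeps counts as genes x samples in y$counts and the sample annotation in y$samples. Here is a rough sketch of the same aggregate approach applied to the example from the question. It uses a plain list as a stand-in for the DGEList so that it runs without edgeR, and it only includes the eight samples shown in the question; the name y and the colours are my own choices. With a real DGEList you would probably plot log-CPM values (for instance from edgeR's cpm(y, log=TRUE)) rather than raw counts.

```r
# Stand-in for the DGEList: counts are genes x samples, samples holds the day.
counts <- matrix(
  c(54, 55, 53, 78, 79, 74, 81, 82,
    23, 21, 22, 45, 44, 47, 61, 62,
    74, 75, 73, 81, 82, 80, 83, 88,
     2,  3,  1, 10,  9,  8, 12, 11),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:4), paste0("Sample", 1:8))
)
samples <- data.frame(
  Day = c("D0", "D0", "D0", "D3", "D3", "D3", "D7", "D7"),
  row.names = colnames(counts)
)
y <- list(counts = counts, samples = samples)  # assumption: mimics y$counts / y$samples

genes <- c("Gene2", "Gene3")

# Pull out only the genes of interest and transpose so that samples are rows
expr <- data.frame(Day = y$samples$Day, t(y$counts[genes, , drop = FALSE]))

# Mean of the replicates per day
day_means <- aggregate(expr[genes], by = expr["Day"], FUN = mean)
print(day_means)

# One line per gene, with the day labels on the x axis
x <- seq_len(nrow(day_means))
cols <- c("red4", "forestgreen")
matplot(x, as.matrix(day_means[genes]), type = "b", pch = 16, lty = 1,
        col = cols, xaxt = "n", xlab = "Day", ylab = "Mean count")
axis(1, at = x, labels = day_means$Day)
legend("topleft", legend = genes, col = cols, lty = 1, pch = 16)
```

So there is no need to build a separate matrix by hand first: indexing y$counts by gene name inside the call is enough.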

Ah awesome! That's exactly what I have been trying to get. Did not know about the aggregate function. Thanks so much! The idea is to get a simple plot of expression with nothing too complicated.

written 8 months ago by MarjoryMollusc40

Okay, great!

The second argument that I've used for aggregate is a bit weird, though: df[1] means 'aggregate based on the first column'. It looks unlike the usual df[,1] sub-setting because aggregate needs its 'by' argument as a list, and df[1] returns a one-column data frame (which is a list) rather than a bare vector.

modified 8 months ago • written 8 months ago by Kevin Blighe33k
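To make the df[1] point concrete, here is a small sketch using the toy data from the answer. It shows what df[1] and df[,1] return and two other ways of writing the same grouping; the variable names by_list, by_named and by_formula are just mine.

```r
df <- data.frame(
  time  = c("day1", "day1", "day2", "day2", "day3", "day3"),
  gene1 = c(1, 4, 1, 1, 1, 1),
  gene2 = c(2, 10, 2, 2, 2, 2),
  gene3 = c(3, 3, 3, 3, 3, 3)
)

is.data.frame(df[1])    # one-column data frame, which is also a list
is.data.frame(df[, 1])  # plain vector, so aggregate() rejects it as 'by'

# As in the answer: group by the first column
by_list <- aggregate(df[, 2:ncol(df)], by = df[1], FUN = mean)

# Same grouping with an explicit named list
by_named <- aggregate(df[, 2:ncol(df)], by = list(time = df$time), FUN = mean)

# Same grouping with the formula interface
by_formula <- aggregate(cbind(gene1, gene2, gene3) ~ time, data = df, FUN = mean)

print(by_list)
```

The explicit list(time = df$time) form is a bit longer, but it makes it obvious what the grouping variable is and what its column will be called in the result.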

Looks like GROUP BY:

SELECT AVG(df[,2]), AVG(df[,3]), AVG(df[,4]), ...., AVG(df[,ncol(df)]) GROUP BY df[,1]

So, apply a specified aggregate over a vector of vectors, partitioning/grouping each vector based on unique values from a different, equal-length vector.

modified 8 months ago • written 8 months ago by RamRS19k
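The GROUP BY description can be spelled out by hand in base R, which may help to see what aggregate is doing. This is only a sketch on the toy data from the answer: each gene column is split by the values of the time column and averaged with tapply, and sapply repeats that over all gene columns. The name grouped is my own.

```r
df <- data.frame(
  time  = c("day1", "day1", "day2", "day2", "day3", "day3"),
  gene1 = c(1, 4, 1, 1, 1, 1),
  gene2 = c(2, 10, 2, 2, 2, 2),
  gene3 = c(3, 3, 3, 3, 3, 3)
)

# For every gene column: partition by time, then take the mean of each partition
grouped <- sapply(df[-1], function(v) tapply(v, df$time, mean))
print(grouped)
```

The result is a matrix with one row per day and one column per gene, rather than the data frame that aggregate returns.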

I managed to figure that out. Thanks again!

Any chance you could explain how ggplot2 could be used to draw a plot similar to the one above, but with the strings D0, D2, D4, D6 in column one of two dataframes as a discrete x variable?

I managed to figure out how to get one line in, but it is also putting the x variable in a different order to that of the dataframe.

modified 7 months ago • written 7 months ago by MarjoryMollusc40
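One possible approach to the ggplot2 question, as a sketch only: stack the two data frames with a column saying which group each row came from, and turn the day strings into a factor with explicit levels so that ggplot2 keeps your order instead of sorting the strings alphabetically (which would, for example, put D10 before D2). The data frame names, the column names and all of the values below are made up for illustration; FactorX and FactorY are borrowed from the earlier comment.

```r
# Hypothetical per-day means for one gene in two groups (values are made up)
days <- c("D0", "D2", "D4", "D6")
means_a <- data.frame(Day = days, mean_expr = c(22, 30, 45, 61))
means_b <- data.frame(Day = days, mean_expr = c(20, 24, 27, 33))

# Label each data frame, then stack them into one long data frame
means_a$group <- "FactorX"
means_b$group <- "FactorY"
both <- rbind(means_a, means_b)

# Explicit levels fix the order of the discrete x axis
both$Day <- factor(both$Day, levels = days)

# Only draw the plot if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(both, aes(x = Day, y = mean_expr, colour = group, group = group)) +
    geom_line() +
    geom_point() +
    labs(x = "Day", y = "Mean expression")
  print(p)
}
```

The group aesthetic matters here: with a discrete x variable, geom_line needs to be told which points belong to the same line, otherwise it does not connect them across days.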

@Ram, thanks! With both contributions I have managed to figure out the aggregation of the data.

written 7 months ago by MarjoryMollusc40