Question: Plotting Exon Statistics In R
Asked 7.7 years ago by thecuriousbiologist (United States):

Hi,

I have a table of mapped-read counts for the exons of several genes across several samples, in the following format:

``````
Exon  Gene  sampleA  sampleB  sampleC
E1    A     43       52       12
E2    A     0        24       34
E3    A     19       48       32
E4    A     76       0        23
E5    A     5        87       12
E1    B     12       109      98
E2    B     32       76       11
E1    C     12       0        5
E2    C     4        8        76
E3    C     0        0        32
``````
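To make the table easy to experiment with, it can be read straight into R. This is a minimal sketch that assumes the columns are separated by whitespace. With a real file you would pass its path instead of `text =` (the file name in the comment is hypothetical).

```r
# Read the whitespace-delimited table into a data frame called `counts`.
# For a file on disk, use e.g. read.table("exon_counts.txt", header = TRUE)
# (hypothetical file name).
counts <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Exon Gene sampleA sampleB sampleC
E1   A    43      52      12
E2   A    0       24      34
E3   A    19      48      32
E4   A    76      0       23
E5   A    5       87      12
E1   B    12      109     98
E2   B    32      76      11
E1   C    12      0       5
E2   C    4       8       76
E3   C    0       0       32
")

# Check the column types: Exon and Gene are character, the samples are integer.
str(counts)
```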

That is, there is one row for every exon of every gene. I want to generate one plot per sample (so three plots) showing the counts of all exons of all three genes. In each sample's plot, the x-axis would be the exon number and the y-axis the count, so each plot would contain three "data series" lines, one per gene.

I am new to R and I have no clue how to go about it.

Do I have to convert the gene column to a "factor" in some way to get the exons that belong to each gene?
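On the factor question: converting the gene column by hand is usually not necessary, because grouping functions such as `split()` and `tapply()` coerce a character column to a factor themselves. A minimal sketch, using a subset of the table above built inline; setting the levels explicitly is only useful when you want to control the order of the groups (for example, in a plot legend).

```r
# A subset of the question's table (sampleA only), built inline.
counts <- data.frame(
  Exon    = c("E1", "E2", "E3", "E1", "E2", "E1", "E2", "E3"),
  Gene    = c("A",  "A",  "A",  "B",  "B",  "C",  "C",  "C"),
  sampleA = c(43, 0, 19, 12, 32, 12, 4, 0),
  stringsAsFactors = FALSE
)

# split() turns Gene into a factor internally and returns one data frame per gene.
by_gene <- split(counts, counts$Gene)
names(by_gene)     # "A" "B" "C"
by_gene$C$Exon     # "E1" "E2" "E3"

# An explicit factor only matters if you want a specific group order.
counts$Gene <- factor(counts$Gene, levels = c("C", "B", "A"))
levels(counts$Gene)
```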

Any suggestions would be much appreciated.


Don't forget to normalize your read counts by sequence depth per sample.
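One simple way to do this is counts per million (CPM): divide each sample's counts by that sample's total number of mapped reads and multiply by 10^6. A minimal sketch follows; the library sizes are made-up placeholders, so substitute the totals from your own alignments. Statistical packages for count data generally expect raw counts and normalize internally, so a manual rescaling like this is mainly for exploratory plots.

```r
# First three rows of the question's table.
counts <- data.frame(
  Exon    = c("E1", "E2", "E3"),
  Gene    = c("A", "A", "A"),
  sampleA = c(43, 0, 19),
  sampleB = c(52, 24, 48),
  sampleC = c(12, 34, 32)
)

# Total mapped reads per sample (sequencing depth).
# HYPOTHETICAL values: take the real ones from your alignment statistics.
lib_size <- c(sampleA = 2e6, sampleB = 4e6, sampleC = 1e6)

samples <- names(lib_size)
# Divide every sample column by its depth, then scale to counts per million.
cpm <- sweep(as.matrix(counts[, samples]), 2, lib_size[samples], "/") * 1e6
counts_cpm <- cbind(counts[, c("Exon", "Gene")], cpm)
counts_cpm
```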

Also, if you want to run statistics on differential exon usage (which seems to be where you are going with this), you should look at the DEXSeq package. As an added bonus, it includes functions to plot expression over exons.

Answered 7.7 years ago by Irsan (Amsterdam):
``````
# Install ggplot2 and reshape (only needed once)
install.packages(c("ggplot2", "reshape"))

library(reshape)
library(ggplot2)

# I assume you have the table in a data frame called `counts`, e.g.
# counts <- read.table("counts.txt", header = TRUE)   # hypothetical file name

# Melt the data frame to long format (one row per exon and sample) so ggplot can handle it
melt_count <- melt(counts, id.vars = c("Exon", "Gene"))
colnames(melt_count) <- c("exon", "gene", "sample", "count")

# Plot the count of each exon, one panel per gene, coloured by sample
ggplot(melt_count) +
  geom_point(aes(x = exon, y = count, color = sample)) +
  facet_grid(. ~ gene, scales = "free_x")
``````

See resulting image here
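The layout described in the question (one plot per sample, exon number on the x-axis, one line per gene) can also be drawn with base R graphics alone. This is a sketch under the assumption that exon IDs always have the form "E" followed by an integer; converting them to numbers keeps E10 after E9 instead of sorting it alphabetically.

```r
# The question's table, built inline so this example runs on its own.
counts <- data.frame(
  Exon    = c("E1", "E2", "E3", "E4", "E5", "E1", "E2", "E1", "E2", "E3"),
  Gene    = c("A",  "A",  "A",  "A",  "A",  "B",  "B",  "C",  "C",  "C"),
  sampleA = c(43, 0, 19, 76, 5, 12, 32, 12, 4, 0),
  sampleB = c(52, 24, 48, 0, 87, 109, 76, 0, 8, 0),
  sampleC = c(12, 34, 32, 23, 12, 98, 11, 5, 76, 32),
  stringsAsFactors = FALSE
)

# Numeric exon index ("E3" -> 3); assumes IDs look like "E<number>".
counts$exon_num <- as.integer(sub("^E", "", counts$Exon))
counts <- counts[order(counts$Gene, counts$exon_num), ]

samples <- c("sampleA", "sampleB", "sampleC")
genes   <- sort(unique(counts$Gene))
cols    <- seq_along(genes)                  # one colour per gene (default palette)

op <- par(mfrow = c(1, length(samples)))     # three plots side by side
for (s in samples) {
  # Empty plot spanning all exon numbers and this sample's count range
  plot(NA, xlim = range(counts$exon_num), ylim = c(0, max(counts[[s]])),
       xlab = "Exon number", ylab = "Read count", main = s)
  # One data series per gene: its exons in order, drawn as points joined by lines
  for (i in seq_along(genes)) {
    rows <- counts$Gene == genes[i]
    lines(counts$exon_num[rows], counts[[s]][rows], type = "b", col = cols[i])
  }
  legend("topright", legend = genes, col = cols, lty = 1, pch = 1, bty = "n")
}
par(op)                                      # restore the previous layout
```

If you prefer to stay with ggplot2, the `melt_count` data frame from the answer above gives a similar layout with `ggplot(melt_count, aes(x = exon, y = count, colour = gene, group = gene)) + geom_line() + facet_wrap(~ sample)`.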

Answered 7.7 years ago by Dan (Cambridge):

If you're new to R, you should definitely read Chapter 1 (Introduction) of 'S Poetry': http://www.burns-stat.com/documents/books/s-poetry/

I can't recommend it enough!

For manipulating data frames, you should look at tapply and friends. I don't quite understand what you want to do, but I'm sure you can do it with tapply ;-)
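As a small illustration of what `tapply` does with this kind of table: it applies a function to a vector separately for each group defined by another column. A minimal sketch using two of the samples from the question; wrapping it in `sapply` repeats the same per-gene summary for every sample column.

```r
# Two samples from the question's table, built inline.
counts <- data.frame(
  Exon    = c("E1", "E2", "E3", "E4", "E5", "E1", "E2", "E1", "E2", "E3"),
  Gene    = c("A",  "A",  "A",  "A",  "A",  "B",  "B",  "C",  "C",  "C"),
  sampleA = c(43, 0, 19, 76, 5, 12, 32, 12, 4, 0),
  sampleB = c(52, 24, 48, 0, 87, 109, 76, 0, 8, 0),
  stringsAsFactors = FALSE
)

# tapply(values, groups, FUN): total sampleA reads per gene (A = 143, B = 44, C = 16)
a_totals <- tapply(counts$sampleA, counts$Gene, sum)
a_totals

# Number of exons per gene (A = 5, B = 2, C = 3)
tapply(counts$Exon, counts$Gene, length)

# The same per-gene sum for every sample column: a genes x samples matrix
gene_totals <- sapply(counts[c("sampleA", "sampleB")],
                      function(x) tapply(x, counts$Gene, sum))
gene_totals
```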


If you're new to R, you should in general read as much documentation, including online tutorials, as you can. You can't expect to have a "clue how to go about it" with no background knowledge whatsoever.