Question: Plotting Exon Statistics In R
5.4 years ago by
United States
thecuriousbiologist420 wrote:


I have a table in the following format which provides some information (counts of mapped reads) about exons in some genes across some samples.

Exon     Gene     sampleA     sampleB     sampleC
E1       A        43          52          12
E2       A        0           24          34
E3       A        19          48          32
E4       A        76          0           23
E5       A        5           87          12
E1       B        12          109         98
E2       B        32          76          11
E1       C        12          0           5
E2       C        4           8           76
E3       C        0           0           32

That is, it holds a count for every exon of every gene. I wish to generate one plot per sample (so 3 plots) showing the counts of all exons in all 3 genes. Within each sample plot, the X-axis would be the exon number and the Y-axis would be the count, so each plot would contain 3 "data series" lines (one per gene).

I am new to R and I have no clue how to go about it.

I am wondering if I have to "factor" the gene column in some way to get the exons specific to each gene?

Any suggestions would be much appreciated.


Don't forget to normalize your read counts by sequencing depth per sample.
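A minimal sketch of one way to do this, assuming the table from the question is held in a data frame called counts and that scaling each sample to counts per million is acceptable. Here the column sums of the small example table stand in for sequencing depth, which is an assumption for illustration; with real data the total number of mapped reads per sample would be the more appropriate denominator.

```r
# Example table from the question (assumed to be held in `counts`)
counts <- data.frame(
  Exon    = c("E1", "E2", "E3", "E4", "E5", "E1", "E2", "E1", "E2", "E3"),
  Gene    = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "C"),
  sampleA = c(43, 0, 19, 76, 5, 12, 32, 12, 4, 0),
  sampleB = c(52, 24, 48, 0, 87, 109, 76, 0, 8, 0),
  sampleC = c(12, 34, 32, 23, 12, 98, 11, 5, 76, 32)
)

sample_cols <- c("sampleA", "sampleB", "sampleC")

# Assumption: column sums stand in for sequencing depth; with real data,
# use the total number of mapped reads per sample instead
depth <- colSums(counts[, sample_cols])

# Divide each column by its depth and scale to counts per million
cpm <- sweep(as.matrix(counts[, sample_cols]), 2, depth, "/") * 1e6

# Re-attach the exon and gene labels
norm_counts <- cbind(counts[, c("Exon", "Gene")], cpm)
```

After this step every sample column of norm_counts sums to one million, so counts can be compared across samples on the same scale.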

Also, if you want to run statistics on differential exon usage (seems to be where you are going with this), you should look at the DEXSeq package ... an added bonus is that it includes functionality to plot expression over exons

Reply written 5.4 years ago by Steve Lianoglou (4.9k)
5.4 years ago by
Irsan6.6k wrote:
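# Install ggplot2 and reshape
install.packages(c("ggplot2", "reshape"))

# load the packages:
library(ggplot2)
library(reshape)

# melt the dataframe so that ggplot can handle it. I assume you have the data in an object called counts
melt_count <- melt(counts, id.vars = c("Exon", "Gene"))
colnames(melt_count) <- c("exon", "gene", "sample", "count")

# and plot counts for each gene for each exon
ggplot(melt_count) + geom_point(aes(x = exon, y = count, color = sample)) + facet_grid(exon ~ gene, scales = "free_x")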

See resulting image here
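For the layout described in the question (one plot per sample, one line per gene), a base-R sketch might look like the following. It assumes the data frame is called counts and that exon names always have the form "E" followed by a number; the colours and the helper column exon_num are illustrative choices, not part of the original data.

```r
# Example table from the question (assumed to be held in `counts`)
counts <- data.frame(
  Exon    = c("E1", "E2", "E3", "E4", "E5", "E1", "E2", "E1", "E2", "E3"),
  Gene    = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "C"),
  sampleA = c(43, 0, 19, 76, 5, 12, 32, 12, 4, 0),
  sampleB = c(52, 24, 48, 0, 87, 109, 76, 0, 8, 0),
  sampleC = c(12, 34, 32, 23, 12, 98, 11, 5, 76, 32)
)
sample_cols <- c("sampleA", "sampleB", "sampleC")

# Exon number as an integer for the x-axis ("E3" becomes 3)
counts$exon_num <- as.integer(sub("^E", "", counts$Exon))

# split() groups the rows by gene, so no manual factor() call is needed
by_gene <- split(counts, counts$Gene)
gene_cols <- setNames(c("red", "blue", "darkgreen"), names(by_gene))

# One panel per sample, side by side
op <- par(mfrow = c(1, length(sample_cols)))
for (s in sample_cols) {
  # Empty plot with axes shared across all panels
  plot(NA, type = "n", xaxt = "n", main = s,
       xlim = range(counts$exon_num), ylim = range(counts[, sample_cols]),
       xlab = "Exon number", ylab = "Read count")
  axis(1, at = sort(unique(counts$exon_num)))
  # One line per gene, drawn in exon order
  for (g in names(by_gene)) {
    d <- by_gene[[g]]
    d <- d[order(d$exon_num), ]
    lines(d$exon_num, d[[s]], type = "b", col = gene_cols[[g]])
  }
  legend("topright", legend = names(by_gene), col = gene_cols, lty = 1, pch = 1)
}
par(op)
```

Using split() on the Gene column is what answers the "factor" part of the question: R treats the column as a grouping variable automatically.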

5.4 years ago by
Dan480 wrote:

If you're new to R, you should definitely read Chapter 1 (Introduction) of 'S Poetry':

I can't recommend it enough!

For manipulating data frames, you should look at tapply and friends. I don't quite understand what you want to do, but I'm sure you can do it with tapply ;-)
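As a rough illustration of what tapply does, assuming the table from the question is held in a data frame called counts, the sketch below sums the reads per gene. The object names totals_a and gene_totals are made up for this example.

```r
# Example table from the question (assumed to be held in `counts`)
counts <- data.frame(
  Exon    = c("E1", "E2", "E3", "E4", "E5", "E1", "E2", "E1", "E2", "E3"),
  Gene    = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "C"),
  sampleA = c(43, 0, 19, 76, 5, 12, 32, 12, 4, 0),
  sampleB = c(52, 24, 48, 0, 87, 109, 76, 0, 8, 0),
  sampleC = c(12, 34, 32, 23, 12, 98, 11, 5, 76, 32)
)

# Total reads per gene in one sample: group sampleA by the Gene column
totals_a <- tapply(counts$sampleA, counts$Gene, sum)
print(totals_a)

# The same for all samples at once with aggregate(), a close relative of tapply
gene_totals <- aggregate(cbind(sampleA, sampleB, sampleC) ~ Gene,
                         data = counts, FUN = sum)
print(gene_totals)
```

tapply returns a named vector with one entry per gene, while aggregate returns a data frame with one row per gene and one column per sample.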


If you're new to R, you should in general read as much documentation as you can, including online tutorials. You can't expect to have a "clue how to go about it" with no background knowledge whatsoever.

Reply written 5.4 years ago by Neilfws (48k)
