Question

Function Of R Package Cummerbund

0

10.9 years ago

serena Meng ▴ 20

I have an RNA-seq project and analyzed differential expression using cuffdiff. I have produced the global statistics with the R package cummeRbund, but I am puzzled by some of its functions.

1. csBoxplot

    library("cummeRbund", lib.loc="D:/Program Files/R/R-3.0.0/library")
    setwd("F:/")
    # Load the cuffdiff output directory
    cuff_data <- readCufflinks("AvsB")
    cuff_data
    # CuffSet instance with:
    #      2 samples
    #      2708 genes
    #      2723 isoforms
    #      0 TSS
    #      0 CDS
    #      0 promoters
    #      0 splicing
    #      0 relCDS
    csBoxplot(genes(cuff_data))

This gives me the following figure:

I referred to the document "cummeRbund Visualization and Exploration of.pdf", but my figure differs from the one shown there: the legend title in the reference figure is sample_name, whereas in my figure it is condition. In addition, I want to know the median, the 1st quartile, the 3rd quartile and the number of outliers for each box in the boxplot, and I don't know how to get these.

reference figure:

2. csDensity

    csDensity(genes(cuff_data))

my figure:

reference figure:

This function shows the same legend problem as csBoxplot. Also, my density plot starts below 0, whereas the reference figure starts at 0 and has a peak near 0.

By the way, my R version is 3.0.1.

Thank you!

cuffdiff cummerbund r • 6.0k views

ADD COMMENT • link updated 10.4 years ago by Biostar 20 • written 10.9 years ago by serena Meng ▴ 20

8

EDIT: thanks for trying to improve the question.

This is one of the most poorly-written questions I have ever seen. Let me help you to improve it.

First, try to avoid "many questions." Try to identify a general theme which covers your problem. In your case, I'd suggest that you simply have not yet learned to use cummerbund properly. An informative title helps people understand your problem. "Something about cummerbund" is not a good title.

Second, link to examples. We don't know what a density image is, what you think it should look like, what yours looks like or how it differs to the image generated "by someone".

Third, avoid phrases like "and so on." What, precisely, do you want to calculate and why, precisely, are you having difficulty? The last part of the question just sounds as though you want someone to tell you how to do more or less everything.

Now: please try to improve this question, or it will be closed/deleted.

ADD REPLY • link 10.9 years ago by Neilfws 49k

2

I've modified my question. I would appreciate your comments.

ADD REPLY • link 10.9 years ago by serena Meng ▴ 20

1

If you are not logged in at weibo, you cannot see the figures.

ADD REPLY • link 10.9 years ago by skymningen ▴ 330

score 3 · Answer 1 · 2013-06-03

3

10.9 years ago

Michael 54k

The differences you are seeing are all minor and caused by differences in the data. Disclaimer: I am answering without ever having used this R package. To summarize your questions:

Differences in the figure legends ("sample_name" vs. "condition"): From my point of view, this is a very minor issue; in the worst case the labels could be changed with graphics software. It might depend on how your data was labelled internally, or it could be due to a mismatch between documentation and implementation. The exact naming can be changed at many different stages. See the help for the plot command you are using (?csBoxplot) to check whether it has an option to set the sample legend.

Getting the median and quartiles: In the box plot, look at the horizontal lines of each box. The line in the center of the box is the median; the edges of the box are the upper and lower quartiles. If you want the exact values, consider applying the R function quantile to the FPKM data, see ?quantile.

FPKM values < 0: This is expected for logarithmic values. Note that the axis label states log10(fpkm), not fpkm, so for any fpkm < 1 the log will be < 0. In the reference graph, a fixed cutoff was possibly used.
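To make the quantile suggestion concrete, here is a minimal sketch in base R. It runs on simulated values; the data frame `fpkm_df` is a hypothetical stand-in for a per-gene FPKM table with `sample_name` and `fpkm` columns (which is what I would expect `fpkm(genes(cuff_data))` to return, but treat that as an assumption), and the pseudocount of 0.0001 is likewise an assumption about what the plot adds before taking logs. Outliers are counted with `boxplot.stats`, which uses the usual 1.5 x IQR whisker rule, so the counts may differ slightly from what a particular plotting function draws.

```r
# Simulated stand-in for a per-gene FPKM table (hypothetical data, not real output).
set.seed(1)
fpkm_df <- data.frame(
  sample_name = rep(c("A", "B"), each = 500),
  fpkm = c(rlnorm(500, meanlog = 1.0, sdlog = 2),
           rlnorm(500, meanlog = 1.5, sdlog = 2))
)

# Assumed pseudocount; use the value your plotting call actually applies.
pseudocount <- 1e-4
fpkm_df$log_fpkm <- log10(fpkm_df$fpkm + pseudocount)

# One row per sample: quartiles from quantile(), outlier count from boxplot.stats().
box_summary <- do.call(rbind, lapply(
  split(fpkm_df$log_fpkm, fpkm_df$sample_name),
  function(x) {
    q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
    data.frame(q1 = q[1], median = q[2], q3 = q[3],
               n_outliers = length(boxplot.stats(x)$out))
  }
))
print(box_summary)
```

With real data, replace the simulated `fpkm_df` with your own table and keep the rest unchanged; the summary is computed on the same log10 scale that the box plot shows.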

General comment: According to a recent paper in Briefings in Bioinformatics, the use of FPKM/RPKM for differential expression analysis should be avoided.

ADD COMMENT • link 10.9 years ago by Michael 54k

0

Hi Michael,

In light of the paper you link to (thanks), what do you think are the implications for the Tuxedo pipeline (TopHat/Cufflinks)? If I'm not mistaken, Cufflinks uses FPKM values to determine differential expression.

Thanks, Carlos

ADD REPLY • link 10.9 years ago by Carlos Borroto ★ 2.1k

0

Hi Carlos, I also should have cited the Cufflinks 2 paper (Trapnell et al., http://www.nature.com/nbt/journal/v31/n1/full/nbt.2450.html). If you compare the conclusions of both papers, they appear to be totally opposite. However, as I understand it, Dillies et al. refer to DE analysis at the gene level, while Cufflinks is for DE analysis at the transcript level. Dillies et al. note:

Normalization and differential analysis at the transcript level require the use of sophisticated statistical models such as Cufflinks [5] or RSEM [45] in order to estimate, rather than count, expression levels of these transcripts. These estimates do not have the same statistical properties as read counts and may not be described by the same models or processed by the same normalization algorithms.

If my interpretation is correct (and there may be a legion of different opinions), and assuming the conclusions of both papers hold, I would use DESeq or edgeR for gene-level DE and Cufflinks 2 for transcript-level DE. At least until more comparative studies are available which claim the complete opposite.

ADD REPLY • link 10.9 years ago by Michael 54k

0

Thanks for your insight, quite interesting subject.

ADD REPLY • link 10.9 years ago by Carlos Borroto ★ 2.1k

0

Thank you for your answer. I have found some of the reasons.

1. Differences in the figure legends: this is simply due to the version of R. When I load the cummeRbund package in R 2.14.1, I get the image with the legend sample_name, as in the reference. When I load it in R 3.0.1, the legend is named condition. There are other changes in cummeRbund as well, for example in csBoxplot: in R 2.14.1 it considers only the genes whose FPKM value is greater than zero, whereas in R 3.0.1 it considers all genes and adds 0.0001 when the FPKM equals zero. I don't know why this change was made.

2. For csDensity, we can pass additional parameters to get a figure like the one in the reference, for example: csDensity(genes(cuff), logMode=TRUE, pseudocount=1, features=FALSE, replicates=FALSE)
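For anyone wondering why the pseudocount changes where the density plot starts, the arithmetic can be checked in plain R without the package. This is only a sketch on a few made-up FPKM values; the two pseudocounts (0.0001 and 1) are the ones mentioned in this thread, and how the plotting function applies them internally is an assumption on my part.

```r
# A few made-up FPKM values, including zeros and values below 1.
fpkm <- c(0, 0, 0.05, 0.5, 1, 10, 250)

# Small pseudocount: zeros map to log10(1e-4) = -4 and any FPKM < 1 is negative.
log_small <- log10(fpkm + 1e-4)

# Pseudocount of 1: zeros map to log10(1) = 0, so nothing falls below 0
# and unexpressed genes pile up at 0.
log_one <- log10(fpkm + 1)

print(round(rbind(log_small, log_one), 3))
```

So with pseudocount=1 the x axis starts at 0 and the many genes with FPKM at or near zero form a peak there, which matches the shape of the reference figure.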

ADD REPLY • link 10.9 years ago by serena Meng ▴ 20