Question: Cummerbund Volcano plot infinite value
1
gravatar for niu.shengyong
2.5 years ago by
niu.shengyong50 wrote:

I'm faced with a problem in the volcano plot of CummeRbund. As shown in enclosed file, you could see that there are plenty of infinite dots on the borders and also on the bottom. Is there any method to avoid this problem? Volcano plot

I've checked some pages, it doesn't work when I put "pseudocount=0.0001" argument in csVolcano function. Do you have any advice? Thanks!

Sincerely, Simon

rna-seq cummerbund • 1.6k views
ADD COMMENTlink modified 2.1 years ago by matine0 • written 2.5 years ago by niu.shengyong50
1

Can you show the code where you add the pseudocount? It is usually +1.0 and must be added before log normalising function.

Also print the matrix created and extract the infinite values --> trace those values back to the original input values before normalising --> maybe the input values are "NA" or something weird.

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by YaGalbi1.4k
2

you can create the plot on your own. Just pull put the adj.Pval and the log2FC . Pull the values and plot the distribution first and see the value limits. Then it should not be a problem. Otherwise, you can always inflate with a constant counter for representation. use something like below. You have NA values for p.adjusted. So that's the problem. if it's a visualization you might not need them as well. Get rid of them and plot.

with(df, plot(log2FoldChange, -log10(padj), pch=20, main="Volcano plot",))
# Add colored points:
with(subset(df, padj <0.01 ), points(log2FoldChange, -log10(padj), pch=20, col="blue"))
with(subset(df, abs(log2FoldChange)>1.5), points(log2FoldChange, -log10(padj), pch=20, col="orange"))
with(subset(df, padj <0.01 & abs(log2FoldChange)>1.5), points(log2FoldChange, -log10(padj), pch=20, col="grey60"))
ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by ivivek_ngs4.8k
2

First: extract the ID, adj.Pval and log2FC from cuff_data into a dataframe called "df"

Second: remove all lines where log2FC=NA

Third: follow the code vchris posted here.

ADD REPLYlink written 2.5 years ago by YaGalbi1.4k

Should I use the file "gene_exp.diff" and extract the columns of "gene_id", "gene", "sample1", "sample2", "log2(fold_change)", and " p_value"? Thanks!

ADD REPLYlink written 2.5 years ago by niu.shengyong50
1

Yes gene_exp.diff provides the gene differential FPKM after cufflinks tests differences in the summed FPKM of transcripts sharing each gene_id.

Take a look here for the structure of the gene_exp.diff file.

To reliably extract what you need requires knowledge of a programming language (perl, python) or terminal commands (cut -f). If you do not have that ability, then you can use a spreadsheet application like Excel but prepare to get EXCELLED!!

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by YaGalbi1.4k

oh that's some handy code there.

ADD REPLYlink written 2.5 years ago by YaGalbi1.4k

I could delete infinite points successfully, but all the them become in grey color in the end. How could I solve this? Here is my codes:

table <-read.table("diff_list_subset.diff", header = TRUE)
df <- data.frame(table)
df <-df[Reduce('&', lapply(df, is.finite)),]
with(df, plot(df$log2.fold_change., -log10(df$p_value), pch=20, main="Volcano plot",))
with(subset(df, df$p_value < 0.01 ), points(df$log2.fold_change., -log10(df$p_value), 
pch=20, col="blue"))
with(subset(df, abs(df$log2.fold_change.)>1.5), points(df$log2.fold_change., -
log10(df$p_value), pch=20, col="orange"))
with(subset(df, df$p_value < 0.01 & abs(df$log2.fold_change.)>1.5), 
points(df$log2.fold_change., -log10(df$p_value), pch=20, col="grey60"))
ADD REPLYlink written 2.5 years ago by niu.shengyong50

Here is the plot: volcano plot

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by niu.shengyong50

Sorry I'm pretty new in this area. How could I extract the ID, adj.Pval and log2FC from cuff_data into a dataframe called "df", and also remove all lines where log2FC=NA? Appreciate it!

Simon

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by niu.shengyong50

Here is my code:

cuff_data <- readCufflinks('diff_out/')
pdf(file="Volcano.pdf")
csVolcano(genes(cuff_data), 'S63', 'S58',showSignificant=TRUE, alpha=0.05, features=TRUE,   xlimits=c(-40,40))
dev.off()

How could I add this before normalization and log function?

Thanks!

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by niu.shengyong50

Here is my code:

cuff_data <- readCufflinks('diff_out/')
pdf(file="Volcano.pdf")
csVolcano(genes(cuff_data), 'S63', 'S58',showSignificant=TRUE, alpha=0.05, features=TRUE,   
xlimits=c(-40,40))
dev.off()

How could I add this before normalization and log function?

Thanks!

ADD REPLYlink written 2.5 years ago by niu.shengyong50

Please also include the code where "pseudocount=0.0001" is added.

ADD REPLYlink written 2.5 years ago by YaGalbi1.4k
csVolcano(genes(cuff_data), 'S63', 'S58',showSignificant=TRUE, alpha=0.05, pseudocount=0.0001, features=TRUE,  xlimits=c(-40,40))

Thanks!

ADD REPLYlink written 2.5 years ago by niu.shengyong50
0
gravatar for niu.shengyong
2.5 years ago by
niu.shengyong50 wrote:

Is the padj value the same as q value? Thanks!

ADD COMMENTlink written 2.5 years ago by niu.shengyong50

yes..."p adjusted value" = p value adjusted for multiple testing = q value

ADD REPLYlink written 2.5 years ago by YaGalbi1.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 866 users visited in the last hour