Question: Cummerbund Volcano plot infinite value
0
gravatar for niu.shengyong
22 months ago by
niu.shengyong30 wrote:

I'm faced with a problem in the volcano plot of CummeRbund. As shown in enclosed file, you could see that there are plenty of infinite dots on the borders and also on the bottom. Is there any method to avoid this problem? Volcano plot

I've checked some pages, it doesn't work when I put "pseudocount=0.0001" argument in csVolcano function. Do you have any advice? Thanks!

Sincerely, Simon

rna-seq cummerbund • 1.2k views
ADD COMMENTlink modified 17 months ago by matine0 • written 22 months ago by niu.shengyong30
1

Can you show the code where you add the pseudocount? It is usually +1.0 and must be added before log normalising function.

Also print the matrix created and extract the infinite values --> trace those values back to the original input values before normalising --> maybe the input values are "NA" or something weird.

ADD REPLYlink modified 22 months ago • written 22 months ago by YaGalbi1.4k
2

you can create the plot on your own. Just pull put the adj.Pval and the log2FC . Pull the values and plot the distribution first and see the value limits. Then it should not be a problem. Otherwise, you can always inflate with a constant counter for representation. use something like below. You have NA values for p.adjusted. So that's the problem. if it's a visualization you might not need them as well. Get rid of them and plot.

with(df, plot(log2FoldChange, -log10(padj), pch=20, main="Volcano plot",))
# Add colored points:
with(subset(df, padj <0.01 ), points(log2FoldChange, -log10(padj), pch=20, col="blue"))
with(subset(df, abs(log2FoldChange)>1.5), points(log2FoldChange, -log10(padj), pch=20, col="orange"))
with(subset(df, padj <0.01 & abs(log2FoldChange)>1.5), points(log2FoldChange, -log10(padj), pch=20, col="grey60"))
ADD REPLYlink modified 22 months ago • written 22 months ago by ivivek_ngs4.7k
2

First: extract the ID, adj.Pval and log2FC from cuff_data into a dataframe called "df"

Second: remove all lines where log2FC=NA

Third: follow the code vchris posted here.

ADD REPLYlink written 22 months ago by YaGalbi1.4k

Should I use the file "gene_exp.diff" and extract the columns of "gene_id", "gene", "sample1", "sample2", "log2(fold_change)", and " p_value"? Thanks!

ADD REPLYlink written 22 months ago by niu.shengyong30
1

Yes gene_exp.diff provides the gene differential FPKM after cufflinks tests differences in the summed FPKM of transcripts sharing each gene_id.

Take a look here for the structure of the gene_exp.diff file.

To reliably extract what you need requires knowledge of a programming language (perl, python) or terminal commands (cut -f). If you do not have that ability, then you can use a spreadsheet application like Excel but prepare to get EXCELLED!!

ADD REPLYlink modified 22 months ago • written 22 months ago by YaGalbi1.4k

oh that's some handy code there.

ADD REPLYlink written 22 months ago by YaGalbi1.4k

I could delete infinite points successfully, but all the them become in grey color in the end. How could I solve this? Here is my codes:

table <-read.table("diff_list_subset.diff", header = TRUE)
df <- data.frame(table)
df <-df[Reduce('&', lapply(df, is.finite)),]
with(df, plot(df$log2.fold_change., -log10(df$p_value), pch=20, main="Volcano plot",))
with(subset(df, df$p_value < 0.01 ), points(df$log2.fold_change., -log10(df$p_value), 
pch=20, col="blue"))
with(subset(df, abs(df$log2.fold_change.)>1.5), points(df$log2.fold_change., -
log10(df$p_value), pch=20, col="orange"))
with(subset(df, df$p_value < 0.01 & abs(df$log2.fold_change.)>1.5), 
points(df$log2.fold_change., -log10(df$p_value), pch=20, col="grey60"))
ADD REPLYlink written 21 months ago by niu.shengyong30

Here is the plot: volcano plot

ADD REPLYlink modified 21 months ago • written 21 months ago by niu.shengyong30

Sorry I'm pretty new in this area. How could I extract the ID, adj.Pval and log2FC from cuff_data into a dataframe called "df", and also remove all lines where log2FC=NA? Appreciate it!

Simon

ADD REPLYlink modified 22 months ago • written 22 months ago by niu.shengyong30

Here is my code:

cuff_data <- readCufflinks('diff_out/')
pdf(file="Volcano.pdf")
csVolcano(genes(cuff_data), 'S63', 'S58',showSignificant=TRUE, alpha=0.05, features=TRUE,   xlimits=c(-40,40))
dev.off()

How could I add this before normalization and log function?

Thanks!

ADD REPLYlink modified 22 months ago • written 22 months ago by niu.shengyong30

Here is my code:

cuff_data <- readCufflinks('diff_out/')
pdf(file="Volcano.pdf")
csVolcano(genes(cuff_data), 'S63', 'S58',showSignificant=TRUE, alpha=0.05, features=TRUE,   
xlimits=c(-40,40))
dev.off()

How could I add this before normalization and log function?

Thanks!

ADD REPLYlink written 22 months ago by niu.shengyong30

Please also include the code where "pseudocount=0.0001" is added.

ADD REPLYlink written 22 months ago by YaGalbi1.4k
csVolcano(genes(cuff_data), 'S63', 'S58',showSignificant=TRUE, alpha=0.05, pseudocount=0.0001, features=TRUE,  xlimits=c(-40,40))

Thanks!

ADD REPLYlink written 22 months ago by niu.shengyong30
0
gravatar for niu.shengyong
21 months ago by
niu.shengyong30 wrote:

Is the padj value the same as q value? Thanks!

ADD COMMENTlink written 21 months ago by niu.shengyong30

yes..."p adjusted value" = p value adjusted for multiple testing = q value

ADD REPLYlink written 21 months ago by YaGalbi1.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 944 users visited in the last hour