Question: Transform log2 fold changes into z-scores
gg (United Kingdom) wrote, 4 weeks ago:

Hello everyone!

I have a dataset consisting of CRISPR gRNA read counts from two different samples (something very similar to the output of an RNA-seq experiment).

I have transformed the read counts to log2 values and computed the fold change between the treated and non-treated samples. The data are normally distributed, but the mean is not 0. I am plotting these data in a volcano plot, and the plot doesn't look right: I am plotting depleted vs enriched gRNAs, but they do not correspond to negative vs positive values. So I thought of transforming the values into z-scores. I wonder if this is correct. I have seen that this is commonly done for microarray data, but I am not completely sure it applies to my data.

Many thanks for your help!

Giovanna

See the plot below:

Rplot

rna-seq • 191 views
modified 4 weeks ago by The • written 4 weeks ago by gg

Check scale() in R.

written 4 weeks ago by Nicolas Rosewick

Indeed, that's what I did. Doesn't the scale() function transform the data into z-scores? I still wonder whether this is statistically correct...
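For reference, with its default arguments scale() in R subtracts the mean and divides by the sample standard deviation, which is the usual z-score. A minimal sketch of the same arithmetic in Python (standard library only); the input values and the helper name `z_scores` are made up for illustration and are not data from this thread:

```python
from statistics import mean, stdev

def z_scores(values):
    """Center on the mean and divide by the sample SD (n - 1 denominator)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical log2 fold changes whose mean is below zero.
log2_fc = [-2.0, -1.5, -1.0, -0.5, 0.0]
z = z_scores(log2_fc)

for fc, score in zip(log2_fc, z):
    print(f"log2FC={fc:+.2f}  z={score:+.3f}")
```

In this toy example a log2 FC of -0.5 (still a depletion relative to the non-treated sample) gets a positive z-score. Centering moves the zero point to the mean of the data, so the sign of a z-score says "below or above the average gRNA", not "depleted or enriched".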

written 4 weeks ago by gg

Could you add the volcano plot?

written 4 weeks ago by Nicolas Rosewick

Rplot

modified 4 weeks ago by RamRS • written 4 weeks ago by gg

It is unclear what you want, gg. Why do you want to make z-scores of log2 fold changes? You also have p-values; how did you calculate them? What is wrong with the volcano plot using log2 FC instead of z-scores? What do you want to do with the z-scores?

modified 4 weeks ago • written 4 weeks ago by Benn

Hello, thanks again to both of you for helping me. When I use FC values the plot looks like this (see below), which I find much more difficult to interpret, especially when I need to draw vertical lines to indicate FC-based thresholds. My aim is to compare two different analysis approaches: setting thresholds according to the negative control distribution (the vertical lines in the plot above) and using p-values calculated by rank product analysis (the horizontal line at 0.05). I hope it is clearer now!

Rplot01
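The thread does not say how the thresholds were derived from the negative control distribution. One common choice is the control mean plus or minus a multiple of the control standard deviation, and the sketch below assumes that rule. The control values, the multiplier `k = 2`, and the helper `classify` are all hypothetical:

```python
from statistics import mean, stdev

# Hypothetical log2 fold changes of negative-control gRNAs.
control_fc = [-1.2, -0.9, -1.0, -1.1, -0.8]

# Assumed rule: flag gRNAs more than k sample SDs away from the control mean.
k = 2
center = mean(control_fc)
spread = stdev(control_fc)
lower, upper = center - k * spread, center + k * spread

def classify(fc):
    """Label a log2 fold change relative to the control-based thresholds."""
    if fc < lower:
        return "depleted"
    if fc > upper:
        return "enriched"
    return "unchanged"

print(f"thresholds: {lower:.3f}, {upper:.3f}")
print(classify(-2.5), classify(-1.0), classify(0.3))
```

With thresholds defined this way, the vertical lines sit around the control mean rather than around 0, so a shifted log2 FC axis can be read directly without converting to z-scores.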

modified 4 weeks ago • written 4 weeks ago by gg

It is not clear how you calculated the p-values. Why use the rank product method, and where is the FDR correction? Nor is it clear how you calculated the log2 FC; the values look odd in your volcano plot.

modified 4 weeks ago • written 4 weeks ago by Benn

First, the read counts were transformed to log2 values. The fold change between the treated and untreated control samples was calculated for each gRNA as described in the equation below:

$$\text{gRNA } C_{\text{score}} = \log_2(\text{gRNA abundance, treated sample}) - \log_2(\text{gRNA abundance, non-treated sample})$$

As each gene was targeted by 6 different gRNAs, the mean gRNA C score of each gene was calculated using the equation below:

$$\text{gene } C_{\text{score}} = \frac{\sum \text{gRNA } C_{\text{score}}}{n_{\text{gRNA}}}$$

Finally, rank product analysis was performed, using the FC of each gRNA targeting the same gene as a replicate.
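The two steps above can be sketched in a few lines of Python (standard library only). The gene names and read counts are invented, only two gRNAs per gene are shown instead of six, and the names `grna_c_score` and `gene_c_score` are illustrative:

```python
from math import log2
from statistics import mean

# Hypothetical read counts per gRNA: (gene, treated, non_treated).
counts = [
    ("GENE_A", 50, 200),
    ("GENE_A", 100, 200),
    ("GENE_B", 400, 200),
    ("GENE_B", 200, 200),
]

def grna_c_score(treated, non_treated):
    """log2(treated) - log2(non-treated) for one gRNA."""
    return log2(treated) - log2(non_treated)

# Collect the gRNA scores of each gene, then average them.
per_gene = {}
for gene, treated, non_treated in counts:
    per_gene.setdefault(gene, []).append(grna_c_score(treated, non_treated))

gene_c_score = {gene: mean(scores) for gene, scores in per_gene.items()}
print(gene_c_score)
```

The description in this thread does not mention a pseudocount or any library-size normalisation, so neither appears here. As written, a gRNA with zero reads would make `log2` raise an error.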

modified 4 weeks ago • written 4 weeks ago by gg

Okay, thanks for the explanation. I am not sure why you would use this protocol instead of, for example, edgeR. Try edgeR and see if you still get this weird shift of the log2 FC towards -1.

written 4 weeks ago by Benn

Why do you consider your result difficult to interpret? It looks pretty good: it seems you will not find significant differences between your two conditions, but the data appear to have been analyzed well. My only recommendation is to plot the -log10 of the adjusted p-value instead of the raw p-value, to see which values are really significant.
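The comment does not say which adjustment is meant. The sketch below assumes the Benjamini-Hochberg procedure, a common FDR correction, and uses invented p-values. `bh_adjust` is an illustrative helper written for this example, not a library function:

```python
from math import log10

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.001, 0.008, 0.039, 0.041, 0.6]  # hypothetical raw p-values
padj = bh_adjust(pvals)
y_axis = [-log10(p) for p in padj]  # y-values for the volcano plot

for raw, adj in zip(pvals, padj):
    print(f"p={raw:.3f}  padj={adj:.5f}")
```

In this toy input, 0.039 and 0.041 are below 0.05 before adjustment but above it afterwards, which is the kind of difference the recommendation is aimed at.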

written 4 weeks ago by Buffo
The (United States) wrote, 4 weeks ago:

I believe there is not sufficient 'scatter' in the plot. In your case the p-value gets better (lower) almost monotonically as the absolute fold change increases. That might have something to do with how the p-value is calculated in rank product analysis (is it still calculated by random permutation, or has an exact method been introduced?), or with the small number of samples, or with the use of technical replicates as samples (pseudo-replication).
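To make the permutation idea concrete, here is a minimal sketch of a rank product with permutation-based p-values in Python (standard library only). It is a simplified illustration, not the implementation of any particular package. The data, the function names, the tie handling, and the choice of 1000 permutations are all assumptions:

```python
import random
from math import prod

def ranks(values):
    """Rank 1 = smallest value (most depleted); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def rank_products(columns):
    """Geometric mean of each gene's ranks across the replicate columns."""
    k = len(columns)
    ranked = [ranks(col) for col in columns]
    n_genes = len(columns[0])
    return [prod(rc[g] for rc in ranked) ** (1 / k) for g in range(n_genes)]

def permutation_pvalues(columns, n_perm=1000, seed=0):
    """Fraction of permuted rank products <= each observed rank product."""
    rng = random.Random(seed)
    observed = rank_products(columns)
    n_genes = len(observed)
    hits = [0] * n_genes
    for _ in range(n_perm):
        # Shuffle each replicate independently to break the gene pairing.
        shuffled = [rng.sample(col, len(col)) for col in columns]
        permuted = rank_products(shuffled)
        for g, obs in enumerate(observed):
            hits[g] += sum(1 for p in permuted if p <= obs)
    return [h / (n_perm * n_genes) for h in hits]

# Hypothetical log2 fold changes: one list per replicate, one entry per gene.
columns = [
    [-3.0, -1.0, -0.5, 0.2],
    [-2.5, -0.8, -1.2, 0.1],
    [-2.8, -0.2, -0.9, 0.4],
]
rp = rank_products(columns)
pvals = permutation_pvalues(columns)
print(rp)
print(pvals)
```

The statistic depends only on the ranks of the fold changes, so with few replicates the p-values are bound to follow the fold changes closely. That may be one reason for the limited scatter described above.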

I would suggest checking some papers that used rank product to see what the volcano plots look like in those examples.

modified 4 weeks ago • written 4 weeks ago by The