**50**wrote:

Hi guys, Brand new to going through RNA-seq data, and learning on the go. I have been trying to make a volcano plot of some example data from GEO, to determine up-regulated genes in the sample. I am having problems with getting reasonable outputs for my fold-change values, with insanely high values over 5000 and below -10000. I am not sure if this is a result of just the data or something I have been doing myself. The reason for the volcano plot is so I can then determine the upregulated genes, usually done by looking at the genes over expressed (>=1).

I have been getting the average of the expression data for the "diabetic" and "non-diabetic". To determine the fold change I have then just subtracted the average of the diabetics from the average of the health. This is based off an example with a different dataset.

```
avg.t2d <- apply(exdat[,t2d.index==" diabetic"], 1, mean)
avg.healthy <- apply(exdat[,t2d.index==" non-diabetic"], 1, mean)
foldch <- avg.t2d - avg.healthy
logt2dp <- -log(t2dpvals, base=10)
plot(foldch, logt2dp, pch=1, xlab="Log2 Fold Change (L/S)",
ylab="-Log10(P-val)", main="Volcano Plot", col = ifelse(adj_pvals<0.05, 'green', 'black'))
```

Am I missing something big here or am I over thinking the problem?

**6.9k**• written 23 months ago by MarjoryMollusc •

**50**

Use log2 for exploring the fold changes. You could do something like

`log2(avg.t2d)-log2(avg.healthy)`

400Thanks for the reply! I have tried this already, however I had the warnings that NaNs were produced. Is there any way to cut the data that give that output? Or is that a result of errors somewhere else?

50The warning indicates that you have markers which have

`avg.t2d AND avg.healthy`

as 0. One way of handling this is adding a small pseudo count of eg. 0.0001 to all the mean values. This will ensure that there are no 0's in your data and NANs will be 0 foldch as expected.400Thanks for explaining the error. Despite addin the pseudocount, it still came up with the error. Would this be because the 0's in the dataset would be in there as Na instead of 0? If so how would you change those Na's to 0's?

50Can you provide some examples of the mean values for which you are getting this error?

400I disagree. Don't take logs after taking the mean, Take means after taking the logs, or take logs after taking the geometric means

5.0kWhile seidel has addressed your actual question, I just wanted to take a step back and ask: have you considered using any of the established R packages for differential gene expression analysis, such as DESeq2, edgeR, limma? These packages will calculate the log-fold-changes for you (they'll even do it in a somewhat more sophisticated manner taking the quirks of RNA-seq data into account) and they will even offer you (adjusted) p-values to let you make an educated guess as to which genes might be worth looking at.

5.2kYeah I have used limma later on. This exercise was for me to try an get a deeper understanding of what is actually going on behind the scenes when using those packages.

50That's great! If you want to know about the nitty-gritty details, the introduction of the limma-voom paper mentions a lot of the statistical details that are applied in those packages (which is the reason why simply taking means and log-fold changes won't give you the same results)

5.2kThat is the head of avg.t2d With the pseudocount:

50