Question: How to calculate a % change in gene expression from a linear regression?
Asked 2.9 years ago by Tom40 (United Kingdom):

I have a table like this:

``````
        Age5    Age5   Age5   Age22   Age22   Age22
Gene1   1.2     2.3    4.5    3.4     4.5     1.3
Gene2   2.4    -2.3    1.3    1.2     3.4     4.5
``````

i.e. two age groups (5 and 22) and multiple genes, where the values are log2-transformed gene expression data.

For one gene, for example, I did a linear regression (so the x axis is age, and the y axis is log2 expression values). The statistics in the output of that regression are:

``````
equation type: linear
co-efficient = -0.127
intercept = 4.85
data transformation = log2
% change between age group 5 and 22 = 51%
``````

The problem is that I do not understand how they calculated the % change as 51% from this information.

For example, I said: y = b0 + b1(x)

For expression data at age 5:

``````
y = 4.85 + (5)(-0.127)
y = 4.85-0.635
y = 4.215
``````

But since y (gene expression) is log2 transformed, I log2 transformed 4.215, so the expression data at age 5 (i.e. y) is 2.075.

Then I did the same for age 22:

``````
y = 4.85 + (22)(-0.127)
y = 4.85 - 2.794
y = 2.056
``````

Similarly, since y is log2 transformed, I log2 transformed 2.056, so the expression at age 22 (i.e. y) = 1.040.

I cannot seem to combine the two expression values (i.e. 2.075 and 1.040) in a way that gives me a 51% change in gene expression between the two age groups as calculated from a linear regression. Can someone show me how this calculation is done?
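One reading that does reproduce the 51% from the quoted numbers, offered as a sketch rather than as the database's documented method: the fitted values from the regression are already on the log2 scale, so no further log2 is applied, and the percentage is taken as the relative change between the fitted log2 values at ages 5 and 22. Whether the Digital Ageing Atlas actually defines its percentage this way is an assumption; the variable names are illustrative.

```r
# Numbers quoted in the question
slope     <- -0.127
intercept <- 4.85

# Fitted log2 expression at each age: y = b0 + b1 * x
y_age5  <- intercept + slope * 5    # 4.215
y_age22 <- intercept + slope * 22   # 2.056

# Assumption: the percentage is the relative change of the fitted log2 values
pct_change <- 100 * (y_age22 - y_age5) / y_age5
round(pct_change)   # -51, i.e. a 51% decrease
```

Note that this is a percentage of a log2 value, not a percentage change in expression on the original scale.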

In case anyone is interested, this is where I got all the above numbers used in my calculations from.


Sorry, but I cannot quite follow your question. Did you do the linear regression yourself? If yes, please show your code and data. If not, what exactly do you want to know? How to transform log2 data back to non-transformed data?

You can undo a log2 transformation as follows:

``````
2^x
``````
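As a minimal sketch of what that back-transformation implies for the numbers in the question: a difference `d` between two log2 values corresponds to a fold change of `2^d` on the original scale. The slope and ages below are the ones quoted in the question; the variable names are made up for illustration.

```r
# Undoing log2: 2^x recovers the untransformed value
x <- 4.215
stopifnot(isTRUE(all.equal(log2(2^x), x)))

# Difference in fitted log2 expression between ages 5 and 22
log2_diff <- -0.127 * (22 - 5)        # -2.159

# A log2 difference d is a fold change of 2^d on the original scale
fold_change <- 2^log2_diff
pct_linear  <- 100 * (fold_change - 1)
round(pct_linear, 1)                  # about -77.6
```

On the back-transformed scale the change is therefore roughly a 78% decrease, which does not match the reported 51%.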

Thank you for taking the time to reply. I downloaded a table from a database, the Digital Ageing Atlas. One example of a gene is here. For the gene in the example link, they did a linear regression for 5- and 22-month-old mice and found the slope (coefficient) of the regression to be -0.127 and the intercept to be 4.85. The expression data were log2 transformed, and they report a 51% decrease in expression for this gene between the ages of 5 and 22 months (all of this information is in the link). I do not understand how they calculated the 51%. I checked in R that I get the same values (i.e. slope and intercept) when I do the linear regression myself (let me know if posting the code would make a difference).

Can you show me how they used the above numbers (e.g. the slope, intercept, ages, log2) to calculate a 51% decrease in expression?


I think it would help if you show your code and data.

So the code to conduct the linear regression (and F test) looks like this:

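``````
# "' + each_file + '" looks like a placeholder filled in by the calling script;
# substitute the path to the expression table.
uarray <- read.table("' + each_file + '", header = TRUE, row.names = 1)
uarray <- t(uarray)   # samples become rows, genes become columns
age <- uarray[, 1]    # the first row of the file holds the sample ages

# Regression coefficients (intercept and slope) for every column
coefs <- t(apply(uarray, 2, function(z) lm(z ~ age)$coefficients))

# F-test to obtain p-values: one row per column (F value, numerator df, denominator df)
fstat <- t(apply(uarray, 2, function(z) summary(lm(z ~ age))$fstatistic))
pval <- pf(fstat[, 1], fstat[, 2], fstat[, 3], lower.tail = FALSE)

# Combine coefficients and p-values into one results table
fsave <- cbind(coefs, pval)
``````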

The table is too long to post: it is a gene expression matrix in which the values are log2-transformed gene expression data, the rows are 19,000 genes, and the column names are the age categories, e.g. "Age5 Age5 Age5 Age22 Age22 Age22". But regardless, is it possible to calculate the 51% using only the information provided in the link I have shown (i.e. intercept, slope, etc.)?


It would help to give the values for your gene of interest, but I think the 51% is the difference between the means of the groups. Like a fold change.
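To illustrate what a difference between group means looks like as a fold change, here is a small sketch using the Gene1 row of the toy table from the question (not the real gene behind the 51%); the variable names are illustrative.

```r
# Gene1 log2 expression values from the toy table in the question
age5  <- c(1.2, 2.3, 4.5)
age22 <- c(3.4, 4.5, 1.3)

# Difference between group means on the log2 scale
log2_diff <- mean(age22) - mean(age5)

# The same difference expressed as a fold change and a percentage
fold_change <- 2^log2_diff
pct_change  <- 100 * (fold_change - 1)
```

For this toy gene the log2 difference is 0.4, which corresponds to roughly a 1.32-fold (about 32%) increase on the original scale.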

Thank you for the reply. So I understand that the 51% is the difference between the groups, yes.

My question is: in my example, is it possible, knowing ONLY the elements of the linear regression, i.e. the slope (-0.127), the intercept (4.85), the age range (5-22 months) and the data transform for the gene expression (log2), to calculate the 51% value WITHOUT needing the raw data? And if this is possible, what is the calculation to obtain it?


No, you need means of both groups.
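One point that may be worth checking here: when the data contain only two distinct ages, an ordinary least-squares line passes exactly through the two group means, so the intercept and slope are enough to recover them. A minimal sketch with the Gene1 row of the toy table from the question (illustrative names, not the real gene):

```r
# Toy data: three samples at age 5 and three at age 22 (Gene1 from the question)
age  <- c(5, 5, 5, 22, 22, 22)
expr <- c(1.2, 2.3, 4.5, 3.4, 4.5, 1.3)

fit <- lm(expr ~ age)
b   <- coef(fit)

# Fitted values at the two ages, using only intercept and slope
pred <- b[["(Intercept)"]] + b[["age"]] * c(5, 22)

# Group means computed directly from the data
group_means <- as.numeric(tapply(expr, age, mean))

all.equal(pred, group_means)   # TRUE
```

This holds only for the two-group design; with more than two distinct ages the fitted values generally differ from the per-age means.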

Ok great, thanks I appreciate that!

I was told that I could calculate the 51% using only the numbers above. So, instead, take this gene (exactly the same kind of example as before, except with a 10% increase instead of a 51% decrease; I have switched only because I can quickly find the gene expression data for it).

I have a gene expression file like this (line 1 is sample names, line 2 is sample ages, and line 3 is log2 gene expression for this particular gene):

``````
Gene    SM1     SM2     SM3  SM4    SM5     SM6  SM7    SM8     SM9   SM10    SM11     SM12     SM13 SM14   SM15    SM16    SM17    SM18    SM19    SM20    SM21    SM22    SM23    SM24    SM25    SM26    SM27    SM28    SM29    SM30
Gene    26  26  27  29  30  36  37 38   40  42  45  48  52  53  56  61  66  70  71  73  77  80  81  85  87  90  90  91  95  106
223 8.91    9.10    8.23    8.47    8.46    8.95    8.73    8.86    8.50    8.02    9.02    8.29    9.46    8.93    9.28    8.54    9.90    9.16    9.08    9.24    9.44    9.21    9.56    9.46    9.08    9.35    8.94    9.17    9.09    9.75
``````

I put this line into the above code, and I get exactly the slope, intercept, p-values, etc. described in the hyperlink in this comment. Can you tell me how I could change the R code I've given above to extract the percentage change with age? Note that in this particular case it is not the difference between the means of two groups, but rather a change across the whole age range (and I know that the answer should be a 10% increase in gene expression for this gene with age).
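A possible way to get a percentage out of the regression, sketched under a stated assumption: the percentage is taken as the relative change between the fitted log2 values at the youngest and oldest ages. That definition is a guess at what the website does, not something its documentation confirms, and the variable names are illustrative. The data are the ages and log2 values for gene 223 posted above.

```r
# Sample ages and log2 expression for gene 223, copied from the table above
age <- c(26, 26, 27, 29, 30, 36, 37, 38, 40, 42, 45, 48, 52, 53, 56,
         61, 66, 70, 71, 73, 77, 80, 81, 85, 87, 90, 90, 91, 95, 106)
expr <- c(8.91, 9.10, 8.23, 8.47, 8.46, 8.95, 8.73, 8.86, 8.50, 8.02,
          9.02, 8.29, 9.46, 8.93, 9.28, 8.54, 9.90, 9.16, 9.08, 9.24,
          9.44, 9.21, 9.56, 9.46, 9.08, 9.35, 8.94, 9.17, 9.09, 9.75)

fit <- lm(expr ~ age)

# Fitted log2 expression at the youngest and oldest ages in the data
ends <- unname(predict(fit, newdata = data.frame(age = range(age))))

# Assumption: percent change is relative to the fitted log2 value at the youngest age
pct_change <- 100 * (ends[2] - ends[1]) / ends[1]

# For comparison: the same change expressed on the back-transformed (linear) scale
pct_linear <- 100 * (2^(ends[2] - ends[1]) - 1)
```

With these data `pct_change` lands close to the reported 10%, while `pct_linear` is much larger, which is consistent with the percentage being computed on the log2 values themselves.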


It seems to me that the website you are referring to has a poor description of its methods. If you don't know how they arrived at a percentage or value, you also don't know whether you can use it. So my advice would be to analyze the data yourself, so that you know what you are doing. Good luck!