Question: can anybody help me to check if the code is correct to use limma package to find the significantly expressed genes
0
gravatar for sw.arker
5.4 years ago by
sw.arker60
Germany
sw.arker60 wrote:

I am rookie to using limma package. My data looks like this in excel: first column is the gene name, first raw is sample name, and under each sample name is the gene expression value (4 replicates per sample, I have sample wt and mu). I want to find out the significant expressed genes between mu vs wt.

and I write code like this:

sample=read.csv("sample.csv",header=T,row.names=1)
logsample=log2(sample)
design=model.matrix(~0+c(rep('wt',4),rep('mu',4)))
colnames(design)=c("wt","mu")
cm=makeContrasts(mu-wt,levels=design)
fit=lmFit(logsample,design)
fit2=contrasts.fit(fit,cm)
fit3=eBayes(fit2)
result=topTable(fit3,number=Inf,adjust="BH",sort.by="none")

I am not sure I did it correctly. Please help me to check, many thanks!! I log2 transformed the intensity. The data is already normalized by other software; I am not sure I should use makeContrasts or not.

Thanks!!

R • 2.0k views
ADD COMMENTlink modified 5.4 years ago by dariober10k • written 5.4 years ago by sw.arker60
2

What exactly does sample.csv contain?

BTW, if you don't remove the intercept from the design then you can skip looking at the contrast.

ADD REPLYlink written 5.4 years ago by Devon Ryan92k

hi, Devon, thanks for reply, the sample.csv looks like this:

probe wt1 wt2 wt3 wt4 mu1 mu2 mu3 mu4
gene1 1136.704 4123.447 5138.65 5103.593 3919.762 3094.666 4378.834 4429.387
gene2 1253.719 2591.891 2611.626 2790.135 2205.512 1898.726 2374.038 2539.176
gene3 2227.929 1968.311 2303.308 1817.361 2323.718 1060.57 994.9664 1803.954
gene4 572.5932 1043.077 1479.159 1283.932 820.1443 1088.151 1139.485 1082.607
gene5 76.20247 42.47991 8.524587 24.1916 30.72517 11.20368 8.393809 42.14227
gene6 2672.205 5504.846 7236.014 7006.669 6271.228 5925.029 6590.307 6865.492
gene7 324.3387 276.1173 216.5971 287.3867 224.2262 249.4994 199.8653 301.3705
gene8 3554.573 3388.302 3009.981 3454.752 2837.759 2854.741 2503.966 2837.336
gene9 2102.255 1130.993 1789.667 1696.07 1172.711 1385.945 1913.929 1866.4
gene10 2967.336 4161.204 4074.815 4325.702 3897.27 3047.727 3801.668 3924.415
gene11 143.7498 109.3417 81.79682 106.7067 105.4525 90.31444 127.8393 122.719
gene12 2672.205 5504.846 7236.014 7006.669 6271.228 5925.029 6590.307 6865.492
gene13 572.5932 1043.077 1479.159 1283.932 820.1443 1088.151 1139.485 1082.607
gene14 143.7498 109.3417 81.79682 106.7067 105.4525 90.31444 127.8393 122.719
gene15 143.7498 109.3417 81.79682 106.7067 105.4525 90.31444 127.8393 122.719
gene16 455.3064 318.3769 235.8836 349.7169 301.8518 301.0153 242.6692 338.5451
gene17 2913.168 3702.693 3544.199 3899.656 3689.94 2747.714 3436.402 3536.472
gene18 143.7498 109.3417 81.79682 106.7067 105.4525 90.31444 127.8393 122.719
gene19 1761.665 5428.883 6678.637 6753.888 5176.198 4133.363 5760.567 6108.301
gene20 572.5932 1043.077 1479.159 1283.932 820.1443 1088.151 1139.485 1082.607
ADD REPLYlink written 5.4 years ago by sw.arker60
3
gravatar for dariober
5.4 years ago by
dariober10k
WCIP | Glasgow | UK
dariober10k wrote:

Your code looks good to me to find genes differentially expressed between the 4 WT and 4 mu, which I guess it is what you want.

As Devon Ryan suggested, you can leave the intercept and skip the contrasts. This should be equivalent to your code:

design=model.matrix(~1+c(rep('wt',4),rep('mu',4)))
colnames(design)=c("wt","mu")
fit=lmFit(logsample,design)
fit<- eBayes(fit)
results<- topTable(fit,number=Inf,adjust="BH",sort.by="none", coef= 2)

However, I don't see any harm in explicitly fitting contrasts, actually I find it clearer.

I think in general it's good to look at a couple of genes from the output of topTable() and make sure the model is doing what you expect by comparing to the values in the normalized matrix. E.g.  in your case:

results$logFC[1]

Should be equal to (given rounding errors):

mean(logsample[1, 1:4]) - mean(logsample[1, 5:8])

 

 

ADD COMMENTlink written 5.4 years ago by dariober10k

hi, Dariober

Thanks!! I got the same results using your code without contrast; I will check later if the model provides the real significant differential genes in the matrix as you suggested!! 

cheers

ADD REPLYlink written 5.4 years ago by sw.arker60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1740 users visited in the last hour