Question

Help needed with selecting top DE genes, Microarray data analysed with "limma"

8.2 years ago

venu 7.1k

Hello all,

I've analysed microarray data with the limma package and ended up with a list of genes that are differentially expressed. By default, limma has ranked the DE genes by B-statistic, and from the reference manual (page 4) I thought that would be a good parameter to rank by. However, according to previous threads and some suggestions, the adjusted p-value would be a more useful parameter for selecting significant DE genes. In my results the adjusted p-values are all much higher than the usual cutoff (0.1 - 0.9, nothing is < 0.05). What might be the reason? Should I repeat the analysis with any changes, or is this normal?

I should also mention that the top DE genes selected by B-statistic agree with the experiments we are doing in the lab (the list contains a number of genes that we expected to be differentially expressed). So I am a little biased about the significance values in this post.

EDIT:

Illumina platform
12 completely individual cell lines (based on some experimental results we've grouped them into 2, 6 in each)
Normalization - neqc function from limma.

I can provide more information if required.

Thanks

Differential-Expression limma • 2.7k views

updated 21 months ago by Ram 43k • written 8.2 years ago by venu 7.1k

Ram · Answer 1 · 2016-02-01

8.2 years ago

andrew.j.skelton73 6.5k

This is quite a difficult question to answer without more detail on your experimental design. Useful information would be the platform you're using, normalisation method, tissue used, species, number of replicates, comparison of interest, etc. This could come down to a lot of different things, but I'd say power is the most obvious one. If you're seeing potentially high log odds (the B-statistic), that can be indicative of something happening, but without an adjusted p-value below a decent threshold (0.05), it's not statistically significant in that dataset. The only other thing I can think of is technical variation, so you could use PCA, or methods from the sva package, to check whether there's any overwhelming technical variation present.
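To illustrate the power point, here is a minimal sketch of the Benjamini-Hochberg adjustment (the default `adjust.method` in limma's `topTable`), written in plain Python rather than R. The p-values and the probe count are invented for illustration and are not from the poster's data; `bh_adjust` is a hypothetical helper name.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# ASSUMPTION: 20,000 probes, 5 of which have small raw p-values (invented).
n_probes = 20000
top_raw = [1e-5, 3e-5, 6e-5, 1e-4, 2e-4]
# The remaining probes get p-values spread evenly over (0.001, 1],
# roughly what is expected when there is no signal.
n_null = n_probes - len(top_raw)
null_raw = [0.001 + 0.999 * (k + 1) / n_null for k in range(n_null)]

adjusted = bh_adjust(top_raw + null_raw)
top_adjusted = adjusted[: len(top_raw)]
for raw, adj in zip(top_raw, top_adjusted):
    print(f"raw p = {raw:.0e}  ->  adjusted p = {adj:.2f}")
```

In this made-up case even a raw p-value of 1e-5 ends up with an adjusted p-value of about 0.2, because it is multiplied by the number of probes tested. With 6 samples per group and variable cell lines, raw p-values often do not get small enough to survive that correction.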

updated 4.3 years ago by Ram 43k • written 8.2 years ago by andrew.j.skelton73 6.5k

Thanks, Andrew, for the useful points to help narrow down the problem. Question updated.

8.2 years ago by venu 7.1k

I don't really work with tumours, so someone more familiar with them may be able to comment on how that affects things relative to sample size. I'd suggest you take a probe that targets a gene you're familiar with and visualise its log2 expression in each sample, to see how variable the samples are and what the means look like by sample type. Your normalisation choice is fine. Feel free to update with your code so we can take a quick look through, but other than that, there's not much more I can suggest.
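As a rough sketch of that per-probe check, something like the following summarises one probe by group before plotting it. It is plain Python with invented log2 values for 12 cell lines (6 per group); the group names and the `summarise` helper are hypothetical, and in practice the values would come from the normalised expression matrix.

```python
from statistics import mean, stdev

# ASSUMPTION: invented log2 expression values for ONE probe,
# 6 cell lines per group, not taken from the poster's data.
log2_expr = {
    "group_A": [8.1, 9.4, 7.6, 10.2, 8.8, 9.0],
    "group_B": [9.3, 10.9, 8.2, 11.5, 9.1, 10.4],
}


def summarise(groups):
    """Return {group: (mean, sample standard deviation)} of the log2 values."""
    return {name: (mean(vals), stdev(vals)) for name, vals in groups.items()}


summary = summarise(log2_expr)
for name, (m, s) in summary.items():
    print(f"{name}: mean = {m:.2f}, sd = {s:.2f}, n = {len(log2_expr[name])}")

# Difference of group means on the log2 scale (log2 fold change, B minus A).
log_fc = summary["group_B"][0] - summary["group_A"][0]
print(f"log2 fold change (B - A): {log_fc:.2f}")
```

If the within-group spread is about as large as the difference between the group means, as in these made-up numbers, the probe can rank near the top and still fail to reach significance after adjustment.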

updated 4.3 years ago by Ram 43k • written 8.2 years ago by andrew.j.skelton73 6.5k