Question: How to compare two groups of 3 samples
moxu440 wrote:

I have 2 experimental conditions, each generating a series of values for some genes, like the following:

   s1  s2  s3
g1 n11 n12 n13
g2 n21 n22 n23
...

s1, s2, s3 are samples; g1, g2, ... are gene names; n11, n12, ... are the corresponding gene expression levels, nij being the expression of gene i in sample j. s1 & s2 belong to one group (treatment), and s3 is the other group (control) by itself.

My biological question: how do I find out whether a gene is statistically differentially expressed? Or, statistically, what test should I use to find whether ni3 is significantly different from ni1 & ni2?

Thank you!

modified 2.7 years ago by shunyip180 • written 2.7 years ago by moxu440

A quick heatmap should give you an idea of how the expression looks. Please make sure the values are normalized.

written 2.7 years ago by sridhar56100

Assume the values are normally distributed.

written 2.7 years ago by moxu440
shunyip180 wrote:

You can look at the manuals of several Bioconductor tools, such as limma, edgeR and DESeq2.

written 2.7 years ago by shunyip180

I use edgeR, but I am not sure whether edgeR is suited to comparing a one-sample group with another group. Purely statistically, what would be the way to go? A pooled t-test requires the variance of each group, but a group with one sample does not have a variance. One way I can think of is to compute mean_i = mean(ni1, ni2, ni3) and var_i = var(ni1, ni2, ni3), and do a t-test using t_i = (ni3 - mean_i) / sqrt(var_i / 3). Another, similar test could be t_i = (ni3 - (ni1 + ni2)/2) / sqrt(var(ni1, ni2) / 2). I am not sure whether either of these two methods is appropriate.

modified 2.7 years ago • written 2.7 years ago by moxu440
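
For reference, the standard pooled two-sample t-test is still defined when one group has a single value. The singleton contributes no degrees of freedom, so the pooled variance is simply var(ni1, ni2), the standard error is sqrt(var(ni1, ni2) * (1/2 + 1/1)), and the test has 1 degree of freedom. Below is a minimal Python sketch of that calculation, assuming normality and equal variance in both groups. The function names and the example numbers are made up for illustration and do not come from edgeR or any other package.

```python
import math
import statistics


def singleton_vs_group_t(treatment, control_value):
    """Pooled two-sample t statistic when the control group is a single value.

    Assumes equal variance in both groups; the variance is estimated
    from the treatment replicates only.
    """
    n = len(treatment)
    if n < 2:
        raise ValueError("need at least two treatment values to estimate a variance")
    mean_t = statistics.mean(treatment)
    var_t = statistics.variance(treatment)  # sample variance (n - 1 denominator)
    if var_t == 0:
        raise ValueError("treatment values are identical; t is undefined")
    # Standard error of (control - mean_t): var * (1/n + 1/1)
    se = math.sqrt(var_t * (1.0 / n + 1.0))
    return (control_value - mean_t) / se, n - 1


def two_sided_p_df1(t):
    """Exact two-sided p-value for a t statistic with 1 degree of freedom.

    A t distribution with 1 df is a standard Cauchy distribution,
    whose CDF is 1/2 + atan(t) / pi.
    """
    return 1.0 - 2.0 * math.atan(abs(t)) / math.pi


# Hypothetical values for one gene: ni1 = 10, ni2 = 12, ni3 = 20
t_stat, df = singleton_vs_group_t([10.0, 12.0], 20.0)
p_value = two_sided_p_df1(t_stat)
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}")
```

With only 1 degree of freedom the reference distribution has very heavy tails, so even the large difference in this example gives a p-value of roughly 0.12. That illustrates how little power a 2-versus-1 design has.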

If you do not have replicates, you will have to assume that each sample's expression values are accurate, because there is no way to estimate their variability.

Instead of performing a t-test gene by gene, I would suggest calculating the fold change of all genes. Then identify genes with unusually large absolute log2 fold changes as DEGs. This way, you can "borrow" information across genes to compensate for things like batch effects. I believe it should be safe to assume that the log2 fold changes are normally distributed, but you should check this.

written 2.7 years ago by shunyip180
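
The fold-change suggestion above could look something like the following. This is only a sketch, assuming the log2 fold changes are roughly normal across genes. The pseudocount of 1, the z-score cutoff of 2, the function names and the example numbers are all arbitrary choices for illustration.

```python
import math
import statistics


def log2_fold_changes(treatment, control, pseudocount=1.0):
    """log2((treatment + pseudocount) / (control + pseudocount)) per gene."""
    return [
        math.log2((t + pseudocount) / (c + pseudocount))
        for t, c in zip(treatment, control)
    ]


def flag_extreme(log2fc, z_cutoff=2.0):
    """Flag genes whose log2FC is more than z_cutoff SDs from the mean log2FC."""
    mu = statistics.mean(log2fc)
    sd = statistics.stdev(log2fc)
    return [abs(x - mu) / sd > z_cutoff for x in log2fc]


# Hypothetical data: per-gene mean of s1 and s2, and the single control s3.
treatment_mean = [100, 105, 95, 102, 98, 101, 99, 103, 97, 800]
control = [100] * 10

lfc = log2_fold_changes(treatment_mean, control)
flags = flag_extreme(lfc)
for gene_index, (value, is_extreme) in enumerate(zip(lfc, flags), start=1):
    print(f"g{gene_index}: log2FC = {value:+.3f}, flagged = {is_extreme}")
```

Only the last gene is flagged here. With real data the mean and SD are themselves pulled around by the outliers, so a median-based centre and spread may be a safer choice.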

The log2FC distribution looks bimodal, with the two modes split around 0.

written 2.7 years ago by moxu440

You might need to normalize your expression data then. Are you using CPM or TPM?

written 2.7 years ago by shunyip180
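
In case the term is unfamiliar: CPM (counts per million) rescales each sample's counts by that sample's library size, so that samples sequenced to different depths become comparable. A minimal sketch follows; the function name and the counts are made up for illustration.

```python
def counts_per_million(counts):
    """Scale one sample's raw counts so that they sum to one million."""
    library_size = sum(counts)
    if library_size == 0:
        raise ValueError("library size is zero")
    return [c * 1_000_000 / library_size for c in counts]


# Hypothetical raw counts for three genes in one sample (library size 5000)
sample_counts = [250, 750, 4000]
cpm = counts_per_million(sample_counts)
print(cpm)
```

When the library sizes of two samples are nearly equal, this rescaling changes the ratios between them very little.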

The library sizes are almost identical -- the ratio is like 1.006xx.

Since you asked about CPM or TPM, I have to admit that I lied -- it's not gene expression data but ChIP-seq signal. But I don't think it matters, right?

written 2.7 years ago by moxu440

It shouldn't matter.

Did you filter out all signals where one of the samples is zero or has a very low read count?

From personal experience, when I see a bimodal distribution in this situation, one of the peaks could be caused by low-count genes. Usually, after I filter them out, the distribution becomes normal.

written 2.7 years ago by shunyip180
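
The filtering step described above can be sketched as follows. The threshold of 10 reads, the function name and the example counts are assumptions for illustration; a sensible cutoff depends on sequencing depth.

```python
def filter_low_counts(counts_by_feature, min_count=10):
    """Keep only features whose count is at least min_count in every sample."""
    return {
        name: counts
        for name, counts in counts_by_feature.items()
        if min(counts) >= min_count
    }


# Hypothetical raw counts per feature, in the order (s1, s2, s3)
example = {
    "g1": [120, 131, 98],
    "g2": [0, 3, 1],      # very low everywhere: dropped
    "g3": [45, 52, 9],    # one sample below the threshold: dropped
}
kept = filter_low_counts(example)
print(sorted(kept))
```

Low counts matter because a difference of a single read can already shift the log2 fold change by a large step (for example, 2 versus 1 is a full log2 unit), which can pile values up away from 0.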

"I use edgeR, not sure if edgeR is suited for comparing a one-sample group with another group. Just statistically, what would be the way to go?"

Statistically, the best way to go is to avoid designs with only one replicate. Some software can calculate differential expression for such a limited set of samples, but the results will be very unreliable and would need replication in a larger independent cohort before they can be generalized.

written 2.7 years ago by WouterDeCoster42k