Question: RNASeq statistical analysis
1
gravatar for agolicz
10 months ago by
agolicz30
Australia
agolicz30 wrote:

I am doing some statistical analysis of RNASeq data and I have run into a bit of a problem. Instead of looking at comparison of individual genes I want to compare groups of genes. The setup is as follows: I have two independent groups of genes. For each gene I have three measures from three biological replicates. I want to compare expression between groups. Sample table with design: https://ibb.co/myCssd

Right now I'm averaging biological replicates and performing Mann-Whitney U-test, but I was wondering if there is a better solution? The other problem is that I can not really compare absolute expression values like FPKM, so I'm looking at % maximum expression instead. This works nicely on averages, but would not really work on individual replicates due to outliers...

Any suggestions would be greatly appreciated.

A bit of background: I have 5 time points, 3 replicates each. I know that between time point 2 and 3 there is a transcription factor which is expressed. I want to see the effect it has on the target genes, compared to the control genes at time point 3. I essentially want to see if there is a difference in expression between target (bound by transcription factor) and control (not bound by the transcription factor) genes at time point 3. At this point I calculate the mean of three replicates for all the genes. Split the genes into target and control sets and then use Mann-Whitney U test to see if there is a difference in underlying distributions. Also, because the absolute values are difficult to compare I scale them to % of maximum expression. So, I’m essentially comparing the difference in distributions of % of maximum expression at point 3. One of my problems is that instead of incorporating the replicates I’m just using the mean. I was wondering if there is a cleaner way performing analysis like that?

rna-seq • 483 views
ADD COMMENTlink modified 10 months ago • written 10 months ago by agolicz30

As I understand, you want to compare the combined effect of more than one gene at the same time? So, it would be something like:

Gene1+Gene2+Gene3 in GroupA Versus Gene1+Gene2+Gene3 in GroupB ?

...or are you literally just looking for a singl p-value for each gene?

If you could at all possible obtain raw counts, that would be beneficial.

ADD REPLYlink modified 10 months ago • written 10 months ago by Kevin Blighe41k

Yes. Option one. But the groups are independent so it's group1: gene1+gene2+gene3 vs group2: gene4+gene5+gene6. I essentially want to see if the expression in group2 is lower/higher than group1 at a given point. I have added some background information. Hopefully that could help?

ADD REPLYlink written 10 months ago by agolicz30

Hello, thanks for adding the background - seems very interesting. The experiment would obviously be better performed in a 'robust' DE analysis tool, like EdgeR, DESeq2, etc. However, I understand that you may not have the raw counts such that you could utiliise these tools.

I'm not sure if it's exactly what you could use but I recently posted about the application of the Wald Test to a fitted regression model: A: Differential Expression analysis using Wald-Test

In such a case, you'd do:

model <- glm(TFbinding ~ gene1 + gene2 + gene3 + ... + geneX)

wald.test(b=coef(model), Sigma=vcov(model), Terms=2:4)

This would derive a single p-value for gene1 + gene2 + gene3 (cofficients / terms 2 to 4) in relation to their ability to distinguish TF binding as encoded in the TFbinding categorical variable. If TFbinding is a continous representation of TF binding, then you should switch the model to a linear regression lm()

Not sure if that helps at all.

ADD REPLYlink written 10 months ago by Kevin Blighe41k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 680 users visited in the last hour