Question

specific RNA-SEQ GLM design


6.9 years ago

gero007 • 0

Hi,

I received RNA-seq data to analyze without having been involved in the experimental design before the sequencing was done. I'm using the GLM framework in edgeR to compare the conditions, but for this particular setup, adapting the models presented in the edgeR guide does not seem trivial (at least with my limited statistical knowledge).

They have a cell line that can be transformed (Outcome) by a single point mutation (Mutation). They tested whether knocking out (KO) either of two different genes could prevent this transformation. The results showed that both knockouts prevented it. However, from the preliminary data analysis I can tell that the effect of the mutation on the expression profiles is much larger than that of either knockout.

The data available can be represented like this:

sample	Mutation	KO	Outcome
1	wt	wt	Non_Transformed
2	wt	wt	Non_Transformed
3	wt	wt	Non_Transformed

4	G12V	wt	Transformed
5	G12V	wt	Transformed
6	G12V	wt	Transformed

7	G12V	genX	Non_Transformed
8	G12V	genX	Non_Transformed
9	G12V	genX	Non_Transformed

10	G12V	genY	Non_Transformed
11	G12V	genY	Non_Transformed
12	G12V	genY	Non_Transformed

That said, the effect of the mutation is overwhelming in comparison with the knockouts, and the expression profiles of the knockouts are extremely close to the control (wt for Mutation + wt for KO). The idea here is to understand the slight difference that prevents the transformed outcome.

To try to model this, I coded:

Mutation <- as.factor(c("NoMut","NoMut","NoMut","G12V","G12V","G12V","G12V","G12V","G12V","G12V","G12V","G12V"))
KO <- as.factor(c("ctrl","ctrl","ctrl","ctrl","ctrl","ctrl","genx","genx","genx","geny","geny","geny"))
design <- model.matrix(~KO+Mutation, dgeCountsClean$samples)

which renders the design:

>design
            (Intercept) KOgenyKO KOgenxKO MutationG12V
wt_1                 1         0        0            0
wt_2                 1         0        0            0
wt_3                 1         0        0            0
G12V_1               1         0        0            1
G12V_2               1         0        0            1
genxG12V_1           1         0        1            1
genxG12V_2           1         0        1            1
genxG12V_3           1         0        1            1
genyG12V_1           1         1        0            1
genyG12V_2           1         1        0            1
genyG12V_3           1         1        0            1

I checked the KOgenyKO and KOgenxKO coefficients for differentially expressed genes, but unfortunately this design doesn't seem to be sensitive enough to detect differences in the expression profiles of these samples. I thought that modelling the GLM as ~KO*Mutation and testing the interaction coefficients (KOgenyKO:MutationG12V and KOgenxKO:MutationG12V) might help. The problem with this particular design is that, because I don't have the KO conditions without the mutation, the columns KOgenxKO:MutationG12V and KOgenxKO (and likewise for geny) are redundant, so the design matrix is not of full rank.
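
To make the redundancy concrete, here is a minimal base-R sketch of the rank check. It assumes the 4 groups x 3 replicates layout from the sample table and the level names ctrl/genx/geny and NoMut/G12V (not the genxKO/genyKO names in the printed design). The names rank_of and with_interaction are only illustrative, and no edgeR call is involved.

```r
# Factors for the 12 samples: wt/wt, G12V/wt, G12V/genx, G12V/geny (3 each).
Mutation <- factor(rep(c("NoMut", "G12V"), times = c(3, 9)),
                   levels = c("NoMut", "G12V"))
KO <- factor(rep(c("ctrl", "ctrl", "genx", "geny"), each = 3),
             levels = c("ctrl", "genx", "geny"))

# Additive model versus the model with interaction terms.
additive <- model.matrix(~ KO + Mutation)
with_interaction <- model.matrix(~ KO * Mutation)

# Numerical rank of a design matrix via its QR decomposition.
rank_of <- function(m) qr(m)$rank

additive_rank <- rank_of(additive)
interaction_rank <- rank_of(with_interaction)

cat("additive:    ", ncol(additive), "columns, rank", additive_rank, "\n")
cat("interaction: ", ncol(with_interaction), "columns, rank", interaction_rank, "\n")

# Every genx sample also carries G12V, so the interaction column is an
# exact copy of the main-effect column (the same holds for geny).
genx_duplicated <- all(with_interaction[, "KOgenx"] ==
                       with_interaction[, "KOgenx:MutationG12V"])
geny_duplicated <- all(with_interaction[, "KOgeny"] ==
                       with_interaction[, "KOgeny:MutationG12V"])
```

Under these assumptions the additive matrix has as many columns as its rank, while the interaction matrix has two more columns than its rank, which is the non-full-rank problem described above.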

So if anyone could offer some advice, a tutorial to read, or any tip to help me get out of this conundrum, I would be extremely grateful.

Cheers!

Gero.

RNA-Seq R Design GLM • 1.2k views
