Design matrix for DESeq2
4.2 years ago
riya

Hi, I am pretty new to RNA-seq data analysis and need some advice on formulating the design matrix:

This is my sample file. I added a GroupID column that combines all the variables into one, so that each group of samples is distinguished from the others, and I use that column in the design:

    Sample  Cell Type  Concentration  time  GroupID
 1   AB1     Control    0              5     AB1_Control_0_5
 2   AB1     Control    0              5     AB1_Control_0_5
 3   AB1     Control    0              5     AB1_Control_0_5
 4   AB1     Treatment  5              5     AB1_Treatment_5_5
 5   AB1     Treatment  5              5     AB1_Treatment_5_5
 6   AB1     Treatment  5              5     AB1_Treatment_5_5
 7   ST1     Control    0              5     ST1_Control_0_5
 8   ST1     Control    0              5     ST1_Control_0_5
 9   ST1     Control    0              5     ST1_Control_0_5
 10  ST1     Treatment  5              5     ST1_Treatment_5_5
 11  ST1     Treatment  5              5     ST1_Treatment_5_5
 12  ST1     Treatment  5              5     ST1_Treatment_5_5
 13  AB1     Control    0              8     AB1_Control_0_8
 14  AB1     Control    0              8     AB1_Control_0_8
 15  AB1     Control    0              8     AB1_Control_0_8
 16  AB1     Treatment  8              8     AB1_Treatment_8_8
 17  AB1     Treatment  8              8     AB1_Treatment_8_8
 18  AB1     Treatment  8              8     AB1_Treatment_8_8
 19  ST1     Control    0              8     ST1_Control_0_8
 20  ST1     Control    0              8     ST1_Control_0_8
 21  ST1     Control    0              8     ST1_Control_0_8
 22  ST1     Treatment  8              8     ST1_Treatment_8_8
 23  ST1     Treatment  8              8     ST1_Treatment_8_8
 24  ST1     Treatment  8              8     ST1_Treatment_8_8
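To make the table concrete, the R sketch below rebuilds it as a data frame and derives GroupID by pasting the four variables together, which is one way such a column could be made. The column names `cell` and `condition` are hypothetical stand-ins for the AB1/ST1 and Control/Treatment columns; the values follow the table row by row.

```r
# Sketch: rebuild the sample table and derive GroupID from its columns.
# 'cell' and 'condition' are hypothetical names for the AB1/ST1 and
# Control/Treatment columns of the table.
cell      <- rep(rep(c("AB1", "ST1"), each = 6), times = 2)
condition <- rep(rep(c("Control", "Treatment"), each = 3), times = 4)
time      <- rep(c(5, 8), each = 12)
# In the table, treated samples have concentration 5 at time 5 and 8 at
# time 8; controls always have concentration 0
concentration <- ifelse(condition == "Control", 0, time)

Meta_data <- data.frame(cell, condition, concentration, time)
# One factor level per combination of the four variables
Meta_data$GroupID <- factor(paste(cell, condition, concentration, time, sep = "_"))

table(Meta_data$GroupID)  # 8 groups, 3 replicates each
```

A data frame like this is what would be passed as `colData` to DESeqDataSetFromMatrix(), in the same sample order as the columns of the count matrix.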

Contrasts I need

For the contrasts, I want to compare:

  1. AB1 treatment (concentration 5, time point 5) vs AB1 control (concentration 0, time point 5)
  2. AB1 treatment (concentration 8, time point 8) vs AB1 control (concentration 0, time point 8)
  3. ST1 treatment (concentration 5, time point 5) vs ST1 control (concentration 0, time point 5)
  4. ST1 treatment (concentration 8, time point 8) vs ST1 control (concentration 0, time point 8)

Design and contrasts I used:

    library(DESeq2)

    dds <- DESeqDataSetFromMatrix(countData = Count_data,
                                  colData = Meta_data,
                                  design = ~ GroupID)
    dds <- DESeq(dds)  # fit the model; results() needs this step

    results(dds, contrast = c("GroupID", "AB1_Treatment_5_5", "AB1_Control_0_5"))
    results(dds, contrast = c("GroupID", "ST1_Treatment_5_5", "ST1_Control_0_5"))
    results(dds, contrast = c("GroupID", "AB1_Treatment_8_8", "AB1_Control_0_8"))
    results(dds, contrast = c("GroupID", "ST1_Treatment_8_8", "ST1_Control_0_8"))
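Since the four results() calls differ only in the two levels compared, they can be driven from one list. The base-R sketch below assumes the eight GroupID levels from the table (`groups` and `contrast_list` are illustrative names, not part of the original code) and checks that both levels of every contrast are spelled exactly as in the factor, because a level name that does not match cannot be used by results().

```r
# Sketch: keep the four contrasts in one named list and check level names.
# 'groups' stands in for levels(Meta_data$GroupID); with a fitted dds,
# levels(dds$GroupID) would be used instead.
groups <- c("AB1_Control_0_5", "AB1_Control_0_8", "AB1_Treatment_5_5",
            "AB1_Treatment_8_8", "ST1_Control_0_5", "ST1_Control_0_8",
            "ST1_Treatment_5_5", "ST1_Treatment_8_8")

contrast_list <- list(
  AB1_5 = c("GroupID", "AB1_Treatment_5_5", "AB1_Control_0_5"),
  AB1_8 = c("GroupID", "AB1_Treatment_8_8", "AB1_Control_0_8"),
  ST1_5 = c("GroupID", "ST1_Treatment_5_5", "ST1_Control_0_5"),
  ST1_8 = c("GroupID", "ST1_Treatment_8_8", "ST1_Control_0_8")
)

# TRUE for each contrast whose numerator and denominator are real levels
ok <- vapply(contrast_list, function(cn) all(cn[2:3] %in% groups), logical(1))
stopifnot(all(ok))

# With DESeq2 loaded and dds fitted by DESeq() (not run here):
# res_list <- lapply(contrast_list, function(cn) results(dds, contrast = cn))
```

Keeping the contrasts in a named list also means the result tables come back labelled AB1_5, AB1_8 and so on.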

When I run resultsNames(dds), I see some contrasts I don't need, and not the ones I do need. For example:

    GroupIDAB1_Treatment_5_5vsAB1_Control_0_5
    GroupIDST1_Treatment_5_5vsAB1_Control_0_5   (this is not what I want)

But I want GroupIDST1_Treatment_5_5vsST1_Control_0_5.
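A likely explanation, sketched under the assumption that GroupID is an ordinary factor with the default alphabetical level order: with `design = ~ GroupID`, the first level (AB1_Control_0_5) becomes the reference, and resultsNames() lists one coefficient per remaining level, each compared with that reference. It is a list of model coefficients, not of the comparisons that can be made; `contrast = c("GroupID", "ST1_Treatment_5_5", "ST1_Control_0_5")` still compares those two levels directly. The base-R sketch below uses model.matrix() to show the coefficients, writes that comparison as a numeric contrast vector (+1 on one coefficient, -1 on the other), and checks it on toy data where a Gaussian lm() stands in for DESeq2's negative binomial GLM (the coefficient algebra is the same).

```r
# Sketch: why only "vs reference" coefficients appear for design ~ GroupID.
GroupID <- factor(rep(c("AB1_Control_0_5", "AB1_Control_0_8",
                        "AB1_Treatment_5_5", "AB1_Treatment_8_8",
                        "ST1_Control_0_5", "ST1_Control_0_8",
                        "ST1_Treatment_5_5", "ST1_Treatment_8_8"), each = 3))
mm <- model.matrix(~ GroupID)
coefs <- colnames(mm)
print(coefs)  # "(Intercept)" then one "GroupID..." column per non-reference level

# ST1 treatment vs ST1 control at time 5 = difference of two coefficients
contrast_vec <- setNames(numeric(length(coefs)), coefs)
contrast_vec["GroupIDST1_Treatment_5_5"] <-  1
contrast_vec["GroupIDST1_Control_0_5"]   <- -1

# Toy check (Gaussian lm, not DESeq2): the contrast equals the mean difference
set.seed(1)
y <- rnorm(length(GroupID), mean = as.integer(GroupID))
fit <- lm(y ~ GroupID)
est <- sum(contrast_vec[names(coef(fit))] * coef(fit))
diff_means <- mean(y[GroupID == "ST1_Treatment_5_5"]) -
              mean(y[GroupID == "ST1_Control_0_5"])
```

DESeq2 labels its coefficients differently (for example `Intercept` and `GroupID_ST1_Treatment_5_5_vs_AB1_Control_0_5`) but in the same order, so a vector like `contrast_vec` should also be accepted as `results(dds, contrast = unname(contrast_vec))`; the character form used in the calls above is simpler.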

I do get results from this, but in some contrasts the adjusted p-values (padj) are 1 for all genes. So, is my design right? Could somebody help me with this?

Please tell me if I am doing something wrong.

Thanks in advance!


Hello riya!

It appears that your post has been cross-posted to another site: https://support.bioconductor.org/p/128057/

This is typically not recommended as it runs the risk of annoying people in both communities.

Thanks for the pointer, swbarnes2

4.2 years ago

You already got answers on the Bioconductor support site; why are you asking the same question here?


Because I don't have a statistician who could help me verify my design, so I am looking for someone who has time to help me with this.


I'd recommend creating 4 design matrices or using only a subset of the samples for each DE analysis instead of using them all at once.

For example, for your first DE query, use just the first 6 samples.
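A base-R sketch of that subsetting, assuming the metadata is ordered as in the table: the logical index selects the two groups of the first comparison (samples 1 to 6), and droplevels() removes the six unused GroupID levels. With a DESeqDataSet the same index would be applied as `dds[, keep]`, followed by the same droplevels() step on `dds$GroupID`; those DESeq2 lines are shown only as comments.

```r
# Sketch: keep only the samples of one comparison (table order assumed).
GroupID <- factor(rep(c("AB1_Control_0_5", "AB1_Treatment_5_5",
                        "ST1_Control_0_5", "ST1_Treatment_5_5",
                        "AB1_Control_0_8", "AB1_Treatment_8_8",
                        "ST1_Control_0_8", "ST1_Treatment_8_8"), each = 3))
Meta_data <- data.frame(GroupID)

keep <- Meta_data$GroupID %in% c("AB1_Control_0_5", "AB1_Treatment_5_5")
sub_meta <- Meta_data[keep, , drop = FALSE]
# Remove the unused levels so the design only sees the two compared groups
sub_meta$GroupID <- droplevels(sub_meta$GroupID)

which(keep)  # samples 1 to 6

# With DESeq2 (not run here):
# dds_sub <- dds[, keep]
# dds_sub$GroupID <- droplevels(dds_sub$GroupID)
# dds_sub <- DESeq(dds_sub)
```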

EDIT: See swbarnes2's comment below.


I think that, in general, it's preferable not to do this; it's better to keep all the samples in the dds object to get better estimates of the gene-wise dispersions. In this particular case, you might consider analysing the two cell types separately if they are very different: keeping them together might make the library-size normalization wonky, and if one cell type has a much higher dispersion than the other, applying the higher estimates to the other one will hurt its p-values unnecessarily.
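If the two cell types are analysed separately, each subset still contains both time points, so its dispersion estimates draw on 12 samples rather than the 6 of a single pairwise subset. The sketch below, in base R and assuming the table's sample order, splits the sample indices by cell type (derived here from the GroupID prefix; `per_cell` is an illustrative name) and drops unused levels in each subset; the DESeq2 calls appear only as comments.

```r
# Sketch: one metadata subset per cell type (table order assumed).
GroupID <- factor(rep(c("AB1_Control_0_5", "AB1_Treatment_5_5",
                        "ST1_Control_0_5", "ST1_Treatment_5_5",
                        "AB1_Control_0_8", "AB1_Treatment_8_8",
                        "ST1_Control_0_8", "ST1_Treatment_8_8"), each = 3))
# Cell type = the part of GroupID before the first underscore
Meta_data <- data.frame(GroupID, cell = sub("_.*", "", GroupID))

per_cell <- lapply(split(seq_len(nrow(Meta_data)), Meta_data$cell),
                   function(idx) {
                     m <- Meta_data[idx, , drop = FALSE]
                     m$GroupID <- droplevels(m$GroupID)  # 4 groups remain
                     m
                   })

vapply(per_cell, function(m) nlevels(m$GroupID), integer(1))  # AB1 4, ST1 4

# With DESeq2 (not run here), per cell type:
# dds_AB1 <- dds[, dds$GroupID %in% levels(per_cell$AB1$GroupID)]
# dds_AB1$GroupID <- droplevels(dds_AB1$GroupID)
# dds_AB1 <- DESeq(dds_AB1)
```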

http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#if-i-have-multiple-groups-should-i-run-all-together-or-split-into-pairs-of-groups


Thanks for the tip, @swbarnes2.
