Question: Design matrix in DESeq2
riya wrote:

Hi, I am pretty new to RNA-seq data analysis and need some advice on formulating the design matrix.

This is my sample file. I added a GroupID column that combines all the variables into a single label, so that each group of samples can be distinguished from the others and the column can be used in the design:

    Sample  Cell Type  Concentration  time  GroupID
 1   AB1     Control    0              5     AB1_Control_0_5
 2   AB1     Control    0              5     AB1_Control_0_5
 3   AB1     Control    0              5     AB1_Control_0_5
 4   AB1     Treatment  5              5     AB1_Treatment_5_5
 5   AB1     Treatment  5              5     AB1_Treatment_5_5
 6   AB1     Treatment  5              5     AB1_Treatment_5_5
 7   ST1     Control    0              5     ST1_Control_0_5
 8   ST1     Control    0              5     ST1_Control_0_5
 9   ST1     Control    0              5     ST1_Control_0_5
 10  ST1     Treatment  5              5     ST1_Treatment_5_5
 11  ST1     Treatment  5              5     ST1_Treatment_5_5
 12  ST1     Treatment  5              5     ST1_Treatment_5_5
 13  AB1     Control    0              8     AB1_Control_0_8
 14  AB1     Control    0              8     AB1_Control_0_8
 15  AB1     Control    0              8     AB1_Control_0_8
 16  AB1     Treatment  8              8     AB1_Treatment_8_8
 17  AB1     Treatment  8              8     AB1_Treatment_8_8
 18  AB1     Treatment  8              8     AB1_Treatment_8_8
 19  ST1     Control    0              8     ST1_Control_0_8
 20  ST1     Control    0              8     ST1_Control_0_8
 21  ST1     Control    0              8     ST1_Control_0_8
 22  ST1     Treatment  8              8     ST1_Treatment_8_8
 23  ST1     Treatment  8              8     ST1_Treatment_8_8
 24  ST1     Treatment  8              8     ST1_Treatment_8_8

Contrasts I need

For my contrasts, I want to compare:

  1. AB1.Treatment at concentration 5 and time point 5 vs AB1.Control at concentration 0 and time point 5
  2. AB1.Treatment at concentration 8 and time point 8 vs AB1.Control at concentration 0 and time point 8
  3. ST1.Treatment at concentration 5 and time point 5 vs ST1.Control at concentration 0 and time point 5
  4. ST1.Treatment at concentration 8 and time point 8 vs ST1.Control at concentration 0 and time point 8

The design and contrasts I used:

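library(DESeq2)

# Build the dataset; GroupID has one level per sample group
dds <- DESeqDataSetFromMatrix(countData = Count_data,
                              colData = Meta_data,
                              design = ~ GroupID)
# Fit the model; results() needs a fitted object
dds <- DESeq(dds)

# contrast = c(factor name, numerator level, denominator level)
results(dds, contrast = c("GroupID", "AB1_Treatment_5_5", "AB1_Control_0_5"))
results(dds, contrast = c("GroupID", "ST1_Treatment_5_5", "ST1_Control_0_5"))
results(dds, contrast = c("GroupID", "AB1_Treatment_8_8", "AB1_Control_0_8"))
results(dds, contrast = c("GroupID", "ST1_Treatment_8_8", "ST1_Control_0_8"))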

When I run resultsNames(dds), I see some contrasts I don't need and not the ones I do need. For example:

GroupIDAB1_Treatment_5_5vsAB1_Control_0_5
GroupIDST1_Treatment_5_5vsAB1_Control_0_5  (this is not what I want)

but I want GroupIDST1_Treatment_5_5vsST1_Control_0_5.

I do get results from this, but in some contrasts the adjusted p-value is 1 for every gene. So, is my design matrix right? Could somebody help me with this?

Please tell me if I am doing something wrong.

Thanks in advance!
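
For context on the resultsNames(dds) output described above: those names are model coefficients, not the full set of available comparisons. With design = ~ GroupID, each coefficient compares one level against the reference level, which by default is the first level in alphabetical order (here AB1_Control_0_5). A comparison between two non-reference levels, such as ST1_Treatment_5_5 vs ST1_Control_0_5, is not listed there but can still be requested through the contrast argument of results(). The base-R sketch below rebuilds the GroupID factor from the sample table (three replicates per group) and shows the same column structure with model.matrix(); it is an illustration only and does not use the real counts.

```r
# Group labels taken from the sample table; three replicates per group
groups <- c("AB1_Control_0_5", "AB1_Treatment_5_5",
            "ST1_Control_0_5", "ST1_Treatment_5_5",
            "AB1_Control_0_8", "AB1_Treatment_8_8",
            "ST1_Control_0_8", "ST1_Treatment_8_8")
meta <- data.frame(GroupID = factor(rep(groups, each = 3)))

# The reference level is the first factor level (alphabetical by default)
ref <- levels(meta$GroupID)[1]

# One intercept column plus one column per non-reference level;
# every non-intercept column is "this level vs the reference level"
mm <- model.matrix(~ GroupID, data = meta)

print(ref)
print(colnames(mm))
```

The reference level has no column of its own, which is why every listed name ends in the same control group.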

rna-seq
modified 7 months ago by RamRS • written 7 months ago by riya

Hello riya!

It appears that your post has been cross-posted to another site: https://support.bioconductor.org/p/128057/

This is typically not recommended as it runs the risk of annoying people in both communities.

Thanks for the pointer, swbarnes2

modified 7 months ago • written 7 months ago by RamRS
swbarnes2 wrote:

You already got answers on the Bioconductor support site; why are you asking the same question here?

modified 7 months ago • written 7 months ago by swbarnes2

Because I don't have a statistician who could help me verify my design matrix, I am looking for someone who has time to help me with this.

written 7 months ago by riya

I'd recommend creating 4 design matrices or using only a subset of the samples for each DE analysis instead of using them all at once.

For example, for your first DE query, use just the first 6 samples.
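
A minimal sketch of that subsetting, assuming the columns of the count matrix are in the same order as the rows of the sample table. The counts are simulated placeholders (not real data), the sample names S1 to S24 are hypothetical, and Count_data and Meta_data simply reuse the names from the question. The DESeq2 step only runs if the package is installed.

```r
set.seed(1)

# Group labels in sample-table order; three replicates per group
groups <- c("AB1_Control_0_5", "AB1_Treatment_5_5",
            "ST1_Control_0_5", "ST1_Treatment_5_5",
            "AB1_Control_0_8", "AB1_Treatment_8_8",
            "ST1_Control_0_8", "ST1_Treatment_8_8")
Meta_data <- data.frame(GroupID = factor(rep(groups, each = 3)),
                        row.names = paste0("S", 1:24))

# Placeholder counts: 100 genes x 24 samples (simulated, for illustration)
Count_data <- matrix(rnbinom(100 * 24, mu = 100, size = 2), nrow = 100,
                     dimnames = list(paste0("gene", 1:100),
                                     rownames(Meta_data)))

# Keep only the two groups of the first comparison (samples 1 to 6)
keep <- Meta_data$GroupID %in% c("AB1_Control_0_5", "AB1_Treatment_5_5")
sub_counts <- Count_data[, keep]
# droplevels() removes the six groups that no longer have samples
sub_meta <- droplevels(Meta_data[keep, , drop = FALSE])

# The subset then goes through the usual DESeq2 steps
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds_sub <- DESeq2::DESeqDataSetFromMatrix(countData = sub_counts,
                                            colData = sub_meta,
                                            design = ~ GroupID)
  dds_sub <- DESeq2::DESeq(dds_sub)
  res_sub <- DESeq2::results(
    dds_sub,
    contrast = c("GroupID", "AB1_Treatment_5_5", "AB1_Control_0_5"))
}
```

With only six samples in the object, the dispersion estimates rest on those six samples alone.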

EDIT: See swbarnes2's comment below.

modified 7 months ago • written 7 months ago by RamRS

I think that, in general, it's preferable not to do this: keeping all the samples in the dds object gives a better estimate of the gene-wise dispersions. In this particular case, though, you might consider separating the two cell types if they are very different. Keeping them together might make the library normalization wonky, and if one cell type has a much higher dispersion than the other, applying the higher estimates to the other cell type will hurt its p-values unnecessarily.

http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#if-i-have-multiple-groups-should-i-run-all-together-or-split-into-pairs-of-groups
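
The cell-type split mentioned above could be sketched as follows in base R. This assumes the cell type is the part of GroupID before the first underscore; the CellType column and the sample names S1 to S24 are hypothetical and introduced only for illustration.

```r
# Group labels in sample-table order; three replicates per group
groups <- c("AB1_Control_0_5", "AB1_Treatment_5_5",
            "ST1_Control_0_5", "ST1_Treatment_5_5",
            "AB1_Control_0_8", "AB1_Treatment_8_8",
            "ST1_Control_0_8", "ST1_Treatment_8_8")
Meta_data <- data.frame(GroupID = rep(groups, each = 3),
                        row.names = paste0("S", 1:24),
                        stringsAsFactors = FALSE)

# Assumption: the cell type is the GroupID prefix before the first "_"
Meta_data$CellType <- sub("_.*", "", Meta_data$GroupID)

# Sample names per cell type; each set keeps both time points
samples_by_type <- split(rownames(Meta_data), Meta_data$CellType)

# Per-cell-type metadata, with GroupID restricted to the levels present
meta_by_type <- lapply(samples_by_type, function(s) {
  m <- Meta_data[s, , drop = FALSE]
  m$GroupID <- factor(m$GroupID)
  m
})

print(lengths(samples_by_type))
print(levels(meta_by_type$ST1$GroupID))
```

Each cell type would then get its own object, for example Count_data[, samples_by_type$AB1] together with meta_by_type$AB1 and design = ~ GroupID, so that twelve samples rather than six contribute to the dispersion estimates for the two AB1 contrasts.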

modified 7 months ago • written 7 months ago by swbarnes2

Thanks for the tip, @swbarnes2.

written 7 months ago by riya