Question

Understanding the ColData Matrix

3.0 years ago

raisamathenge ▴ 30

I am a student running a differential gene expression analysis between two stimuli with DESeq2, and I want to understand how the colData matrix interacts with the raw counts data. What is the logic behind it, and what are the criteria for making an accurate colData matrix?

RNA-SEQ R Micheal_Love DESEQ2 • 3.2k views

ADD COMMENT • link updated 3.0 years ago by Friederike 8.9k • written 3.0 years ago by raisamathenge ▴ 30

score 2 · Answer 1 · 2021-04-17

3.0 years ago

jared.andrews07 ★ 16k

Row names of the colData should match the column names of the counts matrix, in the same order, with each row or column representing one sample. colData itself is just a data frame, so each column is a field associated with each sample (e.g. age, gender, genotype, treatment_condition, etc). These fields can then be used in the design formula and for grouping/annotating samples in visualizations.
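As a rough illustration of that matching rule, here is a minimal base R sketch. The counts, sample names, and the `treatment`/`batch` fields are all made up for the example; only the idea of lining up colData row names with the counts column names comes from the answer above.

```r
# Toy counts matrix: 4 genes (rows) x 4 samples (columns).
# All values and names are hypothetical.
counts <- matrix(
  c(10L, 0L, 25L, 3L,
    12L, 1L, 30L, 5L,
     8L, 2L, 60L, 4L,
     9L, 0L, 55L, 6L),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4),
                  c("ctrl_1", "ctrl_2", "stim_1", "stim_2"))
)

# Sample table, deliberately listed in a different order than the counts columns.
coldata <- data.frame(
  treatment = c("stim", "ctrl", "stim", "ctrl"),
  batch     = c("b1", "b1", "b2", "b2"),
  row.names = c("stim_1", "ctrl_1", "stim_2", "ctrl_2")
)

# 1. Every sample in the counts matrix needs a row in colData.
stopifnot(all(colnames(counts) %in% rownames(coldata)))

# 2. Reorder the colData rows so they follow the counts columns.
coldata <- coldata[colnames(counts), , drop = FALSE]

# 3. Confirm that names and order now agree exactly.
stopifnot(identical(rownames(coldata), colnames(counts)))

coldata
```

Reordering by name, rather than assuming the two tables already line up, is the safer habit: a sample table in the wrong order would otherwise attach the wrong treatment label to a sample without any obvious symptom.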

ADD COMMENT • link 3.0 years ago by jared.andrews07 ★ 16k

What if colData is just one column?

ADD REPLY • link 3.0 years ago by raisamathenge ▴ 30

If you have 5 control samples, and 5 treated samples, that's fine. You just need a column for sample names, and a column for treatment.

ADD REPLY • link 3.0 years ago by swbarnes2 14k

Like so:

> sample_info <- data.frame(condition = c(rep("SNF2",5), rep("WT",5)), row.names = colnames(readcounts) )
> sample_info
       condition
SNF2_1      SNF2
SNF2_2      SNF2
SNF2_3      SNF2
SNF2_4      SNF2
SNF2_5      SNF2
WT_1          WT
WT_2          WT
WT_3          WT
WT_4          WT
WT_5          WT
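For completeness, a sketch of how a one-column sample table like this could be handed to DESeq2. The `readcounts` matrix here is simulated, since the real one is not shown in the thread, and setting WT as the reference level is my assumption about the intended comparison. The DESeq2 call is guarded so the snippet still runs where the package is not installed.

```r
set.seed(1)  # simulated counts only, so the example is reproducible

# Hypothetical stand-in for the real readcounts object: 20 genes x 10 samples.
sample_names <- c(paste0("SNF2_", 1:5), paste0("WT_", 1:5))
readcounts <- matrix(
  rpois(20 * 10, lambda = 50), nrow = 20,
  dimnames = list(paste0("gene", 1:20), sample_names)
)

# One-column colData; the first factor level (WT here, by assumption)
# is treated as the reference group.
sample_info <- data.frame(
  condition = factor(c(rep("SNF2", 5), rep("WT", 5)), levels = c("WT", "SNF2")),
  row.names = colnames(readcounts)
)

# Row names of colData must match the count columns, in the same order.
stopifnot(identical(rownames(sample_info), colnames(readcounts)))

# Build the DESeq2 object only if the package is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = readcounts,
    colData   = sample_info,
    design    = ~ condition
  )
  print(dds)
}
```

The single `condition` column is all the design formula `~ condition` needs; extra columns only matter if you want to model or plot them.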

ADD REPLY • link 3.0 years ago by Friederike 8.9k