Understanding the ColData Matrix

I am a student running a differential gene expression analysis between two stimuli using DESeq2, and I want to understand how the colData matrix interacts with the raw counts data. What is the logic behind it, and what are the criteria for building an accurate colData matrix?

RNA-SEQ R Micheal_Love DESEQ2

The row names of colData should match the column names of the counts matrix, in the same order: each row of colData and each column of the counts matrix represents one sample. colData itself is just a data frame, so each of its columns is a field associated with each sample (e.g. age, gender, genotype, treatment_condition, etc.). These fields can then be used in the design formula and for grouping or annotating samples in visualizations.
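
As a minimal sketch of that matching rule, here is a base R check that reorders colData to follow the count columns. The objects `counts` and `col_data`, the sample names, and the count values are all made up for illustration and are not from the question:

```r
# Toy count matrix: 3 genes x 4 samples (values are made up for illustration)
counts <- matrix(
  c(10, 0, 5, 12, 3, 7, 8, 1, 9, 15, 2, 6),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("ctrl_1", "ctrl_2", "trt_1", "trt_2"))
)

# Sample table deliberately built in a different order than the count columns
col_data <- data.frame(
  treatment_condition = factor(c("treated", "control", "treated", "control")),
  row.names = c("trt_1", "ctrl_1", "trt_2", "ctrl_2")
)

# Both sides must describe the same set of samples
stopifnot(setequal(rownames(col_data), colnames(counts)))

# Reorder the rows of col_data to follow the column order of counts
# (drop = FALSE keeps a one-column data frame from collapsing to a vector)
col_data <- col_data[colnames(counts), , drop = FALSE]

# Names and order now agree
stopifnot(identical(rownames(col_data), colnames(counts)))
```

Reordering colData rather than the counts keeps the count matrix untouched. Either direction works as long as the final `identical()` check passes.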


What if colData is just one column?


If you have 5 control samples and 5 treated samples, that's fine. You just need the sample names (typically as the row names) and a column for treatment.


Like so:

> sample_info <- data.frame(condition = c(rep("SNF2", 5), rep("WT", 5)), row.names = names(readcounts))
> sample_info
       condition
SNF2_1      SNF2
SNF2_2      SNF2
SNF2_3      SNF2
SNF2_4      SNF2
SNF2_5      SNF2
WT_1          WT
WT_2          WT
WT_3          WT
WT_4          WT
WT_5          WT
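
To show where a one-column table like this ends up, here is a rough sketch of handing it to DESeq2. The `readcounts` matrix below is a randomly generated stand-in (an assumption, since the real counts are not shown in the thread). Setting WT as the reference level is also my assumption about the intended comparison. The DESeq2 call only runs if the package is installed:

```r
# Hypothetical stand-in for readcounts: 4 genes x 10 samples of random counts
set.seed(1)
sample_names <- c(paste0("SNF2_", 1:5), paste0("WT_", 1:5))
readcounts <- matrix(
  rpois(40, lambda = 50),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), sample_names)
)

# One-column colData; the first factor level ("WT") is treated as the reference
sample_info <- data.frame(
  condition = factor(c(rep("SNF2", 5), rep("WT", 5)), levels = c("WT", "SNF2")),
  row.names = colnames(readcounts)
)

# Build the dataset only when DESeq2 is available
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = readcounts,
    colData   = sample_info,
    design    = ~ condition
  )
}
```

`colnames()` is used here because the stand-in is a matrix. `names()` returns the column names only when `readcounts` is a data frame, and returns NULL for a matrix.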