Question

normalization for unsupervised analysis by DESeq2

1

2.2 years ago

Rob ▴ 170

Hi friends

I want to use DESeq2 to normalize raw count data before running a PCA, but I don't have colData.

What code should I use, given that the DESeq2 workflow requires colData and a design? Thanks.

RNA-Seq • 2.7k views

ADD COMMENT • link 2.2 years ago by Rob ▴ 170

0

Why do you want to do a PCA in the first place? If all your samples have the exact same features, why compare them at all? colData is simply there to keep track of metadata for each sample, e.g. the day they were prepared, the donor they came from, the condition under which the cells were treated, etc.

ADD REPLY • link 2.2 years ago by Friederike 8.9k

0

I want to run a PCA to see how gene expression differs between samples and how my samples group together. I also want to use the PCA to identify outliers and remove them.

ADD REPLY • link 2.2 years ago by Rob ▴ 170

0

How do your samples differ? What kinds of groups are you expecting? This is precisely the type of information that should go into colData. (That being said, you don't need colData for the PCA, but it will help with the visualization.)

ADD REPLY • link 2.2 years ago by Friederike 8.9k

0

I want to categorize the samples after the PCA and after removing outliers. My clinical data has a column with a continuous variable; after the PCA and outlier removal, I will use this column to split the samples into two groups. But I don't want this to affect my samples before the unsupervised analysis.

ADD REPLY • link 2.2 years ago by Rob ▴ 170

0

Just because the information is stored in colData doesn't mean it's going to be used for specific sections of the analysis.

The plotPCA function of DESeq2 is a convenient wrapper around prcomp, the base R function for performing PCA. Here are the relevant bits from its source code:

  # calculate the variance for each gene
  rv <- rowVars(assay(object))

  # select the ntop genes by variance
  select <- order(rv, decreasing=TRUE)[seq_len(min(ntop, length(rv)))]

  # perform a PCA on the data in assay(object) for the selected genes;
  # as you can see, there's no calling of colData() or design, only of assay()
  pca <- prcomp(t(assay(object)[select,]))

  # the contribution to the total variance for each component
  percentVar <- pca$sdev^2 / sum( pca$sdev^2 )

You can read more about these issues in this class material; I recommend having a look at section 5.3.3.
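For illustration, the same steps can be run in base R on any numeric matrix, without a DESeq2 object. This is only a sketch: the simulated `expr` matrix stands in for your own normalized, log-scale expression values, the names `expr` and `percent_var` are made up for this example, and `apply(expr, 1, var)` is swapped in for `rowVars`.

```r
# Simulated stand-in for a normalized, log-scale expression matrix
# (genes in rows, samples in columns); replace with your own data.
set.seed(1)
expr <- matrix(rnorm(2000 * 6, mean = 8, sd = 2), nrow = 2000,
               dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:6)))

ntop <- 500  # number of most variable genes to keep (plotPCA's default)

# variance of each gene across samples (base R replacement for rowVars)
rv <- apply(expr, 1, var)

# indices of the ntop most variable genes
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]

# PCA with samples as observations; no sample annotation is involved
pca <- prcomp(t(expr[select, ]))

# fraction of the total variance explained by each component
percent_var <- pca$sdev^2 / sum(pca$sdev^2)

# samples that sit far from the rest on PC1/PC2 deserve a closer look
plot(pca$x[, 1], pca$x[, 2],
     xlab = sprintf("PC1 (%.0f%%)", 100 * percent_var[1]),
     ylab = sprintf("PC2 (%.0f%%)", 100 * percent_var[2]))
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3)
```

The only input is the expression matrix itself, so nothing stored in colData can influence where the samples fall.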

ADD REPLY • link 2.2 years ago by Friederike 8.9k

0

Also, I only have clinical data for some samples, not all. Is there any method for normalizing with DESeq2 in that case?

ADD REPLY • link 2.2 years ago by Rob ▴ 170

0

What do you envision? Do you want DESeq to infer the missing data? That's not going to happen, no.

ADD REPLY • link 2.2 years ago by Friederike 8.9k

0

No, I don't mean missing values. I just want to normalize and remove outliers before the PCA.

ADD REPLY • link 2.2 years ago by Rob ▴ 170

0

Yes, there are. Please read the chapter I mentioned in the linked class material.

ADD REPLY • link 2.2 years ago by Friederike 8.9k

score 3 · Accepted Answer · 2022-01-19

3

2.2 years ago

swbarnes2 14k

You can Google the size factor normalization algorithm (DESeq2 uses the median-of-ratios method) and do it yourself.

Or make fake colData and a design of ~ 1; then you can build a valid DESeq2 object.
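A minimal sketch of the second option, assuming DESeq2 is installed. The simulated `counts` matrix and the placeholder column name `sample` are inventions for this example; substitute your own raw count matrix. The variance-stabilizing transformation is one common choice before PCA, not the only one.

```r
library(DESeq2)

# Simulated raw counts (genes x samples); replace with your own count matrix.
set.seed(1)
gene_means <- 2^runif(1000, 4, 12)
counts <- matrix(rnbinom(1000 * 6, mu = gene_means, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:6)))

# Placeholder colData: one row per sample, holding only the sample name.
col_data <- data.frame(sample = colnames(counts), row.names = colnames(counts))

# design = ~ 1 is an intercept-only design, i.e. no grouping variable.
dds <- DESeqDataSetFromMatrix(countData = counts, colData = col_data,
                              design = ~ 1)

# Size factors and size-factor-normalized counts.
dds <- estimateSizeFactors(dds)
norm_counts <- counts(dds, normalized = TRUE)

# Variance-stabilizing transformation; blind = TRUE ignores the design.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# plotPCA needs a colData column to colour by; here, the sample name.
plotPCA(vsd, intgroup = "sample")
```

Real clinical columns can be added to colData later, once you have settled on the samples to keep and the grouping you want to test.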

ADD COMMENT • link 2.2 years ago by swbarnes2 14k

1

This is my confusion: why should I supply clinical data if I won't use it? I have clinical data, but I don't want to group my samples/patients in the gene count data. What is the clinical data file used for here?

ADD REPLY • link 2.2 years ago by Rob ▴ 170

2

It's not used for these particular steps, but instead for the rest of the analysis.

ADD REPLY • link 2.2 years ago by Devon Ryan 104k

0

I want to create fake data and follow this method. So it will not mess up my results?

ADD REPLY • link 2.2 years ago by Rob ▴ 170

2

It won't mess up the PCA; just don't use the fake colData for performing differential expression.
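To make this concrete, here is a rough base R sketch of median-of-ratios size factor normalization, the approach behind DESeq2's default size factors. It takes the count matrix as its only input, so no sample annotation can affect the result. The helper name `median_of_ratios` and the simulated counts are made up for this example and are not part of DESeq2.

```r
# Simulated raw counts (genes x samples); replace with your own count matrix.
set.seed(1)
gene_means <- 2^runif(1000, 4, 12)
counts <- matrix(rnbinom(1000 * 6, mu = gene_means, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:6)))

# Hypothetical helper, not a DESeq2 function.
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  # log geometric mean of each gene across samples (-Inf if any count is 0)
  log_geo_means <- rowMeans(log_counts)
  # genes with a zero count in any sample are left out of the median
  usable <- is.finite(log_geo_means)
  # per sample: median ratio of its counts to the geometric-mean reference
  apply(log_counts, 2,
        function(s) exp(median((s - log_geo_means)[usable])))
}

size_factors <- median_of_ratios(counts)

# divide each sample (column) by its size factor
norm_counts <- sweep(counts, 2, size_factors, "/")
```

For PCA you would usually log-transform the result first, e.g. `log2(norm_counts + 1)`, so that a handful of highly expressed genes do not dominate the variance.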