normalization for unsupervised analysis by DESeq2
4 months ago
Rob ▴ 120

Hi friends

I want to use DESeq2 to normalize raw count data before running a PCA. I don't have colData.

What code should I use? The DESeq2 workflow requires colData and a design. Thanks.

RNA-Seq • 1.0k views

Why do you want to do a PCA in the first place? If all your samples have the exact same features, why compare them at all? colData is simply there to keep track of metadata for each sample, e.g. the day they were prepared, the donor they came from, the condition the cells were treated with, etc.

I want to run a PCA to see how gene expression differs between samples and how my samples group together. I also want to use the PCA to identify outliers so that I can remove them.

How do your samples differ? What kinds of groups are you expecting? This is precisely the type of information that should go into colData. (That said, you don't need colData for the PCA, but it will help with the visualization.)

I want to categorize the samples after the PCA and after removing outliers. My clinical data has a column with a continuous variable, and once the outliers are removed I will use that column to split the samples into two groups. But I don't want it to affect my samples before the unsupervised analysis.

Just because the information is stored in colData doesn't mean it's going to be used for specific sections of the analysis.

The plotPCA function of DESeq2 is a convenience wrapper around prcomp, the base R function for performing PCA. Here are the relevant bits from the source code:

# calculate the variance for each gene
rv <- rowVars(assay(object))

# select the ntop genes by variance (ntop is an argument of plotPCA, default 500)
select <- order(rv, decreasing=TRUE)[seq_len(min(ntop, length(rv)))]

# perform a PCA on the data in assay(object) for the selected genes
# as you can see, there's no call to colData() or design, only to assay()
pca <- prcomp(t(assay(object)[select,]))

# the contribution to the total variance for each component
percentVar <- pca$sdev^2 / sum( pca$sdev^2 )
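
To try these steps outside of DESeq2, here is a minimal base R sketch. It assumes a simulated matrix `mat` as a stand-in for `assay(object)`, and it uses `apply(mat, 1, var)` in place of `rowVars`; the data, the sample names and the `scores` data frame are made up for illustration only.

```r
# Simulated stand-in for assay(object): 200 genes x 6 samples (hypothetical data)
set.seed(1)
mat <- matrix(rnorm(200 * 6), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("sample", 1:6)))

ntop <- 50                                   # number of most variable genes to keep
rv <- apply(mat, 1, var)                     # per-gene variance (base R stand-in for rowVars)
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]

pca <- prcomp(t(mat[select, ]))              # samples as rows, selected genes as columns
percentVar <- pca$sdev^2 / sum(pca$sdev^2)   # fraction of total variance per component

# PC1/PC2 coordinates per sample; no sample metadata is involved at any point
scores <- data.frame(sample = colnames(mat), PC1 = pca$x[, 1], PC2 = pca$x[, 2])
print(scores)
```

Sample metadata would only come into play afterwards, for example to colour the points in a plot of `scores`.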


You can read more about these issues in this class material; I recommend having a look at section 5.3.3.

Also, I only have clinical data for some of the samples, not all of them. Is there still a way to normalize with DESeq2?

What do you envision? Do you want DESeq to infer the missing data? That's not going to happen, no.

No, not missing values. I just want to normalize and remove outliers before the PCA.

4 months ago

You can Google the size factor normalization algorithm (DESeq2's median-of-ratios method) and do it yourself.
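
For the do-it-yourself route, a rough base R sketch of median-of-ratios size factors could look like the following. The `counts` matrix is a tiny hypothetical example, and the variable names are my own; this is meant to illustrate the idea, not to replace DESeq2's own `estimateSizeFactors`.

```r
# Hypothetical raw count matrix: genes in rows, samples in columns
counts <- matrix(c(10,  20,  40,
                   100, 200, 400,
                   5,   10,  20,
                   0,   3,   7),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

# Log geometric mean per gene; a zero in any sample gives -Inf, so that gene is skipped
log_geo_means <- rowMeans(log(counts))
usable <- is.finite(log_geo_means)

# Size factor per sample: median ratio of its counts to the gene-wise geometric mean
size_factors <- apply(counts, 2, function(cnts) {
  exp(median((log(cnts) - log_geo_means)[usable]))
})

# Divide each sample (column) by its size factor
normalized <- sweep(counts, 2, size_factors, "/")
print(size_factors)
```

For a PCA you would usually also put the normalized counts on a log-like scale first, e.g. `log2(normalized + 1)`; that transformation is my assumption here, and DESeq2 offers `vst` and `rlog` for the same purpose.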

Or make fake colData and use a design of ~1; then you can build a valid DESeq2 object.
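
A sketch of the fake-colData route, assuming simulated counts and a made-up `placeholder` column (neither comes from the original question; substitute your own count matrix):

```r
library(DESeq2)

# Hypothetical raw counts: 2000 genes x 8 samples
set.seed(1)
counts <- matrix(rnbinom(2000 * 8, mu = 100, size = 2), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:8)))

# Placeholder colData: one dummy column, identical for every sample
coldata <- data.frame(placeholder = rep("all", ncol(counts)),
                      row.names = colnames(counts))

# design = ~ 1 is an intercept-only design, so no sample grouping is used
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = ~ 1)

# Size factor normalization only
dds <- estimateSizeFactors(dds)
norm_counts <- counts(dds, normalized = TRUE)

# Variance-stabilized values for the PCA; blind = TRUE ignores the design
vsd <- vst(dds, blind = TRUE)
pca_data <- plotPCA(vsd, intgroup = "placeholder", returnData = TRUE)
print(head(pca_data))
```

With `returnData = TRUE`, `plotPCA` returns the PC1/PC2 coordinates instead of a plot, which is handy for spotting outlier samples by hand.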

This is what confuses me: why should I supply clinical data if I won't use it? I have clinical data, but I don't want to group my samples/patients in the gene count data. What is the clinical data file used for here?

It's not used for these particular steps, but instead for the rest of the analysis.

I want to create fake colData and follow this method. So it won't mess up my results?

It won't mess up the PCA, just don't use the fake coldata for performing differential expression.

Thanks @Devon Ryan