I want to use DESeq2 to normalize my raw count data so I can run a PCA.
I don't have colData.
What code should I use? The DESeq2 workflow requires both colData and a design.
Why do you want to do a PCA in the first place? If all your samples have the exact same features, why compare them at all? colData is simply there to keep track of metadata for each sample, e.g. the day they were prepared, the donor they came from, the condition the cells were treated with, etc.
I want the PCA to see how gene expression differs between samples and how my samples group together. I also want to remove outliers, so I use the PCA to identify them.
How do your samples differ? What kinds of groups are you expecting? This is precisely the kind of information that should go into colData. (That being said, you don't need colData for the PCA itself, but it helps with the visualization.)
I want to categorize the samples only after the PCA and outlier removal. My clinical data has a column with a continuous variable, and after the PCA and outlier removal I will use it to split the samples into two groups. But I don't want this grouping to affect my samples before the unsupervised analysis.
Just because the information is stored in colData doesn't mean it's going to be used for specific sections of the analysis.
The plotPCA function of DESeq2 is a convenient wrapper around base R's prcomp() function for performing PCA. Here are the relevant bits from the source code:
```r
# excerpt from DESeq2's plotPCA(); `object` is the transformed data (e.g. vst() output)
# and `ntop` (default 500) is the number of most variable genes used
# calculate the variance for each gene
rv <- rowVars(assay(object))
# select the ntop genes by variance
select <- order(rv, decreasing=TRUE)[seq_len(min(ntop, length(rv)))]
# perform a PCA on the data in assay(x) for the selected genes
pca <- prcomp(t(assay(object)[select,])) ## as you can see, there's no calling of colData() or design, only of assay()
# the contribution to the total variance for each component
percentVar <- pca$sdev^2 / sum( pca$sdev^2 )
```
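To make the point concrete, the excerpt can be re-created in base R without any metadata. This is a sketch, not DESeq2's own code: it assumes `mat` is an already normalized and variance-stabilized matrix (genes in rows, samples in columns) and uses `apply(mat, 1, var)` in place of `rowVars()`. The helper `flag_outliers()` and its robust z-score rule (median/MAD, cut-off `k = 3`) are a hypothetical heuristic for spotting outliers, not something DESeq2 provides.

```r
# Base-R re-creation of the core of plotPCA(); `mat` is assumed to be
# normalized and variance-stabilized (genes in rows, samples in columns).
pca_top_var <- function(mat, ntop = 500) {
  # variance of each gene across samples (n - 1 denominator, like rowVars)
  rv <- apply(mat, 1, var)
  # keep the ntop most variable genes
  select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  # transpose so samples are rows, then run the PCA
  pca <- prcomp(t(mat[select, , drop = FALSE]))
  # fraction of the total variance explained by each component
  percentVar <- pca$sdev^2 / sum(pca$sdev^2)
  list(pca = pca, percentVar = percentVar)
}

# Hypothetical outlier heuristic (not part of DESeq2): flag samples whose
# score on any of the first `npc` PCs lies more than `k` robust SDs
# (median absolute deviation) from the median score of that PC.
flag_outliers <- function(pca, npc = 2, k = 3) {
  scores <- pca$x[, seq_len(min(npc, ncol(pca$x))), drop = FALSE]
  centered <- sweep(scores, 2, apply(scores, 2, median), "-")
  z <- sweep(centered, 2, apply(scores, 2, mad), "/")
  rownames(scores)[apply(abs(z) > k, 1, any, na.rm = TRUE)]
}
```

Samples returned by `flag_outliers()` are candidates to inspect on the plot, not automatic removals; the cut-off is a judgment call, and a median/MAD rule is used because a plain mean/SD rule is itself pulled toward the outlier when there are only a handful of samples.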
You can read more about these issues in this class material; I recommend having a look at section 5.3.3.
Also, I only have clinical data for some of the samples, not all of them. Is there any way to normalize with DESeq2 in that case?
What do you envision? Do you want DESeq to infer the missing data? That's not going to happen, no.
No, not missing values. I just want to normalize and remove outliers before the PCA.
Yes, there are. Please read the chapter I mentioned in the linked class material.
You can look up the size factor normalization algorithm (median of ratios) and implement it yourself.
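For the do-it-yourself route, DESeq2's default size factors use the median-of-ratios method: each sample's factor is the median, over genes, of its count divided by that gene's geometric mean across all samples. A minimal base-R sketch, assuming `counts` is a raw count matrix with samples in columns; `median_of_ratios()` and `normalize_counts()` are hypothetical helper names, and the plain `log2(x + 1)` step is a simpler stand-in for `vst()`/`rlog()`, which are the transformations DESeq2 recommends before a PCA.

```r
# Median-of-ratios size factors (sketch of DESeq2's default method).
median_of_ratios <- function(counts) {
  # log geometric mean of each gene; genes with any zero count become -Inf
  loggeomeans <- rowMeans(log(counts))
  apply(counts, 2, function(cnts) {
    # only genes with a finite geometric mean and a positive count here
    use <- is.finite(loggeomeans) & cnts > 0
    # median log-ratio to the pseudo-reference, back on the linear scale
    exp(median((log(cnts) - loggeomeans)[use]))
  })
}

# Divide each sample by its size factor, then log-transform for PCA.
normalize_counts <- function(counts, pseudocount = 1) {
  sf <- median_of_ratios(counts)
  norm <- sweep(counts, 2, sf, "/")
  log2(norm + pseudocount)
}
```

Genes with a zero count in any sample have a geometric mean of 0 and are left out of the median, which mirrors what `estimateSizeFactors()` does by default.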
Or make a placeholder colData and use a design of ~ 1; then you can create a valid DESeq2 object.
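A sketch of the placeholder-colData route with the DESeq2 API. The counts come from `makeExampleDESeqDataSet()` only so the code runs on its own; swap in your own raw count matrix (integer counts, genes in rows, samples in named columns). The single `dummy` column is an arbitrary placeholder, and it is also passed to `plotPCA()` because that function requires `intgroup` to name an existing colData column.

```r
library(DESeq2)

# Example data only: replace `count_mat` with your own raw count matrix.
count_mat <- counts(makeExampleDESeqDataSet(n = 2000, m = 6))
colnames(count_mat) <- paste0("sample", seq_len(ncol(count_mat)))

# Placeholder colData: one row per sample, one dummy column with a single level.
# It carries no biological information; it only makes the object valid.
coldata <- data.frame(dummy = factor(rep("all", ncol(count_mat))),
                      row.names = colnames(count_mat))

# Intercept-only design: no grouping is used.
dds <- DESeqDataSetFromMatrix(countData = count_mat, colData = coldata,
                              design = ~ 1)
dds <- estimateSizeFactors(dds)               # median-of-ratios size factors
norm_counts <- counts(dds, normalized = TRUE) # normalized counts, if needed

# Variance-stabilizing transformation; blind = TRUE ignores the design.
vsd <- vst(dds, blind = TRUE)

# PC1/PC2 coordinates per sample (top 500 most variable genes by default).
pca_df <- plotPCA(vsd, intgroup = "dummy", returnData = TRUE)
percentVar <- round(100 * attr(pca_df, "percentVar"))
plotPCA(vsd, intgroup = "dummy")              # draws the plot
```

With a design of only ~ 1, this object is fine for normalization and PCA, but it says nothing about your groups, so it is not meant for differential expression.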
This is where I'm confused: why should I supply clinical data if I won't use it?
I have clinical data, but I don't want to group my samples/patients in the gene count data. What is the clinical data file used for here?
It's not used for these particular steps, but instead for the rest of the analysis.
I want to create fake colData and follow this method. Will it mess up my results?
It won't mess up the PCA, just don't use the fake coldata for performing differential expression.
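For the later differential-expression step, the placeholder has to be replaced by real metadata. A minimal base-R sketch of that preparation, assuming a hypothetical data frame `clinical` with sample IDs as row names and a hypothetical continuous column `score`; `build_de_coldata()` is also a hypothetical name, and the median split is just one possible cut-off, not a recommendation.

```r
# Build colData for differential expression from clinical data, after PCA.
# `clinical`: hypothetical data frame, sample IDs as row names.
# `kept_samples`: sample IDs that survived outlier removal.
build_de_coldata <- function(clinical, kept_samples, var = "score") {
  # keep samples that passed outlier removal AND have a clinical row
  ids <- intersect(kept_samples, rownames(clinical))
  # drop samples whose clinical value is missing
  ids <- ids[!is.na(clinical[ids, var])]
  x <- clinical[ids, var]
  # split the continuous variable into two groups at its median
  group <- factor(ifelse(x > median(x), "high", "low"),
                  levels = c("low", "high"))
  data.frame(group = group, row.names = ids)
}
```

The result would then be passed as colData to `DESeqDataSetFromMatrix()` with `design = ~ group`, after subsetting the count matrix to the same samples (e.g. `count_mat[, rownames(cd)]`), so samples without clinical data simply drop out at this stage and not before the PCA.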
Thanks @Devon Ryan