Principal Component Analysis (PCA)
4.6 years ago
Oliver • 0

I am new to bioinformatics. My question is: in a PCA of 20 RNA-seq samples, if PC1 accounts for 80% of the variability and PC2 accounts for 15% of the variability, then PC3 must account for the remaining 5% of the variability. Is that correct?

RNA-Seq • 1.6k views

No, it's not. You can (and almost certainly do) have more than 3 PCs in your analysis. Could you show how you generated your PCA data, please?


Thank you so much, now it is clear to me.

4.6 years ago
mtrw85 ▴ 10

Nope, there can be as many PCs as there are SNPs (or as many as there are samples, if that is smaller), but we typically only calculate a few. In total the PCs will account for 100% of the variability, but if the first few together account for, say, 99% of the variability in the data, then there's rarely much point in continuing.

See if your PCA program will produce a scree plot for you. Once you understand those it will make sense :)
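To make the arithmetic behind a scree plot concrete, here is a minimal Python sketch. The per-PC variances are invented for illustration (they are not from the OP's data); in a real analysis they would come from the PCA output. The helper names `explained_variance` and `n_components_for` are hypothetical.

```python
# Made-up per-PC variances (eigenvalues) for a 20-sample PCA: 19 PCs in total.
# PC1 and PC2 are set to 80% and 15% to mirror the question; the rest is invented.
pc_variances = [8000.0, 1500.0, 250.0, 100.0, 45.0] + [7.5] * 14


def explained_variance(variances):
    """Return per-PC fractions of total variance and their running total."""
    total = sum(variances)
    fractions = [v / total for v in variances]
    cumulative = []
    running = 0.0
    for v in variances:
        running += v
        cumulative.append(running / total)
    return fractions, cumulative


def n_components_for(cumulative, threshold):
    """Smallest number of leading PCs whose cumulative fraction reaches threshold."""
    for i, c in enumerate(cumulative, start=1):
        if c >= threshold:
            return i
    return len(cumulative)


fractions, cumulative = explained_variance(pc_variances)

# These are the values a scree plot would show, one bar or point per PC.
for i, (f, c) in enumerate(zip(fractions, cumulative), start=1):
    print(f"PC{i}: {f:.2%} (cumulative {c:.2%})")

print("PCs needed to reach 99%:", n_components_for(cumulative, 0.99))
```

In this toy example PC3 carries 2.5% of the variance, not 5%: the remaining 5% after PC1 and PC2 is shared among all the other PCs.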


"there can be as many PCs as there are SNPs"

How did the word SNP sneak in there? Unless you are referring to some statistical acronym I am not familiar with.


They must have only run it on SNPs :) Basically, the number of PCs is capped by the number of rows, or by the number of samples if that is smaller.

Rows could be genes (for RNA-seq), SNPs, or anything else.
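For anyone who wants to check the PC count themselves, here is a small sketch with NumPy on simulated data (random numbers, not real expression values). It assumes the usual setup of genes in rows and samples in columns, with PCA run on the samples. With 1000 genes and 20 samples, the SVD returns 20 components, and after centering only 19 of them carry any variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_samples = 1000, 20

# Simulated matrix: rows are genes, columns are samples (random, for illustration only).
expr = rng.normal(size=(n_genes, n_samples))

# For a PCA of samples, samples are the observations and genes are the variables.
X = expr.T                       # shape (20, 1000)
Xc = X - X.mean(axis=0)          # center each gene across samples

# Singular values of the centered matrix give the per-PC variances.
singular_values = np.linalg.svd(Xc, compute_uv=False)
variances = singular_values**2 / (n_samples - 1)
fractions = variances / variances.sum()

# Centering removes one degree of freedom, so the last component is numerically zero.
n_nonzero = int(np.sum(variances > 1e-10 * variances[0]))

print("components returned:", singular_values.shape[0])
print("components with non-zero variance:", n_nonzero)
print("sum of explained-variance fractions:", fractions.sum())
```

So for the 20-sample case in the question there are up to 19 informative PCs, and their fractions add up to 100%.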

4.6 years ago
predeus ★ 1.9k

I'd also like to add that if 80% of the variance in your 20-sample RNA-seq PCA is explained by PC1, there's something very wrong either with your samples or with the way you are analyzing your data. Make sure the counts are normalized and log-transformed, ideally with something like the vst or rlog transformation from DESeq2.
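As a rough illustration of why the preprocessing matters, here is a sketch on simulated counts. It is not DESeq2: it uses plain counts-per-million plus log2 as a simple stand-in for vst/rlog, and the sequencing-depth factors and gene means are invented. The helper `pc1_fraction` is hypothetical.

```python
import numpy as np


def pc1_fraction(matrix):
    """Fraction of total variance carried by PC1 (rows = samples, columns = genes)."""
    centered = matrix - matrix.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))


rng = np.random.default_rng(1)
n_samples, n_genes = 20, 500

# Invented gene means spanning several orders of magnitude, and invented
# per-sample depth factors. There is no biological difference between samples.
gene_means = np.logspace(0.5, 4, n_genes)
depth = np.linspace(0.5, 2.0, n_samples)
counts = rng.poisson(depth[:, None] * gene_means[None, :])

# Simple stand-in for a proper transformation: counts per million, then log2(x + 1).
cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6
log_cpm = np.log2(cpm + 1.0)

raw_pc1 = pc1_fraction(counts.astype(float))
log_pc1 = pc1_fraction(log_cpm)

print(f"PC1 fraction on raw counts: {raw_pc1:.3f}")
print(f"PC1 fraction on log2 CPM:   {log_pc1:.3f}")
```

In this simulation PC1 on the raw counts soaks up almost all of the variance purely because of depth differences and a handful of highly expressed genes, and that dominance goes away once the data are normalized and log-transformed.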


If the phenotype is strong, like a knockout vs. a wild type of a major regulator such as a master transcription factor, I have definitely seen samples with such a high PC1 %. It depends, of course, on the sample type, cell line or primary, etc. But yeah, I agree that with 20 samples it is at least worth noting, and OP should make sure things are correctly processed. @OP, what kind of data is this: species, treatment, cell type, etc.?


Hey, taking DESeq2 as an example, why use rlog instead of normalized read count data?


From https://chipster.csc.fi/manual/deseq2-transform.html

Both variance stabilizing transformation (VST) and regularized log transformation (rlog) aim to remove the dependence of the variance on the mean. In particular, genes with low expression level and therefore low read counts tend to have high variance, which is not removed efficiently by the ordinary logarithmic transformation. VST and rlog remove the experiment-wide trend of variance over mean calculated by the DESeq2 method. This dispersion calculation does not take into account the group information, and the transformation is therefore said to be blind.
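The mean-variance dependence described above can be seen in a toy simulation. This sketch assumes pure Poisson noise and uses the square root, which is the textbook variance-stabilizing transform for Poisson counts. It is not what DESeq2's vst or rlog compute (those are based on a fitted negative binomial dispersion trend); it only illustrates the idea.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Two hypothetical genes with pure Poisson noise: one lowly, one highly expressed.
low = rng.poisson(20, size=n)
high = rng.poisson(1000, size=n)


def var_log2(x):
    """Variance after an ordinary log2(x + 1) transform."""
    return float(np.var(np.log2(x + 1.0)))


def var_sqrt(x):
    """Variance after a square-root transform (variance-stabilizing for Poisson)."""
    return float(np.var(np.sqrt(x)))


print("log2(x + 1): low gene", var_log2(low), "high gene", var_log2(high))
print("sqrt(x):     low gene", var_sqrt(low), "high gene", var_sqrt(high))
```

After the plain log transform the lowly expressed gene is much noisier than the highly expressed one, whereas after the stabilizing transform both end up with roughly the same variance (about 0.25 here), which is the property you want before running PCA.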
