Question

Interpreting a PCA plot to detect batch effects

5.6 years ago

modarzi ▴ 170

Hi, I have RNA-seq gene expression data for 74 cancer cases, processed in 14 batches. My exprData was normalized by the FPKM method. After running prcomp() on my expression data, I plotted PC1 vs PC2. My plot is available at this link. Now I need help interpreting that plot. Does my exprData have batch effects? Does it need batch effect correction?
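
One rough way to put a number on what such a plot shows is to ask how much of the variance in a principal component's scores is explained by the batch labels (eta squared, the R² of a one-way ANOVA). The Python sketch below only illustrates that idea; the function name `batch_r_squared` and the toy scores are hypothetical and are not taken from the data described here. With many small batches this fraction is inflated by chance, so it would need something like a permutation test before being read as evidence.

```python
from collections import defaultdict


def batch_r_squared(scores, batches):
    """Fraction of variance in one PC's scores explained by batch (eta squared)."""
    if len(scores) != len(batches):
        raise ValueError("scores and batches must have the same length")
    grand_mean = sum(scores) / len(scores)
    total_ss = sum((s - grand_mean) ** 2 for s in scores)
    if total_ss == 0:
        return 0.0  # constant scores: nothing for batch to explain
    # Group the scores by batch label
    groups = defaultdict(list)
    for s, b in zip(scores, batches):
        groups[b].append(s)
    # Between-batch sum of squares
    between_ss = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return between_ss / total_ss


# Hypothetical PC1 scores for six samples in two batches
pc1 = [-2.1, -1.9, -2.0, 2.0, 1.9, 2.1]
batch = ["A", "A", "A", "B", "B", "B"]
print(round(batch_r_squared(pc1, batch), 3))
```

A value near 1 for PC1 or PC2 would mean the samples separate by batch along that component; a value near 0 would mean batch explains little of it.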

I would appreciate it if anybody could share their comments with me.

PCA RNA-Seq batch-effect • 1.8k views

Updated 14 months ago by Ram 45k • written 5.6 years ago by modarzi ▴ 170

score 0 · Answer 1 · 2019-12-02

5.6 years ago

Martombo ★ 3.2k

There are better ways to assess the presence of batch effects. First, FPKM is not a robust normalization method for comparing different samples. Use the normalization methods of R packages like DESeq2 or limma-voom; it would then be more appropriate to look for a batch effect. You could also remove lowly expressed genes or select the most variable ones before an unsupervised analysis like PCA, so as to remove some of the variability of the dataset. Finally, some ad hoc methods can establish the presence of significant covariates in your data. See the sva package, for example.
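
To make the normalization point concrete, here is a minimal Python sketch of the median-of-ratios idea that DESeq2's size-factor normalization is based on. It is not the package's code: the function names `size_factors` and `normalize` and the toy counts are made up for illustration, and in practice you would run DESeq2 or limma-voom on the raw counts.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: list of genes, each a list of raw counts (one value per sample)."""
    n_samples = len(counts[0])
    # Use only genes with no zero counts, so the geometric mean is defined
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    # Log of each gene's geometric mean across samples
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count in sample j to that gene's geometric mean
        log_ratios = [
            math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every count by the size factor of its sample."""
    sf = size_factors(counts)
    return [[c / f for c, f in zip(row, sf)] for row in counts]


# Hypothetical counts: sample 2 was sequenced about twice as deeply as sample 1
raw = [[10, 20], [100, 200], [50, 100], [0, 3]]
print(size_factors(raw))
print(normalize(raw)[0])
```

Unlike FPKM, the scaling here is driven by the median gene rather than by the total count, so a few very highly expressed genes do not distort the comparison between samples.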

Thanks. About the first point: my data come from TCGA, and I can also download the HTSeq-Count data. So should I run the batch effect analysis on the HTSeq-Count data directly, or first normalize it with one of the methods in the DESeq2 package? If so, which normalization method is better?

On the second point, you wrote that it is better to remove lowly expressed genes. Can I compute each gene's variance across the samples and remove the genes with zero variance? Do you recommend better ways?
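
For reference, the two filters mentioned in the answer (dropping lowly expressed genes, then keeping the most variable ones) can be sketched in Python as below. A zero-variance filter alone would only remove genes that are constant across all samples. The function name `filter_and_rank`, the cut-offs `min_mean` and `top_n`, and the toy values are all hypothetical and not recommendations.

```python
import math
from statistics import mean, variance


def filter_and_rank(expr, min_mean=1.0, top_n=500):
    """expr: dict mapping gene -> list of normalized expression values per sample.

    Drops genes whose mean expression is below min_mean, then returns the
    top_n remaining genes ranked by variance of log2(value + 1).
    """
    kept = {
        gene: [math.log2(v + 1) for v in values]
        for gene, values in expr.items()
        if mean(values) >= min_mean
    }
    ranked = sorted(kept, key=lambda gene: variance(kept[gene]), reverse=True)
    return {gene: kept[gene] for gene in ranked[:top_n]}


# Hypothetical normalized expression values for four genes in four samples
expr = {
    "geneA": [0, 0, 0, 0],          # never expressed, zero variance
    "geneB": [0.1, 0.2, 0.0, 0.1],  # lowly expressed
    "geneC": [50, 52, 49, 51],      # expressed, nearly constant
    "geneD": [5, 80, 10, 120],      # expressed, highly variable
}
selected = filter_and_rank(expr, min_mean=1.0, top_n=1)
print(list(selected))  # ['geneD']
```

The log transform before ranking keeps a handful of very highly expressed genes from dominating the variance ranking.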

Finally, I don't understand what you mean by ad hoc methods. Could you give an example?

Reply written 5.6 years ago by modarzi ▴ 170