Hi, I have RNA-seq gene expression data for 74 cancer cases, spread across 14 batches. My exprData was normalized with the FPKM method. After running prcomp() on my expression data, I plotted PC1 vs. PC2. My plot is available in this link. Now I need help interpreting that plot. Does my exprData have batch effects? Does it need batch effect correction?
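To make the question concrete, here is a minimal sketch of the kind of analysis I mean. It is not my exact code: the simulated matrix, the `batch` vector, and the log2(FPKM + 1) transform are all assumptions for illustration only.

```r
# Hypothetical stand-in for my data: 200 genes x 74 samples of FPKM-like values.
set.seed(1)
n_genes <- 200
n_samples <- 74
exprData <- matrix(
  rexp(n_genes * n_samples, rate = 0.1),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

# Hypothetical batch labels: 14 batches assigned across the 74 samples.
batch <- factor(rep_len(paste0("batch", 1:14), n_samples))

# Assumption: log-transform with a pseudocount before PCA.
logExpr <- log2(exprData + 1)

# prcomp() expects samples in rows, so the matrix is transposed.
pca <- prcomp(t(logExpr), center = TRUE, scale. = FALSE)

# Percentage of variance explained by each component.
pctVar <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# PC1 vs. PC2, with one colour per batch.
batchCols <- rainbow(nlevels(batch))
plot(pca$x[, 1], pca$x[, 2],
     col = batchCols[as.integer(batch)], pch = 19,
     xlab = paste0("PC1 (", pctVar[1], "%)"),
     ylab = paste0("PC2 (", pctVar[2], "%)"))
legend("topright", legend = levels(batch), col = batchCols,
       pch = 19, cex = 0.6)
```

If samples from the same batch cluster together on such a plot, I assume that would suggest a batch effect, but that is exactly the interpretation I am unsure about.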
I would appreciate it if anybody could share their comments with me.
Thanks. Regarding the first point, my data comes from TCGA, and I can also download the HTSeq-Count data. So should I run the batch effect analysis on the HTSeq-Count data, or normalize it with another method from the DESeq2 package? If so, which normalization method is better?
For the second point, you wrote that it is better to remove lowly expressed genes. Can I compute each gene's variance across the samples and remove the genes with zero variance? Would you recommend a better way?
Finally, I don't understand what you mean by ad-hoc methods. Could you give an example?
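To make the variance question above concrete, this is a minimal sketch of the filter I have in mind. The toy matrix and gene names are hypothetical, not my real data.

```r
# Hypothetical toy matrix: genes in rows, samples in columns.
exprData <- rbind(
  geneA = c(5.1, 7.3, 6.2, 8.0),
  geneB = c(0, 0, 0, 0),          # never expressed: zero variance
  geneC = c(2.5, 2.5, 2.5, 2.5),  # constant across samples: zero variance
  geneD = c(0, 0.4, 0, 1.2)       # lowly expressed but not constant
)

# Variance of each gene across the samples.
geneVar <- apply(exprData, 1, var)

# Keep only genes whose variance is greater than zero.
filtered <- exprData[geneVar > 0, , drop = FALSE]
rownames(filtered)
```

Note that a zero-variance filter only drops constant genes. A lowly expressed but variable gene such as geneD is kept, which is part of why I am asking whether there is a better way.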