Hi All, I am investigating my RNAseq data for batch effects and then attempting to remove them. The data file I am trying to use contains raw counts of gene expression values every two days for 2 days (~55,000 rows and 24 columns, each representing a different time-point). The RNAseq experiment was done by pooling 3 timepoints per sample, for a total of 8 samples. I'm pretty sure the first thing I have to do is a PCA test to look for batch effects and then use something like Limma to remove batch effects. I have looked at tutorials for PCA but am completely lost, since I have very limited experience in R. I am taking an R course in a month, but I wanted to try to do this before then. Does anyone know of a simple way to do this that gets around my lack of R knowledge (a Shiny app or the like)? Any help would be greatly appreciated. Thanks in advance.
Open your dataset (xlsx or txt file) with "Open your own file menu/My computer". Go to Tools/Adjust and check "One plus log2" and "quantile normalization". Then go to Tools/Plots/PCA plot.
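For anyone curious what those two options do, here is a minimal base-R sketch of a log2(count + 1) transform followed by quantile normalization. It is only an illustration of the general technique, not the tool's actual implementation; the toy matrix and the helper name quantile_normalize are made up for this example.

```r
# Toy count matrix: 5 genes (rows) x 3 samples (columns); values are made up.
counts <- matrix(c(0, 10, 200, 35, 8,
                   3, 15, 180, 50, 4,
                   1,  7, 260, 20, 12),
                 nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("s", 1:3)))

# "One plus log2": log2(count + 1), so a count of zero maps to 0.
log_counts <- log2(counts + 1)

# Quantile normalization (hypothetical helper): force every sample to share
# the same distribution of values.
quantile_normalize <- function(x) {
  sorted <- apply(x, 2, sort)                        # sort each column
  ref <- unname(rowMeans(sorted))                    # mean of the k-th smallest values
  ranks <- apply(x, 2, rank, ties.method = "first")  # rank of each gene per sample
  out <- apply(ranks, 2, function(r) ref[r])         # replace values by reference values
  dimnames(out) <- dimnames(x)
  out
}

norm_counts <- quantile_normalize(log_counts)
```

After this step every column of norm_counts contains the same set of values, just assigned to different genes.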
In R, I typically use the 'prcomp' function to compute principal components of "sample by gene" expression matrices. The object returned by this function can be passed directly to plotting functions. I would recommend the "autoplot" function from the package "ggfortify". Depending on whether you want to plot genes or samples, you may need to transpose the matrix of expression values.
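library(ggfortify)  # provides autoplot() for prcomp objects
pca <- prcomp(df, scale. = TRUE)  # rows of df are samples, columns are genes
p <- autoplot(pca, data = ft, colour = groupCol)  # groupCol: name of the ft column to colour by
p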
In this case "df" is my matrix of expression values, and "ft" is sample metadata that I used to color the points.
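If installing extra packages is a hurdle, roughly the same check can be sketched in base R alone. The example below is a sketch on simulated data, not the poster's dataset: the counts, the sample names, and the "batch" labels are all hypothetical. It log-transforms a gene-by-sample count matrix, transposes it so that samples are rows, and colours the first two principal components by batch.

```r
set.seed(1)

# Hypothetical toy data: 200 genes (rows) x 8 samples (columns) of simulated counts.
counts <- matrix(rnbinom(200 * 8, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("sample", 1:8)))

# Hypothetical sample metadata: which batch each sample was processed in.
meta <- data.frame(batch = factor(rep(c("A", "B"), each = 4)),
                   row.names = colnames(counts))

# Log-transform raw counts so a few highly expressed genes do not dominate.
log_counts <- log2(counts + 1)

# Drop genes with zero variance; prcomp(scale. = TRUE) cannot rescale them.
keep <- apply(log_counts, 1, var) > 0

# Transpose so that rows are samples and columns are genes.
pca <- prcomp(t(log_counts[keep, ]), scale. = TRUE)

# Fraction of total variance captured by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Samples that cluster by batch on PC1/PC2 would suggest a batch effect.
plot(pca$x[, 1], pca$x[, 2], col = as.integer(meta$batch), pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
legend("topright", legend = levels(meta$batch),
       col = seq_along(levels(meta$batch)), pch = 19)
```

With real data you would replace counts with your own matrix (for example, one read in with read.delim) and meta with your own sample sheet.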
"I'm pretty sure the first thing I have to do is a PCA test to look for batch effects and then use something like Limma to remove batch effects."
Note that if you are using software like DESeq or edgeR to find differentially expressed genes, you do not remove the batch effect from the counts; instead, you include batch as a term in the linear model design.
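To make "include batch in the design" concrete, here is a small base-R sketch that builds such a design matrix. The sample sheet, the level names, and the "condition" factor are hypothetical; substitute whatever variable you are actually testing (time point, for instance).

```r
# Hypothetical sample sheet: 8 samples, 2 batches, 2 conditions.
samples <- data.frame(
  batch     = factor(c("b1", "b1", "b1", "b1", "b2", "b2", "b2", "b2")),
  condition = factor(c("ctrl", "ctrl", "treat", "treat",
                       "ctrl", "ctrl", "treat", "treat"))
)

# Batch enters the model as an extra term next to the variable of interest,
# so its effect is estimated and accounted for rather than subtracted from the counts.
design <- model.matrix(~ batch + condition, data = samples)

colnames(design)
# "(Intercept)" "batchb2" "conditiontreat"
```

In limma or edgeR, a matrix like this is what gets passed to the model-fitting step; in DESeq2, the same idea is expressed by giving the formula ~ batch + condition as the design. Note that this only works if batch is not completely confounded with the variable of interest.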