Question

PCA on RNAseq time series data

1

Entering edit mode

4.7 years ago

jennifer.m.conrad.gr ▴ 10

Hi All, I am investigating my RNAseq data for batch effects and then attempting to remove them. The data file I am trying to use contains raw counts of gene expression values every two days for 2 days (~55,000 rows and 24 columns each representing a different time-point). The RNAseq experiment was done by pooling 3 timepoints per sample with a total of 8 samples. I'm pretty sure the first thing I have to do is a PCA test to look for batch effects and then use something like Limma to remove batch effects. I have looked at tutorials for PCA analysis but am completely lost since I have very limited experience in R. I am taking a R course in a month, but I wanted to try to do this before then. I'm wondering if anyone knows of a simple way I can go about doing this and get around my lack of R knowledge (a shiny app or the like), it would be greatly appreciated. Thanks in advance.

RNA-Seq gene PCA • 2.3k views

ADD COMMENT • link updated 4.7 years ago by swbarnes2 14k • written 4.7 years ago by jennifer.m.conrad.gr ▴ 10

0

Entering edit mode

Read http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html for inspiration. There is example code for PCA from gene expression data.

ADD REPLY • link 4.7 years ago by ATpoint 81k

score 0 · Answer 1 · 2019-07-25

0

Entering edit mode

4.7 years ago

alserg ▴ 920

You can do a PCA plot in our tool Phantasus: https://genome.ifmo.ru/phantasus/ or https://artyomovlab.wustl.edu/phantasus/

Open your dataset (xlsx or txt file) with "Open your own file menu/My computer". Do Tool/Ajust, check "One plus log2" and "quantile normalization". Then Tools/Plots/PCA plot

ADD COMMENT • link 4.7 years ago by alserg ▴ 920

score 0 · Answer 2 · 2019-07-25

0

Entering edit mode

4.7 years ago

adam.faranda ▴ 110

In R, I typically use the 'prcomp' function to evaluate principle components in "sample by gene" expression matrices. The object returned by this function can be passed directly to plotting functions. I would recommend the "autoplot" function from the package "ggfortify". Depending on whether you want to plot genes or samples, you may need to transpose the matrix of expression values.

    pca<-prcomp(df, scale=T)
    p<-autoplot(pca, data = ft, colour=groupCol)
    p

In this case "df" is my matrix of expression values, and "ft" is sample metadata that I used to color the points.

ADD COMMENT • link 4.7 years ago by adam.faranda ▴ 110

0

Entering edit mode

Don't you want to transpose your df given it is a standard expression matrix with samples as columns and rows as genes?

ADD REPLY • link 4.7 years ago by ATpoint 81k

0

Entering edit mode

Yes, sorry. The code I posted was a snippet from a larger function -- in order to plot samples, the matrix should be transposed such that rows are samples and columns are genes.

    pca<-prcomp( t(df), scale=T)
    p<-autoplot(pca, data = ft, colour=groupCol)
    p

ADD REPLY • link 4.7 years ago by adam.faranda ▴ 110

score 0 · Answer 3 · 2019-07-25

0

Entering edit mode

4.7 years ago

swbarnes2 14k

I'm pretty sure the first thing I have to do is a PCA test to look for batch effects and then use something like Limma to remove batch effects.

Note that if you are using software like DESeq or EdgeR to find differentially expressed genes, you do not remove batch effect, you just include batch as an element in the linear model design.

ADD COMMENT • link 4.7 years ago by swbarnes2 14k

0

Entering edit mode

Hi @swbarnes2 -- I'm working on a similar type of project as the OP; but I still have a lot to learn about mitigating batch effects. If you include the batch covariate as a factor in the model design matrix, is that sufficient? I'm also wondering how you might apply to cluster analysis. Is there a way to obtain "batch corrected" cpm values from edgeR that could then be clustered?

ADD REPLY • link 4.7 years ago by adam.faranda ▴ 110

0

Entering edit mode

If you plan on using DESeq, read down to Michael Love's post

https://support.bioconductor.org/p/121408/

You can't get a better answer than that.

ADD REPLY • link 4.7 years ago by swbarnes2 14k