Question: PCA on RNAseq time series data
1
gravatar for jennifer.m.conrad.gr
15 months ago by
jennifer.m.conrad.gr10 wrote:

Hi All, I am investigating my RNAseq data for batch effects and then attempting to remove them. The data file I am trying to use contains raw counts of gene expression values every two days for 2 days (~55,000 rows and 24 columns each representing a different time-point). The RNAseq experiment was done by pooling 3 timepoints per sample with a total of 8 samples. I'm pretty sure the first thing I have to do is a PCA test to look for batch effects and then use something like Limma to remove batch effects. I have looked at tutorials for PCA analysis but am completely lost since I have very limited experience in R. I am taking a R course in a month, but I wanted to try to do this before then. I'm wondering if anyone knows of a simple way I can go about doing this and get around my lack of R knowledge (a shiny app or the like), it would be greatly appreciated. Thanks in advance.

rna-seq pca gene • 691 views
ADD COMMENTlink modified 15 months ago by swbarnes28.9k • written 15 months ago by jennifer.m.conrad.gr10

Read http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html for inspiration. There is example code for PCA from gene expression data.

ADD REPLYlink written 15 months ago by ATpoint40k
0
gravatar for alserg
15 months ago by
alserg560
alserg560 wrote:

You can do a PCA plot in our tool Phantasus: https://genome.ifmo.ru/phantasus/ or https://artyomovlab.wustl.edu/phantasus/

Open your dataset (xlsx or txt file) with "Open your own file menu/My computer". Do Tool/Ajust, check "One plus log2" and "quantile normalization". Then Tools/Plots/PCA plot

ADD COMMENTlink written 15 months ago by alserg560
0
gravatar for adam.faranda
15 months ago by
adam.faranda80
adam.faranda80 wrote:

In R, I typically use the 'prcomp' function to evaluate principle components in "sample by gene" expression matrices. The object returned by this function can be passed directly to plotting functions. I would recommend the "autoplot" function from the package "ggfortify". Depending on whether you want to plot genes or samples, you may need to transpose the matrix of expression values.

    pca<-prcomp(df, scale=T)
    p<-autoplot(pca, data = ft, colour=groupCol)
    p

In this case "df" is my matrix of expression values, and "ft" is sample metadata that I used to color the points.

ADD COMMENTlink modified 15 months ago • written 15 months ago by adam.faranda80

Don't you want to transpose your df given it is a standard expression matrix with samples as columns and rows as genes?

ADD REPLYlink modified 15 months ago • written 15 months ago by ATpoint40k

Yes, sorry. The code I posted was a snippet from a larger function -- in order to plot samples, the matrix should be transposed such that rows are samples and columns are genes.

    pca<-prcomp( t(df), scale=T)
    p<-autoplot(pca, data = ft, colour=groupCol)
    p
ADD REPLYlink modified 15 months ago • written 15 months ago by adam.faranda80
0
gravatar for swbarnes2
15 months ago by
swbarnes28.9k
United States
swbarnes28.9k wrote:

I'm pretty sure the first thing I have to do is a PCA test to look for batch effects and then use something like Limma to remove batch effects.

Note that if you are using software like DESeq or EdgeR to find differentially expressed genes, you do not remove batch effect, you just include batch as an element in the linear model design.

ADD COMMENTlink written 15 months ago by swbarnes28.9k

Hi @swbarnes2 -- I'm working on a similar type of project as the OP; but I still have a lot to learn about mitigating batch effects. If you include the batch covariate as a factor in the model design matrix, is that sufficient? I'm also wondering how you might apply to cluster analysis. Is there a way to obtain "batch corrected" cpm values from edgeR that could then be clustered?

ADD REPLYlink modified 15 months ago • written 15 months ago by adam.faranda80

If you plan on using DESeq, read down to Michael Love's post

https://support.bioconductor.org/p/121408/

You can't get a better answer than that.

ADD REPLYlink written 15 months ago by swbarnes28.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1492 users visited in the last hour