Question: PCA on RNAseq time series data
1
gravatar for jennifer.m.conrad.gr
18 months ago by
jennifer.m.conrad.gr10 wrote:

Hi All, I am investigating my RNAseq data for batch effects and then attempting to remove them. The data file I am trying to use contains raw counts of gene expression values every two days for 2 days (~55,000 rows and 24 columns each representing a different time-point). The RNAseq experiment was done by pooling 3 timepoints per sample with a total of 8 samples. I'm pretty sure the first thing I have to do is a PCA test to look for batch effects and then use something like Limma to remove batch effects. I have looked at tutorials for PCA analysis but am completely lost since I have very limited experience in R. I am taking a R course in a month, but I wanted to try to do this before then. I'm wondering if anyone knows of a simple way I can go about doing this and get around my lack of R knowledge (a shiny app or the like), it would be greatly appreciated. Thanks in advance.

rna-seq pca gene • 840 views
ADD COMMENTlink modified 18 months ago by swbarnes29.3k • written 18 months ago by jennifer.m.conrad.gr10

Read http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html for inspiration. There is example code for PCA from gene expression data.

ADD REPLYlink written 18 months ago by ATpoint44k
0
gravatar for alserg
18 months ago by
alserg630
alserg630 wrote:

You can do a PCA plot in our tool Phantasus: https://genome.ifmo.ru/phantasus/ or https://artyomovlab.wustl.edu/phantasus/

Open your dataset (xlsx or txt file) with "Open your own file menu/My computer". Do Tool/Ajust, check "One plus log2" and "quantile normalization". Then Tools/Plots/PCA plot

ADD COMMENTlink written 18 months ago by alserg630
0
gravatar for adam.faranda
18 months ago by
adam.faranda80
adam.faranda80 wrote:

In R, I typically use the 'prcomp' function to evaluate principle components in "sample by gene" expression matrices. The object returned by this function can be passed directly to plotting functions. I would recommend the "autoplot" function from the package "ggfortify". Depending on whether you want to plot genes or samples, you may need to transpose the matrix of expression values.

    pca<-prcomp(df, scale=T)
    p<-autoplot(pca, data = ft, colour=groupCol)
    p

In this case "df" is my matrix of expression values, and "ft" is sample metadata that I used to color the points.

ADD COMMENTlink modified 18 months ago • written 18 months ago by adam.faranda80

Don't you want to transpose your df given it is a standard expression matrix with samples as columns and rows as genes?

ADD REPLYlink modified 18 months ago • written 18 months ago by ATpoint44k

Yes, sorry. The code I posted was a snippet from a larger function -- in order to plot samples, the matrix should be transposed such that rows are samples and columns are genes.

    pca<-prcomp( t(df), scale=T)
    p<-autoplot(pca, data = ft, colour=groupCol)
    p
ADD REPLYlink modified 18 months ago • written 18 months ago by adam.faranda80
0
gravatar for swbarnes2
18 months ago by
swbarnes29.3k
United States
swbarnes29.3k wrote:

I'm pretty sure the first thing I have to do is a PCA test to look for batch effects and then use something like Limma to remove batch effects.

Note that if you are using software like DESeq or EdgeR to find differentially expressed genes, you do not remove batch effect, you just include batch as an element in the linear model design.

ADD COMMENTlink written 18 months ago by swbarnes29.3k

Hi @swbarnes2 -- I'm working on a similar type of project as the OP; but I still have a lot to learn about mitigating batch effects. If you include the batch covariate as a factor in the model design matrix, is that sufficient? I'm also wondering how you might apply to cluster analysis. Is there a way to obtain "batch corrected" cpm values from edgeR that could then be clustered?

ADD REPLYlink modified 18 months ago • written 18 months ago by adam.faranda80

If you plan on using DESeq, read down to Michael Love's post

https://support.bioconductor.org/p/121408/

You can't get a better answer than that.

ADD REPLYlink written 18 months ago by swbarnes29.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1377 users visited in the last hour