How do I go from TPM to PCA (RNA-seq)?
22 months ago
biotrekker

Hello all, I am new to RNA-seq analysis. I have TPM abundance estimates (kallisto output) from 400+ samples, and I just want to perform a clustering analysis. How do I go from TPM data to PCA and other RNA-seq analyses? Is there any further normalization or processing that needs to be done?

Thanks

22 months ago
msn

Welcome to RNA-seq!

There are a few ways to do this, depending on your biological question and your downstream analysis. If you also have the raw (estimated) counts that kallisto produces, I would recommend starting with the DESeq2 package and its vignette, which will also walk you through creating a PCA plot.

http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

You can certainly make a PCA from TPMs. You can transform the data (PCA has a hidden assumption of normality; it's not a strict assumption, but PCA is sensitive to outliers) and use it to look for batch effects, and, to be sure, some people do. However, there is some debate in the field about using relative frequencies such as TPMs for anything beyond visualization or identifying bs. Remember to go back to your counts if you are going to apply a correction to the data.
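As a minimal sketch of the TPM route, assuming a genes x samples TPM matrix already loaded as a NumPy array, the function below applies a log2(TPM + 1) transform, keeps the most variable genes, centers each gene, and computes principal components with a plain SVD. The log2(TPM + 1) pseudocount, the `n_top=500` cutoff, and the name `pca_from_tpm` are illustrative assumptions, not something prescribed in this thread; it is meant for exploratory plots such as spotting batch effects, not as a replacement for a count-based workflow.

```python
import numpy as np

def pca_from_tpm(tpm, n_components=2, n_top=500):
    """PCA on a genes x samples TPM matrix (exploratory sketch).

    Assumptions: log2(TPM + 1) as the transform and keeping only the
    n_top most variable genes; both are common but arbitrary choices.
    """
    # log-transform to damp the influence of highly expressed genes
    logx = np.log2(np.asarray(tpm, dtype=float) + 1.0)
    # keep the n_top genes (rows) with the highest variance across samples
    var = logx.var(axis=1)
    top = np.argsort(var)[::-1][:n_top]
    # samples become observations (rows), genes become variables (columns)
    x = logx[top].T
    x = x - x.mean(axis=0)  # center each gene
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # sample coordinates on the first principal components
    scores = u[:, :n_components] * s[:n_components]
    # fraction of total variance explained by each component
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]
```

Each row of `scores` gives one sample's coordinates on PC1 and PC2, which you can plot and color by batch or condition; `explained` gives the variance fractions you would put on the axis labels.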

My current favorite PCA tool is PCAtools (https://www.bioconductor.org/packages/release/bioc/html/PCAtools.html) by Kevin Blighe; it's just clean and pretty.

22 months ago
ATpoint

Note that kallisto produces transcript-level abundance estimates, whereas PCA and differential analysis are most commonly done at the gene level. I recommend running tximport first, which will produce gene-level counts (https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#kallisto), and then either following the DESeq2 workflow linked in the other answer or following my tutorial here for something similar in edgeR. PCA is just a few lines of code, and I think it is worth knowing how to do it manually without wrapper packages. See the PCA section in: Basic normalization, batch correction and visualization of RNA-seq data
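To illustrate what going from transcript level to gene level means, the sketch below sums transcript-level values (for example kallisto's estimated counts) into gene-level totals using a transcript-to-gene mapping. This is a simplified, hypothetical stand-in, not tximport itself: tximport reads the kallisto output files directly and also computes abundance-weighted average transcript lengths per gene, which DESeq2 and edgeR can use as offsets. The function name `summarize_to_gene` and the `tx2gene` dictionary are assumptions for illustration.

```python
import numpy as np

def summarize_to_gene(tx_matrix, tx_ids, tx2gene):
    """Sum a transcripts x samples matrix into a genes x samples matrix.

    tx_ids:  transcript IDs, one per row of tx_matrix
    tx2gene: hypothetical dict mapping transcript ID -> gene ID
    """
    tx_matrix = np.asarray(tx_matrix, dtype=float)
    genes = sorted({tx2gene[t] for t in tx_ids})
    index = {g: i for i, g in enumerate(genes)}
    out = np.zeros((len(genes), tx_matrix.shape[1]))
    # add each transcript's row onto the row of the gene it belongs to
    for row, t in enumerate(tx_ids):
        out[index[tx2gene[t]]] += tx_matrix[row]
    return genes, out
```

Summing is appropriate for estimated counts; for real analyses, tximport is preferable because it keeps the length information that a plain sum throws away.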

As input, use either the logCPMs from edgeR or the vst/rlog-transformed values from DESeq2.
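For intuition about the logCPM input, here is a rough sketch of log2 counts-per-million with a small prior count, assuming a genes x samples matrix of gene-level counts. It is a simplification: edgeR's `cpm(..., log = TRUE)` also scales the prior count by library size and, after `calcNormFactors()`, uses TMM-normalized library sizes, so its values will differ from this sketch. The name `log_cpm` and the fixed additive prior are illustrative assumptions.

```python
import numpy as np

def log_cpm(counts, prior_count=2.0):
    """log2 counts-per-million for a genes x samples count matrix.

    Simplified sketch: a fixed prior count is added to every count and
    raw column sums are used as library sizes (no TMM normalization).
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)  # total counts per sample (column)
    # the prior count avoids log2(0) and damps noise from low counts
    return np.log2((counts + prior_count) / lib_size * 1e6)
```

The resulting matrix (genes in rows, samples in columns) is the kind of input you would center and decompose for a PCA.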

As usual, consult the manuals of edgeR and DESeq2 for details and best practices.
