I am slightly confused about the input requirements for the differential expression tool - last step of Trinity
3.6 years ago
Dennis • 0

NOTE: I was advised by a moderator of another Biostars forum to repost this thread here. I do not intend to waste anyone's time; I am simply looking for more eyes on this question, which has not been answered to date.

Hello all,

I have a pressing question...

To start with, I read the Trinity methods paper and Googled TMM normalization and the different expression units before asking this question here.

I have a paired-end RNA-seq data set with 3 conditions and 3 replicates each.

I assembled a transcriptome de novo with Trinity using one sample from each group (if I try to use all 9 -> 19 files, the DRM shuts down my job). Following that, I ran RSEM on each sample and built a matrix of expected counts. The end goal of the project is to identify genes that are differentially expressed in the two treatment groups relative to the control.
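To make "a matrix table of expected counts" concrete, here is a minimal Python sketch of that step. It is not the Galaxy tool itself, only an illustration. It assumes the per-sample tables use the `gene_id` and `expected_count` column names found in RSEM's `*.genes.results` output; the function name, the sample name `Trt1`, and all values are made up.

```python
import csv
import io


def build_count_matrix(sample_tables, id_col="gene_id", value_col="expected_count"):
    """Merge per-sample RSEM-style tables into one feature-by-sample matrix.

    sample_tables maps a sample name to an open text handle containing a
    tab-separated table with a header row.
    """
    samples = list(sample_tables)
    values = {}  # feature id -> {sample name -> value}
    for name, handle in sample_tables.items():
        for row in csv.DictReader(handle, delimiter="\t"):
            values.setdefault(row[id_col], {})[name] = float(row[value_col])
    header = [""] + samples
    # Features missing from a sample are filled with 0.0.
    rows = [[fid] + [values[fid].get(s, 0.0) for s in samples]
            for fid in sorted(values)]
    return header, rows


# Tiny made-up tables standing in for two *.genes.results files.
ct1 = io.StringIO(
    "gene_id\texpected_count\tTPM\tFPKM\n"
    "g1\t10.0\t5.0\t4.0\n"
    "g2\t0.0\t0.0\t0.0\n"
)
trt1 = io.StringIO(
    "gene_id\texpected_count\tTPM\tFPKM\n"
    "g1\t25.5\t9.0\t7.0\n"
)
header, rows = build_count_matrix({"Ct1": ct1, "Trt1": trt1})
for line in [header] + rows:
    print("\t".join(str(x) for x in line))
```

Passing `value_col="TPM"` or `value_col="FPKM"` would build the corresponding matrix from the same files.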

I then ran edgeR on the counts table, and it gave me pairwise comparisons between all samples. I don't see the relevance of Ct1 being different from Ct3, but maybe it becomes useful in the following steps of the analysis.

After this step, the last tool in the Trinity RNA-seq protocol is Analyze_Differential_Expression. It asks for two inputs: (i) the edgeR tar.gz file (got that!), and (ii) a TMM-normalized FPKM matrix.

It is the second item that I'm confused about:

1) I know TMM normalization can be done in R, or with a local Trinity installation on your own machine. Is there a way to TMM-normalize that avoids both a local installation and R, for example somewhere on the Galaxy main instance or on the Trinity instance? I'm not lazy, but I don't really code and I'm not that comfortable with R, so I was hoping that using Galaxy would let me avoid both. Alternatively, can I do it in Excel or something similar and then create a tabular file to use as the matrix in Analyze_Differential_Expression?

2) The RSEM output for every sample reports expected counts (which are used in edgeR) as well as TPM and FPKM. I know one shouldn't use FPKM for differential expression analysis (I've read that repeatedly so far), but can I use TPM values instead of TMM-normalized FPKM?
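For reference, here is the arithmetic behind the two units as I currently understand it, written as a small Python sketch. Two things are assumptions on my part: that "TMM-normalized FPKM" means FPKM recomputed with the library size multiplied by a per-sample TMM factor (such as the one edgeR's `calcNormFactors` reports), and that the library size is the column sum of the counts. The TMM factor itself is taken as given here, not computed, and the function names are hypothetical.

```python
def fpkm_to_tpm(fpkm):
    """Rescale one sample's FPKM values so that they sum to one million."""
    total = sum(fpkm)
    if total == 0:
        return [0.0 for _ in fpkm]
    return [v / total * 1e6 for v in fpkm]


def tmm_scaled_fpkm(counts, lengths, norm_factor):
    """FPKM for one sample using an effective library size.

    counts      -- expected counts per gene for this sample
    lengths     -- gene lengths in bases (assumed: the effective lengths)
    norm_factor -- TMM factor for this sample, computed elsewhere (assumed input)
    """
    effective_library_size = sum(counts) * norm_factor
    return [c * 1e9 / (length * effective_library_size)
            for c, length in zip(counts, lengths)]


# Made-up numbers for a two-gene sample.
tpm = fpkm_to_tpm([1.0, 3.0])
fpkm_unscaled = tmm_scaled_fpkm([10, 30], [1000, 2000], norm_factor=1.0)
fpkm_scaled = tmm_scaled_fpkm([10, 30], [1000, 2000], norm_factor=2.0)
print(tpm, fpkm_unscaled, fpkm_scaled)
```

If this reading is right, TPM only rescales within a sample, while the TMM factor adjusts each sample's library size relative to the others, so the two would not be interchangeable.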

The confusion on this point stems from the tutorial posted here: https://github.com/trinityrnaseq/GalaxyTrinityProtocol/wiki

There, the input listed for the normalized FPKM matrix is simply "abundance estimation to matrix_DS_HS_log_Plat: Counts_matrix". Does this mean I can just use the raw counts (i.e., the same file I used in edgeR) here as well? In the tutorial, it looks like the same file, labelled 400, is used for both.

Thank you for all the help and answers in advance!


RNA-Seq • 1.7k views
