Question

Understanding MMDiff output files

8.3 years ago

Thibault D. ▴ 700

Hi,

I'm currently testing the MMSeq/MMDiff pipeline to find differentially expressed transcripts. I have my reduced datasets and ran the protocol exactly as explained on their GitHub page. Everything seems to work like a charm.

Unfortunately, I do not understand the results. The differential expression pipeline is detailed here. The output file is a tabular text file formatted as follows:

feature_id: the name of the feature (e.g. Ensembl transcript ID)
bayes_factor: the Bayes factor in favour of the second model
posterior_probability: the posterior probability in favour of the second model (the prior probability is recorded in a # comment at the top of the file)
alpha0 and alpha1: estimated posterior mean of the global intercept for each model
beta0_0, beta0_1,..., beta1_0, beta1_1,...: estimated posterior means of the regression coefficients of the model-independent covariate matrix M under each model
eta0_0, eta0_1,..., eta1_0, eta1_1,...: estimated posterior means of the regression coefficients of the model-dependent matrix P under each model
mu_sample1, mu_sample2,..., sd_sample1, sd_sample2,...: the data, i.e. the posterior means and standard deviations used as the outcomes
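To make the layout above concrete, here is a minimal sketch of how such a file could be read in Python. It assumes the table is tab-separated, that comment lines start with "#", and that the first non-comment line is the header. The example content, including the wording of the comment line, is made up for illustration and is not real MMDiff output.

```python
import csv
import io


def read_mmdiff(handle):
    """Split an MMDiff-style table into comment lines and data rows.

    Assumptions (not checked against the MMDiff source): '#' marks
    comment lines, the first non-comment line is the header, and
    fields are tab-separated.
    """
    comments, data_lines = [], []
    for line in handle:
        if line.startswith("#"):
            comments.append(line[1:].strip())
        elif line.strip():
            data_lines.append(line)
    # DictReader takes any iterable of lines and uses the first as header
    rows = list(csv.DictReader(data_lines, delimiter="\t"))
    return comments, rows


# Made-up example content, only the column names come from the docs
example = (
    "# prior probability 0.1\n"
    "feature_id\tbayes_factor\tposterior_probability\n"
    "tx1\t0.5\t0.05\n"
    "tx2\t90.0\t0.91\n"
)

comments, rows = read_mmdiff(io.StringIO(example))

# Features whose posterior probability for the second model exceeds 0.5
top = [r["feature_id"] for r in rows if float(r["posterior_probability"]) > 0.5]
```

With a real file, `read_mmdiff(open(path))` would be used instead of the `io.StringIO` wrapper; the 0.5 cutoff is an arbitrary choice for the example.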

My questions are: What is the "second model"? Where are the probabilities of differential expression? Why do I have two "alpha" columns but only one "eta" column, when both are supposed to be reported "for each model"?

I have been working with DESeq2, EBSeq and RNAprof up to now. However, I would like to understand this tool. Ideally, I would like to compare all of them on RNA-Seq count data (ROC/AUC, F-score, ...).
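For the kind of comparison I have in mind, a rough sketch of a rank-based AUC is given below. It assumes each tool yields one score per transcript where higher means "more likely differentially expressed" (for example a posterior probability, or one minus a p-value) and that the true status is known, e.g. from simulated data. The function name and the numbers are hypothetical.

```python
def roc_auc(scores, labels):
    """Rank-based AUC.

    Equals the probability that a randomly chosen truly DE feature
    gets a higher score than a randomly chosen non-DE feature,
    with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four transcripts and their true DE status
scores = [0.95, 0.80, 0.30, 0.10]
truth = [1, 0, 1, 0]
auc = roc_auc(scores, truth)  # 3 of the 4 positive/negative pairs are ordered correctly
```

The double loop is quadratic in the number of features, so for a full transcriptome a sort-based implementation or an existing library routine would be a better fit.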

Thanks in advance!

Edit: I met someone who answered my questions pretty well:

MMDiff is based on a Bayesian approach. Here, the question is whether a given transcript is differentially expressed between two conditions.

This involves two models: the first one says "the transcript is not differentially expressed between our conditions", the second says "the transcript is differentially expressed between our conditions". These models correspond to H0 and H1, the two hypotheses we are used to working with in statistics.

The posterior probability in favour of the second model (third column) is the posterior probability that the transcript is differentially expressed between our conditions. The closer it is to one, the stronger the support for differential expression (it is a posterior probability, not a p-value!).
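As a sanity check on how the Bayes factor, the prior and the posterior probability relate, here is a small sketch of the standard identity "posterior odds = Bayes factor x prior odds". It assumes the bayes_factor column is on the raw scale (not a logarithm), which should be checked against the MMDiff documentation; the function name and the numbers are only illustrative.

```python
def posterior_probability(bayes_factor, prior):
    """Posterior probability of the second model.

    bayes_factor: Bayes factor in favour of the second model,
                  assumed to be on the raw (non-log) scale.
    prior:        prior probability of the second model, 0 < prior < 1.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# With a prior of 0.1, a Bayes factor of 1 leaves the probability unchanged,
# while a Bayes factor of 9 brings the odds to 1:1.
unchanged = posterior_probability(1.0, 0.1)
even_odds = posterior_probability(9.0, 0.1)
```

This also shows why the prior recorded in the file's comment line matters: the same Bayes factor gives a different posterior probability under a different prior.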

Alpha is the intercept, i.e. the reference level. It is used together with beta, the regression coefficients of the model-independent covariate matrix M, for the part of the regression that is the same under both models. Eta, the regression coefficients of the model-dependent matrix P, is in our case expected to be very similar to beta, because of the default matrices used by MMDiff.

These matrices have to be set before any model-dependent analysis. By default, however, only one is given, hence the single "eta" column.

RNA-Seq Statistics MMDiff MMSeq

Updated 21 months ago by Ram • written 8.3 years ago by Thibault D.