Hi,
I am wondering which kind of tools I should use to build predictive models from RNA-Seq expression data that capture how expression affects the PPI network. The experiment I have in mind consists of gathering a cohort of samples with and without a phenotype, then using the RNA-Seq data to generate a sample-level PPI graph (basically pruning the PPI network with coexpression values).
Once I have generated the sample-level graphs, I would apply machine learning techniques to infer which parts of the PPI graph behave differently between the two classes, and use that model to classify/describe new incoming samples.
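To make the "model that can be applied to a new sample" part concrete, here is a minimal sketch in plain Python. It assumes each sample-level graph has already been reduced to a set of retained PPI edges, encodes each sample as a 0/1 vector over the reference edges, and uses a nearest-centroid classifier as a stand-in for whatever learner is eventually chosen. All function names and the toy data are hypothetical, not taken from any existing tool.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Edge = FrozenSet[str]


def edge_features(sample_edges: Set[Edge], reference_edges: Sequence[Edge]) -> List[float]:
    """Encode a sample graph as 1.0/0.0 per reference PPI edge (retained or pruned)."""
    return [1.0 if edge in sample_edges else 0.0 for edge in reference_edges]


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[str]) -> Dict[str, List[float]]:
    """Average the feature vectors of each class (nearest-centroid 'training')."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids: Dict[str, List[float]], row: Sequence[float]) -> str:
    """Assign a new sample to the class with the closest centroid."""
    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], row))


def discriminative_edges(centroids, reference_edges, label_a, label_b) -> List[Tuple[float, Tuple[str, ...]]]:
    """Rank edges by how differently often they are retained in the two classes."""
    diffs = [
        (abs(a - b), tuple(sorted(edge)))
        for a, b, edge in zip(centroids[label_a], centroids[label_b], reference_edges)
    ]
    return sorted(diffs, reverse=True)


# Toy cohort (made-up genes and labels, for illustration only).
reference = [frozenset(("A", "B")), frozenset(("B", "C")), frozenset(("C", "D"))]
cohort = [
    ({frozenset(("A", "B")), frozenset(("B", "C"))}, "case"),
    ({frozenset(("A", "B"))}, "case"),
    ({frozenset(("B", "C")), frozenset(("C", "D"))}, "control"),
    ({frozenset(("C", "D"))}, "control"),
]
X = [edge_features(edges, reference) for edges, _ in cohort]
y = [label for _, label in cohort]
model = fit_centroids(X, y)

# A new sample is classified without touching the original cohort again.
new_sample = {frozenset(("A", "B")), frozenset(("B", "C"))}
print(predict(model, edge_features(new_sample, reference)))
print(discriminative_edges(model, reference, "case", "control"))
```

The point of the design is only that the fitted object (here, two centroids) is all that is needed at prediction time; any classifier that works on fixed-length feature vectors could be swapped in.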
I am sure there must be algorithms dealing with this problem, but I am unable to find them. When looking for algorithms that extract networks from expression data or multilayered networks, I could only find tools that describe/analyze the differences within a cohort, without generating a model that I can apply to a new sample without reanalyzing the whole cohort.
Do you have any pointers to where I should look?
Thanks,
Mattia
Maybe CGBayesNets? It was developed by a guy with whom I used to work. In any case, what would be the input to your model: hub scores for each gene, or betweenness centrality scores? You would have to bootstrap it all by eliminating one gene at a time in order to see how that affects the network structure. In that process, you could also infer which genes are most important (essentially giving you another metric).
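A minimal sketch of the "eliminate one gene at a time" idea, in plain Python. It uses node degree as a simple hub score and, as an assumed measure of "network structure", the size of the largest connected component; betweenness centrality or any other metric could be substituted. The function names and the toy edge list are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from a list of gene-gene edges."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_scores(adj: Dict[str, Set[str]]) -> Dict[str, int]:
    """Hub score taken here simply as the number of neighbours."""
    return {gene: len(neighbours) for gene, neighbours in adj.items()}


def largest_component_size(adj: Dict[str, Set[str]], removed: Optional[str] = None) -> int:
    """Size of the largest connected component, optionally ignoring one gene."""
    seen: Set[str] = set()
    if removed is not None:
        seen.add(removed)  # treating the gene as visited removes it from the search
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def leave_one_out_impact(adj: Dict[str, Set[str]]) -> Dict[str, int]:
    """Drop in largest-component size when each gene is removed in turn."""
    baseline = largest_component_size(adj)
    return {gene: baseline - largest_component_size(adj, removed=gene) for gene in adj}


# Toy network: A is a hub, D bridges to E.
adj = build_adjacency([("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")])
hub = degree_scores(adj)
impact = leave_one_out_impact(adj)
print(sorted(impact.items(), key=lambda kv: kv[1], reverse=True))
```

Genes whose removal shrinks the largest component the most are the ones this particular metric would rank as most important.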
Thanks Kevin for the pointer, CGBayesNets is a very interesting approach. I've been reading some papers about it, and it is not clear to me whether I can use it to address my questions. I'll dig further.
To answer your question about what the input would be: ideally, expression values for genes / isoforms. My overall goal is to assess how relevant the isoform expression profile is in shaping functional interactions for some phenotypes.
It's a very green idea of mine, but I was planning to:
- use a PPI network from the literature as the set of all possible connections
- then predict interactions between all Ensembl isoforms using sequence-based predictors
- then, for each sample independently, prune the PPI network based on the expression levels of the interacting nodes (counting as expression only the isoforms I predict to interact with each other)
- once I have generated this, find some graph analysis tool able to produce a prediction model (e.g. GAM)
- finally, see what the model tells us, whether it agrees with the literature, and assess its predictive power
Once I've done this with isoform expression levels, I'd repeat it with total gene expression levels, where I would hope to find less interesting / predictive results.
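The per-sample pruning step could be sketched roughly as follows. This is plain Python under stated assumptions: the predicted isoform-isoform pairs are already available per PPI edge, expression is given as a per-sample dictionary, and an edge is kept when at least one predicted pair has both isoforms at or above a threshold. The threshold value, the isoform identifiers and the function name are all made up for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

GeneEdge = Tuple[str, str]
IsoformPair = Tuple[str, str]


def prune_ppi_for_sample(
    isoform_pairs_by_edge: Dict[GeneEdge, List[IsoformPair]],
    isoform_expr: Dict[str, float],
    min_expr: float = 1.0,
) -> Set[FrozenSet[str]]:
    """Keep a gene-gene edge if any predicted interacting isoform pair is co-expressed."""
    kept: Set[FrozenSet[str]] = set()
    for (gene_a, gene_b), pairs in isoform_pairs_by_edge.items():
        for iso_a, iso_b in pairs:
            # Isoforms missing from the sample are treated as not expressed.
            if isoform_expr.get(iso_a, 0.0) >= min_expr and isoform_expr.get(iso_b, 0.0) >= min_expr:
                kept.add(frozenset((gene_a, gene_b)))
                break
    return kept


# Hypothetical literature PPI edges with predicted interacting isoform pairs.
ppi = {
    ("G1", "G2"): [("G1-201", "G2-201")],
    ("G2", "G3"): [("G2-202", "G3-201")],
}

# Two toy samples expressing different isoforms of G2.
sample_1 = {"G1-201": 5.0, "G2-201": 3.0, "G2-202": 0.2, "G3-201": 10.0}
sample_2 = {"G1-201": 0.0, "G2-201": 4.0, "G2-202": 6.0, "G3-201": 2.0}

graph_1 = prune_ppi_for_sample(ppi, sample_1)
graph_2 = prune_ppi_for_sample(ppi, sample_2)
print(graph_1, graph_2)
```

In this toy case the two samples would have the same gene-level view of G2 being expressed, yet end up with different retained edges because different G2 isoforms are present.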
That being said, I imagine I'm not the first one wondering about the relevance of isoform profiles to functional interactions, but I am unable to find algorithms suitable for this task. Do you think CGBayesNets could be used to infer a phenotype manifold that makes sense if I feed it isoform expression levels? Would it be comparable to the one obtained using total gene expression as input? To me it seems that it's not possible to jointly consider the total expression level and the isoform level in one single run (the link between isoforms of the same gene is lost when they are treated independently).
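One possible way to keep the gene/isoform link in a single feature set, independent of which modelling tool is used, is to describe each sample by the total expression per gene plus the fraction of that total contributed by each isoform. The sketch below is only an illustration of that encoding in plain Python; the function name, feature-name prefixes and identifiers are hypothetical, and whether a given tool such as CGBayesNets can exploit such features is not something this example establishes.

```python
from typing import Dict


def gene_and_isoform_features(
    isoform_expr: Dict[str, float],
    isoform_to_gene: Dict[str, str],
) -> Dict[str, float]:
    """Per-sample features: total expression per gene and usage fraction per isoform."""
    totals: Dict[str, float] = {}
    for isoform, value in isoform_expr.items():
        gene = isoform_to_gene[isoform]
        totals[gene] = totals.get(gene, 0.0) + value

    features = {f"total:{gene}": total for gene, total in totals.items()}
    for isoform, value in isoform_expr.items():
        total = totals[isoform_to_gene[isoform]]
        # A gene with zero total expression gets usage 0.0 for all its isoforms.
        features[f"usage:{isoform}"] = value / total if total > 0 else 0.0
    return features


# Toy sample with made-up isoform identifiers.
isoform_to_gene = {"G1-201": "G1", "G1-202": "G1", "G2-201": "G2"}
sample = {"G1-201": 6.0, "G1-202": 2.0, "G2-201": 0.0}
features = gene_and_isoform_features(sample, isoform_to_gene)
print(features)
```

With this encoding, the usage fractions of one gene's isoforms sum to 1 whenever the gene is expressed, so the relationship between sibling isoforms stays explicit in the input.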
Hey, sorry, this is a little bit out of my area, and I think that we may be struggling with the limitations of language, i.e., it may be difficult for anybody to write out their thoughts perfectly in English or any other language.
I am sure that a network of isoforms would be quite different from a network of total expression (the summed expression of a gene's isoforms). Biologically, what does total expression of isoforms even mean? 99% of us ignore isoforms and use total expression, but that is biased. On the other hand, it is simply still impossible to understand the function of each isoform. For many genes, we do not even know if they have isoforms or not... There are undoubtedly hundreds of millions of low-abundance isoforms that are only expressed under very specific cellular conditions / states. It may not be the job of your project to deal with these issues, though.
If at all possible, you could attempt to group isoforms together based on the common tissues in which each is known to be expressed, and build a separate network for each group.