Question

Another statistical model to estimate gene/transcript expression

6.6 years ago

piecu91 • 0

Hi All,

As a short introduction: I'm new to bioinformatics/gene sequencing, but I have a background in mathematics/statistics. I have done some basic reading on molecular biology, NGS technology and the steps in NGS data analysis. I'm planning to apply for funding for my PhD project, which is about developing more accurate statistical models to estimate gene or transcript expression levels in the presence of shared reads and shared exons. I'm aware of existing methods such as Salmon, eXpress, RSEM, Kallisto and Sailfish, but my potential PhD supervisor has a couple of novel ideas that may work and may improve on the current state of the art. It certainly won't be easy, because these methods are already good, but the topic could be a good challenge for a PhD.

1) In my project proposal, I need to describe precisely the economic impact of the project results on the pharma/biotech industry. I could say that more accurate estimates, accounting for shared reads and exons, may lead to a better understanding of disease mechanisms at the molecular level, and that this may eventually result in new diagnostic tests and medicines. However, I think such reasoning is not enough. Do you have any ideas on how to be more specific? I need to convince the jury with concrete arguments. Should I provide some numbers/statistics? I mostly have one application in mind, differential expression, but maybe I should mention other applications?

2) Some of you work in private pharma/biotech companies. Do you think there is a need or room for other statistical methods that deal with mapping ambiguity (shared reads and exons)? Perhaps Salmon is already good enough and there is no need to spend energy on new models?

3) Based on your experience, do shared reads and exons have a serious impact on data analysis? Or can you simply discard multireads most of the time?

I don't have experience in this field, so your opinions and answers will be much appreciated.

multiple reads shared exons mapping ambiguity • 1.5k views


score 0 · Answer 1 · 2017-09-08

My gut feeling is that this topic is well studied and that any improvement would have little effect on the biological insights you might gain from an experiment.

I think the microbiome area would be more fruitful, but unfortunately I'm not following it closely enough to tell whether there are good tools out there. If you were able to apply a new statistical model to the differential abundance of bacteria, I think it would have a huge impact. You could think about using several sources of data, not only 16S abundance, or even suggest a simple measurement that helps normalize the data, which would be even better.

Good luck!

score 0 · Answer 2 · 2017-09-08

@Asaf already mentioned that the topic is well studied, and I agree that small improvements will have little impact, particularly if you have to demonstrate economic impact in your proposal.

Industry likes de facto standards and well-established, widely used methods. I work in a CRO whose clients include all kinds of pharma companies. It is important for us to use the most established and best-documented tools so that our clients can reproduce our results.

As a scientist, I agree that better/newer methods should be tested too, but introducing a different/novel method into a pipeline is a lot of effort. For example, we must ensure that the results do not contradict previous results. If results change, we have to do a lot of quality control (comparing with historical data and other technologies). This is way too much effort for just small improvements.

Again, from a scientific point of view I don't like it, but this is how (most? / our?) industry business is run: without the prospect of an increase in revenue, it is hard to get approval for R&D. Even a medium improvement in transcript abundance estimation accuracy will not result in a big increase in revenue! Keep that in mind if you argue on the basis of economic impact.

By the way: in the last two years, I have had just one client interested in transcript abundance. Other clients were interested in gene-level quantification only, often just to the extent of whether there is any expression or not.

score 0 · Answer 3 · 2017-09-10

I have two more questions that came to mind regarding the project proposal.

4) Do you think that nanopore sequencing technology (e.g. Oxford Nanopore), which allows whole transcripts to be sequenced, will replace NGS technology, and if so, when? How long could it take before nanopore sequencing becomes a mature and credible technology? I'm asking because it's a "threat" to my project, which would focus only on NGS data.

5) Additionally, in the project we'd like to adapt the models developed for transcriptomics to proteomics, because there is ambiguity there too: peptides are shared between many proteins (just as exons are shared between transcripts). Could this also be of interest to pharma/biotech companies or researchers?
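
For readers less familiar with the ambiguity discussed in this thread, here is a minimal toy sketch of what "shared reads" do to abundance estimates. It is not the implementation of Salmon, RSEM or any other tool; it only illustrates the basic expectation-maximization idea of splitting shared reads fractionally, compared with simply discarding multireads. The read counts, the function names and the assumption of three equal-length transcripts with no sequencing bias are all made up for illustration.

```python
# Toy illustration only: three hypothetical equal-length transcripts (0, 1, 2),
# no sequencing bias, and an invented read-to-transcript compatibility list.

def em_abundance(compat, n_transcripts, n_iter=200):
    """Estimate relative abundances by EM, splitting shared reads fractionally."""
    theta = [1.0 / n_transcripts] * n_transcripts  # start from a uniform guess
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        # E-step: assign each read to its compatible transcripts,
        # in proportion to the current abundance estimates.
        for hits in compat:
            weight = sum(theta[t] for t in hits)
            for t in hits:
                counts[t] += theta[t] / weight
        # M-step: new abundances are the normalized expected counts.
        theta = [c / len(compat) for c in counts]
    return theta


def unique_only_abundance(compat, n_transcripts):
    """Estimate relative abundances after discarding all multireads."""
    counts = [0.0] * n_transcripts
    for hits in compat:
        if len(hits) == 1:
            counts[hits[0]] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


# 30 reads unique to transcript 0, 10 unique to 1, 20 unique to 2,
# and 40 reads shared between transcripts 0 and 1.
reads = [(0,)] * 30 + [(1,)] * 10 + [(2,)] * 20 + [(0, 1)] * 40

em_est = em_abundance(reads, 3)
unique_est = unique_only_abundance(reads, 3)
print("EM estimate:         ", [round(x, 3) for x in em_est])
print("Unique-reads estimate:", [round(x, 3) for x in unique_est])
```

In this toy case the EM estimate converges to roughly 0.6, 0.2, 0.2, while discarding multireads gives roughly 0.5, 0.167, 0.333, so transcript 2 (which shares no reads) is overestimated relative to the other two. Real tools additionally account for transcript length, fragment distributions and biases, which this sketch ignores.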