Tutorial: Simple approach to RNA-seq (T-BioInfo platform)
7.4 years ago
elia.brodsky ▴ 340

RNA-Seq can be confusing and frustrating for those who just want to get the job done and worry about how it was done later (when publishing the paper...). Our platform, T-BioInfo (currently in beta), is developing a simple interface for RNA-Seq analysis. You can see the tutorial in this video:

What's the benefit of this over Galaxy?

One obvious one that I can think of is that only a bioinformatician can use Galaxy - on this platform you can just click "differential gene expression" and it will build the pipeline and explain each step of the pipeline. No knowledge of data inputs is needed. Pipelines are easy to reproduce and can analyze multiple samples at once. You also have a consistent interface that covers NGS, Mass Spec, Structural Biology and Data Mining - everything from RNA-seq to ChIP-seq to Proteomics and Screening of Libraries of Small Molecules... I think there are quite a few differences...

No knowledge of data inputs is needed

I don't believe that's actually a plus. From what I've seen in the video, I'm unsure how a person without good knowledge of RNA-Seq protocol and analysis would, for example, handle strand-specific data. All those "single-click" solutions will only work in settings with fixed and optimized protocols. For a sequencing run at a local facility it is far better to consult a bioinformatician who really knows all the caveats that come with it. In the case of commercial sequencing, the company will likely offer bioinformatic analysis for an additional fee (although it may cost more than your solution).

By the way, it would be great if you attached a link to the T-Bioinfo portal so anyone can try it. I've found this one http://tauber-data2.haifa.ac.il:3000/ by googling it but don't know if it's correct.

Thank you for your input, Mikhail. The link you gave is for the academic server and you will need an account to access the platform. We are in beta right now and if you have a good project we could test the system on, I would be very happy to discuss. If possible, could you also complete this short survey so that we can better understand what challenges are a priority for bioinformaticians like yourself? https://goo.gl/nBzXmU

No, very few bioinformaticians use Galaxy; the whole point is that non-bioinformaticians use it. I share Mikhail's general concern with push-button solutions for things like this. At some point people need to know enough about their experiments and how they'll be analyzed to know if just pushing the button will produce junk or not.

One important aspect to all this is that we are still figuring out what is possible and what is not possible to do reliably. And that is one component of the "baggage" that comes with running a tool like TopHat. It is extraordinarily easy to misuse these tools from the command line - and it is inconceivable that we could reliably train large numbers of nontechnical users on how to do it.

There will be applications where push-button solutions are a valid approach. The phrasing that elia.brodsky uses is a bit misleading because it oversimplifies the goals.

It is not that people don't need to know what the data inputs are - what they shouldn't need to know is what the FASTQ encoding is; all they need to know is that some bases are less reliable. And they should not need to know what sorted or unsorted BAM files are, or which SAM flag or tag tells us that there are multiple alignments for a read, or what the exact order of parameters is, or where the comma goes etc. These elements just add up to gigantic overheads that actually keep science from progressing. That's really what people mean when they say you don't need to understand the details. And they are right. We should not need to know how a fuel injector works to drive a car.
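To make the "overhead" concrete, here is a minimal Python sketch of two of the details mentioned above. It assumes Phred+33 quality encoding (other FASTQ encodings exist) and an aligner that writes the optional NH tag; not every aligner does, so treat the multi-mapping check as illustrative rather than universal. The function names are made up for this example.

```python
SECONDARY_FLAG = 0x100  # SAM FLAG bit marking a secondary alignment


def phred33_to_error_prob(quality_string):
    """Turn a Phred+33 quality string into per-base error probabilities.

    Assumes Phred+33 (offset 33); older data may use a different offset.
    """
    return [10 ** (-(ord(ch) - 33) / 10) for ch in quality_string]


def is_multimapped(sam_line):
    """Guess whether a SAM record belongs to a read with multiple alignments.

    Checks the secondary-alignment FLAG bit, then the optional NH:i tag
    (number of reported alignments). Returns False if neither is present.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if int(fields[1]) & SECONDARY_FLAG:
        return True
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("NH:i:"):
            return int(tag[5:]) > 1
    return False


if __name__ == "__main__":
    print(phred33_to_error_prob("I+!"))
    record = "read1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:3"
    print(is_multimapped(record))
```

None of this is hard, but every one of these conventions is something a biologist would have to learn before getting to the actual question.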

Biologists are actually very capable of abstract thoughts and handling complex issues - just not in the command line itself because nothing else ever requires them to use that.

That is precisely the point, we try to prevent the very basic misuse and educate the user along the way.

Devon, I agree, analysis should be done with understanding and that is why we introduce an educational side to the platform. One of the main barriers to using a solution like this is that, in order to start, one needs to learn everything there is to know about file formats, statistical analysis, data mining and other concepts that are not clearly related to the biological subject. We introduce a brief pop-up with an image and text that explains each step of the way with links to articles (about the algorithm), tutorials and sample projects. Our goal is to get the user involved in "action" and after they see it works, to start explaining key features. We are open to suggestions, so if you have any ideas on how to "democratize" and involve more people in bioinformatics analysis, please let me know! In addition, we are trying to compile responses from people involved in bioinformatics to help us address the relevant issues, please take a look: https://goo.gl/PljpFl

As promised, here is a preview of the platform in "advanced mode" showing the educational pop-ups that will explain basics of each step as well as limited choice selection that helps identify the right continuation for a pipeline that was started.

I do like to see improvements in the field and tools that facilitate analysis are always welcome. I thought the example was pretty neat - of course the real test of these tools is always: what is the cost to run them?

Once, just to see where the field is, I evaluated a similar tool from a different company. It seemed kind of neat, I would have actually used it myself - but then the sales rep told me the cost of running a sample at an educational discount would be $150 per sample! It was actually a funny exchange because he said "one fifty". To which I said, OK, you've got to be more specific, because try as I might I can't figure out what you mean. I know that $1.50 is way too little, you couldn't run a company on that, but then I also think that asking $150 is just as ludicrous on the other extreme. I see that as a complete disconnect from reality: we might get dozens of samples per experiment. But no, as it turns out it was $150 per sample, $300 for commercial entities.

At which point the whole thing became moot, considering the number of replicates that a typical RNA-seq experiment has, and that you may want to rerun it etc. At those rates two or three runs per month would pay for computing and a bioinformatician's salary. Nothing is easier to use than hiring someone to do it for you. If software costs more than hiring qualified personnel then it is not a good start.
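A back-of-the-envelope version of that arithmetic, as a small Python sketch. The per-sample prices are the ones quoted above; the 24 samples per run is only an assumption standing in for "dozens of samples per experiment", and the function name is made up for illustration.

```python
def monthly_cost(price_per_sample, samples_per_run, runs_per_month):
    """Total monthly spend under per-sample pricing."""
    return price_per_sample * samples_per_run * runs_per_month


if __name__ == "__main__":
    SAMPLES_PER_RUN = 24  # assumption: "dozens" taken as two dozen
    for label, price in [("educational", 150), ("commercial", 300)]:
        for runs in (2, 3):
            total = monthly_cost(price, SAMPLES_PER_RUN, runs)
            print(f"{label}: {runs} runs/month -> ${total:,}")
```

Under those assumed numbers, three runs a month at the educational rate already comes to $10,800, and that is before anyone decides to rerun an analysis with different settings.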

Anyway, that's my take on easy-to-use tools. I am somewhat cautious.

Thanks for your thoughts, Istvan. I agree the cost has to be factored in - our goal is to create something that does MAKE SENSE - in terms of application as well as cost. What do you think would be a good price for a system like that? If you would like to get access at the early stage (free), maybe we can work out some arrangement? One request: could you fill out a little survey about the bioinformatics needs that will help guide our platform development? https://goo.gl/nBzXmU

Pricing is among the most difficult things to get right so I am not going to speculate on that.

What I do know is that what made their proposal completely untenable and unrealistic for me is them proposing a per run/sample pricing. Research is about investigating and evaluating the unknown - the last thing I want is to have the specter of a price tag hinder our ability to do something again or differently.

At that point it won't even matter if it is a lot cheaper. I simply do not want to have a price tag on every thought process someone might have. That is damaging to the culture of discovery and trains people to think the wrong way. There is no point in having many tools at one's disposal if you can't even use them.

This type of pricing should be all-inclusive, that is, it provides a fixed capacity and lets people use it; then it grows with the computing needs as a whole and not by run/thought process.

Great point, I see many companies charging for "runs" or storage and we see from our experience that most researchers need to "play around" with data before a clear approach is prepared, so maybe a monthly price for casual users and an organizational price for a whole platform makes more sense - one installation with unlimited use. I am very interested in your experience, one request: could you fill out a little survey about the bioinformatics needs that will help guide our platform development? https://goo.gl/nBzXmU And if you would like to connect, I would be glad to give you a demo of what we have and hear your input. Thanks again!

The problem with your survey is that you are asking me to work for you, to give you something valuable, my best ideas and thoughts, but it is wholly unclear in what way that benefits me. Amusingly, and also an indication of how the road to hell is paved with good intentions, your survey starts out on the wrong foot by being bossy and demanding. It is pushing a "red asterisk" in my face: hey look buddy, this field is required!

No, actually, you got that all wrong: that field is not required at all. I won't even bother filling it out. See how NOT required the whole thing is?

I post here because I like to connect with people, discuss things etc. It is the freedom of choice and expression that I like. I really don't like going to work for others, posting to a place that maybe no one will see etc. This is one reason I never give advice via email either.

I understand your point, and of course it's your choice to fill it out or not, but let me explain the benefit, maybe not personally to you, but to the community that is built around your area of expertise: as a new user, one might have a limited appreciation of something others like yourself see crystal clear simply due to the time they've spent doing bioinformatics. Once we collect the data, we intend to publish it and make it available to the whole community, but we also intend to create a learning environment for new bioinformaticians to have as they learn. Even though we are a for-profit company, we are working with academic partners that will use the survey results to make their classes more efficient and with a wider appeal. Contributing to their understanding will benefit the whole community as more people are growing in their understanding and appreciation of bioinformatics and are able to use bioinformatics tools to address research, biomedical and pharmaceutical challenges.

Istvan, I know the survey asterisks turned you off, but I just wanted to point out the "advanced" mode where you construct your own pipelines - our understanding is that it would help users learn as well as give them more flexibility. Let me know what you think: