Forum: Why learn programming in bioinformatics?
5
5 months ago by
joselu60
joselu60 wrote:

Hello. There are programs of all kinds for online bioinformatics analysis (BLAST, alignments, dot plots, phylogenetics, genetic diagnosis, etc.), so I would like someone to explain why it is necessary to know a programming language (Python, Perl, C++) in the field of bioinformatics. What tasks can we perform with a programming language that cannot be done with a program that is available online? Can you give me some examples? Thank you very much.

programming forum • 677 views
ADD COMMENTlink modified 5 months ago • written 5 months ago by joselu60
5

[Image: graph, no description provided]

ADD REPLYlink written 5 months ago by Pierre Lindenbaum116k
2

Web sites and GUI tools are very hard to run in an automated fashion. Once you have to do something tedious or error-prone more than once, you'll want to learn some programming skills.

ADD REPLYlink written 5 months ago by Alex Reynolds27k
1

That totally depends on what you do. If you have a few Sanger sequencing reads and want to align them to a reference, go with an online tool providing a nice graphical interface. If you want to analyze a cohort of patients using whole-genome-sequencing and matched RNA-seq, good luck even downloading and aligning that stuff without some scripting skills. What is the field you aim to enter?

ADD REPLYlink written 5 months ago by ATpoint13k

Thank you very much for the answers, but there are already tools to change formats or clean sequences. Can you give more specific examples of bioinformatics tasks in which programming is absolutely necessary? Thank you.

ADD REPLYlink written 5 months ago by joselu60
3

Have you even read the answers?

ADD REPLYlink written 5 months ago by WouterDeCoster36k

Yes, I have read them, but for the moment none convinces me.

ADD REPLYlink written 5 months ago by joselu60
1

Then you are dooming yourself to doing some things the hard way, or not being able to do them at all. shrug

ADD REPLYlink written 5 months ago by Alex Reynolds27k

They're all theoretical in nature, so yeah, not convincing to someone who has not run into issues that need programming. You have to think of programming as implementing solutions to the problem at hand. Problem solving is the critical skill; it is a way of thinking. Programming is to problem solving as speech is to thought: you can think all you want, but unless you know a language, it is impossible to express those thoughts. Similarly, you can arrive at a solution, but you cannot implement it without a programming language. So if automation, reusability and customization are not enough to convince you of the advantages of programming, I'm afraid we cannot help you. These are everyday tasks to most of us, and that's kinda what we use programming for.

ADD REPLYlink written 5 months ago by RamRS20k

I have no doubt that knowing how to program is fundamental to solving some bioinformatics problems, but I only know it because everyone says so, and that is not enough for me. I need some more concrete examples, explained for a beginner. Thank you very much.

ADD REPLYlink written 5 months ago by joselu60
3

Here's a real-world example: given 3.6M genomic annotations representing patterns of DNaseI hypersensitivity signal (or other signal, motif occupancy, conservation, etc.) across 766 cell samples, find all annotations over the genome with the same or similar pattern (within bounds of high Pearson r correlation, say) in under a second. There is no web app that will do this for more than one subset of annotations.
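To give a beginner a feel for the core of that task, here is a toy sketch in plain Python. It is not the method used for the real data set: the function names, the `signals` dictionary, and the 0.9 threshold are all made up for illustration, and answering in under a second over millions of annotations would need vectorized or indexed code rather than a loop like this. It relies on one fact: once each signal vector is centered and scaled to unit length, the dot product of two vectors equals their Pearson r.

```python
import math


def center_and_scale(values):
    """Center a signal vector and scale it to unit length.

    Returns None for a constant vector, for which Pearson r is undefined.
    """
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        return None
    return [c / norm for c in centered]


def similar_annotations(query_name, signals, min_r=0.9):
    """Return (name, r) pairs for annotations whose pattern correlates with the query.

    `signals` is a hypothetical dict mapping annotation name -> list of signal
    values, one value per sample. Results are sorted by decreasing r.
    """
    normalized = {name: center_and_scale(vec) for name, vec in signals.items()}
    query = normalized[query_name]
    if query is None:
        return []
    hits = []
    for name, vec in normalized.items():
        if name == query_name or vec is None:
            continue
        # Dot product of two centered, unit-length vectors is Pearson r.
        r = sum(a * b for a, b in zip(query, vec))
        if r >= min_r:
            hits.append((name, r))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

No point-and-click tool lets you swap in a different similarity measure or run this over every annotation in turn, whereas here both are one-line changes.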

ADD REPLYlink modified 5 months ago • written 5 months ago by Alex Reynolds27k
2

Here's a simple example: Given a list of FASTA IDs (in a custom order, with or without repeating IDs), pick FASTA entries from a larger transcripts file where the ID part of the header matches (the description part can be ignored), in the same order as the IDs file and with as many repeats as found in the IDs file. Do this for 200 ID files across 10 transcripts files, and any error made must be consistent and reproducible. Oh, and this needs to be done in under an hour the first time, and subsequent times I need it to happen in under a minute.
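For anyone who wants to see what a script for this could look like, here is a minimal sketch in plain Python. Everything specific in it is an assumption made for illustration: the function names, the `*.fa` and `*.txt` file patterns, the output naming, and the choice to keep the first entry when an ID occurs twice in a transcripts file.

```python
import sys
from pathlib import Path


def fasta_id(header):
    """ID part of a FASTA header: everything before the first whitespace."""
    parts = header.split(None, 1)
    return parts[0] if parts else ""


def read_fasta(text):
    """Parse FASTA text into {id: (full_header, sequence)}; first entry wins on duplicate IDs."""
    records = {}
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.setdefault(fasta_id(header), (header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        records.setdefault(fasta_id(header), (header, "".join(seq_lines)))
    return records


def select_records(ids, records):
    """Pick records in the order of `ids`, repeats included; also return the IDs not found."""
    selected, missing = [], []
    for seq_id in ids:
        if seq_id in records:
            selected.append(records[seq_id])
        else:
            missing.append(seq_id)
    return selected, missing


def to_fasta(selected):
    """Write (header, sequence) pairs back out as FASTA text, one sequence line per entry."""
    return "".join(f">{header}\n{seq}\n" for header, seq in selected)


def run_all(ids_dir, transcripts_dir, out_dir):
    """Apply every IDs file to every transcripts file (hypothetical directory layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for fasta_path in sorted(Path(transcripts_dir).glob("*.fa")):
        records = read_fasta(fasta_path.read_text())
        for ids_path in sorted(Path(ids_dir).glob("*.txt")):
            ids = [l.strip() for l in ids_path.read_text().splitlines() if l.strip()]
            selected, missing = select_records(ids, records)
            (out / f"{ids_path.stem}__{fasta_path.stem}.fa").write_text(to_fasta(selected))
            if missing:
                print(f"{ids_path.name} vs {fasta_path.name}: {len(missing)} IDs not found",
                      file=sys.stderr)
```

Once something like this exists, rerunning all 200 x 10 combinations is a single call to `run_all`, and any mistake in the logic is at least the same mistake every time, which makes it findable and fixable.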

ADD REPLYlink modified 5 months ago • written 5 months ago by RamRS20k
1

I think it's unfair to require the answer in an hour the first time, and a minute on subsequent runs. We only need the answer "this week". So let him try to perform the task by hand. It will take several days, and when he's done, it will have to be double-checked, because no repetitive task a human works on for 20 hours is going to be error-free. Add a few more hours for a manual screening for accuracy. By that time your boss will change his mind on which FASTA IDs he wants, and you have to start over.

ADD REPLYlink written 5 months ago by karl.stamm3.4k

Now I understand much better the importance of knowing how to program in bioinformatics. Thank you very much for the answers.

ADD REPLYlink written 5 months ago by joselu60
1

No tool will cover all possibilities. If your data is in a non-standard format or supposed to be in a standard format but contains formatting errors, these things will have to be fixed on a case-by-case basis as it is unlikely that existing tools will be tailored to your specific issue. Another example which I already alluded to is to extract and combine data from multiple sources.

ADD REPLYlink modified 5 months ago • written 5 months ago by Jean-Karim Heriche18k

Please use the 'add comment' button to reply to an answer. This keeps things organized and prevents the question from appearing as answered when it is not.

ADD REPLYlink written 5 months ago by Jean-Karim Heriche18k

If you, in research, are doing exactly the same as someone else already did then you are not going to find anything new.

Research constantly finds new requirements for bioinformatics, so new scripts are necessary, or older scripts need to be improved.

ADD REPLYlink written 5 months ago by WouterDeCoster36k

The General Problem

ADD REPLYlink modified 5 months ago • written 5 months ago by h.mon23k
2

A real programmer would design the program to pass anyone any arbitrary condiment.

ADD REPLYlink written 5 months ago by RamRS20k

Just start doing some bioinformatics and you'll realize why.

ADD REPLYlink written 5 months ago by urjaswita60
8
5 months ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche18k wrote:

Ready-to-use tools (online or not) won't do everything for you. For example, a tool may accept data only in a particular format, and if your data is structured in a non-standard way that no tool knows about, you'll have to write a conversion tool yourself. Or you need to clean up your data before processing it, because otherwise the processing tool would crash or give incorrect results: for example, removing all samples that are not of type X when the type information for each sample is spread across different files or sits in a database.

You may be able to get by without programming in your current job, but what about the next one, where the project requires tasks or analyses for which no ready-to-use tool exists, or you need to add functionality to existing tools, or you're asked to build one of these online tools?

Nowadays, it's not just bioinformaticians who need basic programming skills; every biologist should have them. It's simply not scalable to have a computational person behind every biologist producing data. At a minimum, every biologist should be able to clean up and reformat their data, extract relevant information from large data sets or databases, automate basic tasks, compose workflows from available tools, and run simple, standard statistical analyses. This is how it works in other scientific areas, so why not in biology?
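To make the "keep only samples of type X" clean-up concrete, here is a small sketch in Python using only the standard library. The column names `sample_id` and `type`, the CSV layout, and the function names are assumptions for illustration; real metadata will look different, which is exactly why a generic tool rarely fits.

```python
import csv
import io


def load_sample_types(metadata_texts):
    """Merge sample -> type mappings from several CSV texts.

    Each text is assumed to have 'sample_id' and 'type' columns. A sample that
    appears with two different types raises an error instead of being guessed.
    """
    types = {}
    for text in metadata_texts:
        for row in csv.DictReader(io.StringIO(text)):
            sample, kind = row["sample_id"].strip(), row["type"].strip()
            if sample in types and types[sample] != kind:
                raise ValueError(
                    f"conflicting types for {sample}: {types[sample]!r} vs {kind!r}"
                )
            types[sample] = kind
    return types


def keep_samples_of_type(data_text, types, wanted):
    """Return the data rows whose sample is known to be of type `wanted`.

    Samples with no type information are dropped as well.
    """
    reader = csv.DictReader(io.StringIO(data_text))
    return [row for row in reader if types.get(row["sample_id"].strip()) == wanted]
```

The functions take text rather than file names to keep the sketch short; in practice you would read each metadata file from disk and pass its contents in.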

ADD COMMENTlink written 5 months ago by Jean-Karim Heriche18k
6
5 months ago by
venu5.8k
Germany
venu5.8k wrote:

This might not qualify as an answer, but I'm tempted to write one because a similar discussion happened today.

A guy sitting next to me is a pro at Excel. I'm not (but I'm good at some scripting and command-line tools). He said Excel makes it easy to calculate basic numbers like a mean or a sum. I said that, IMO, Unix command-line tools are far better than Excel.

We started a timer to see who could calculate a summary value from a text file faster. He opened the file in Excel, selected a column, and came up with the answer. He won. No worries. Meanwhile, I figured out how to do it with a simple combination of Unix commands.

The interesting part is that we now have a folder with ~100 files on which to repeat the task. I went out for a coffee while the 3 lines of code I had figured out were running, and this guy is still opening and closing Excel files.

This is exactly what @Pierre's graph summarizes. It's not mandatory to learn programming unless one wants to be (more) productive in a given time compared to a non-programmer, especially in bioinformatics.
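For readers who don't know the command line yet, the same idea can be sketched in a few lines of Python. This is not the Unix one-liner from the story, just an illustration; the tab separator, the `*.txt` pattern, and the function names are assumptions.

```python
from pathlib import Path
from statistics import mean


def column_mean(path, column=0, sep="\t"):
    """Mean of one numeric column in a delimited text file.

    Cells that are missing or not numeric (a header line, for instance) are
    skipped. Returns None if the column holds no numbers at all.
    """
    values = []
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split(sep)
            try:
                values.append(float(fields[column]))
            except (IndexError, ValueError):
                continue
    return mean(values) if values else None


def summarize_folder(folder, pattern="*.txt", column=0, sep="\t"):
    """Map each matching file name in `folder` to the mean of the chosen column."""
    return {
        path.name: column_mean(path, column, sep)
        for path in sorted(Path(folder).glob(pattern))
    }
```

Whether the folder holds 1 file or 100, the call is the same: `summarize_folder("my_folder", column=1)`.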

ADD COMMENTlink modified 5 months ago • written 5 months ago by venu5.8k

GNU Datamash will save you eons of your time then! It's an amazing piece of work.

ADD REPLYlink written 5 months ago by harish140