Forum: Which programming language should I start with as a beginner in bioinformatics?
2.3 years ago
nkg.bb • 0

I am new to bioinformatics and very interested in developing a tool for single-cell RNA-seq analysis. Although I have basic knowledge of C and Java, I am keen to know which language is most needed for developing tools. Some of my friends and professors suggested that I focus on FORTRAN, while a few people suggested that I start learning Python and R. I would be glad if people working in this field could advise me.

language programing single cell Forum • 1.1k views

It is all relative. Most of us reveal our own backgrounds by calling something best or most important rather than providing objective assessments of the field. I chuckled the other day when I read on this site that some consider the author of Seurat to be among the most influential bioinformaticians ever. This is not personal, as I do not use this tool, nor do I know the author. My reasoning, boiled down to a single sentence, is that a tool that has been in existence for less than a decade, and is used by a relatively small subset of people, can't possibly be more influential than a general tool such as BLAST. Still, I couldn't tell you with certainty whether that makes me objective or unappreciative of the tool I don't use.

On the other hand, there are probably people who have only used Seurat now (or at least mostly Seurat). In their world, that is the most popular tool. They haven't even heard of BLAST. They might be "wrong", but can you blame them?

They might be "wrong", but can you blame them?

I concluded the post by saying: "Still, I couldn't tell you with certainty whether that makes me objective or unappreciative of the tool I don't use."

I have no desire to blame anyone for anything.

I am new to bioinformatics and very interested in developing a tool for single-cell RNA-seq analysis.

Carefully consider whether the tool you intend to develop will serve a need that is currently unmet in the scRNA-seq world. There are existing scRNA-seq tools that are already widely used. If you are just starting out (and are going to be the sole developer), it may be an uphill task to come up with a breakthrough. You will want to be armed with knowledge of the current state of the art before you decide to take the leap. Good luck!

If you care about your end users' happiness and sanity, then stay away from any language that requires an interpreter or runtime. Managing Python and R libraries is by far one of the worst things about working in bioinformatics. Containers do not make it much better; they just mean you now need to tote around an entire operating system plus a container runtime to run your tools. If I were going to start building a tool for others to use (and not just a script), I would stay far away from both Python and R. I am looking forward to the day when static binaries become the status quo for tool distribution (think Rust, Go).

2.3 years ago

Hey, in my mind, it depends on which area you want to be involved in. Python has definite advantages over R; however, the reverse is also true. If we consider text-based manipulation of large files and pipeline development, then Python is a clear winner over R; on the other hand, if we think more about data visualisation and statistics, the clear winner is R. R is also good for creating end-user applications now via R Shiny, but Python has this covered, too.
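To make the "text-based manipulation of large files" point concrete, here is a minimal Python sketch that streams FASTA-formatted lines and reports each record's sequence length without loading the whole file into memory. The function name `fasta_lengths` and the demo records are made up for illustration; this is one possible approach, not a recommended parser.

```python
from typing import Iterable, Iterator, Tuple


def fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (record_id, sequence_length) one record at a time.

    `lines` can be any iterable of strings, including an open file handle,
    so only one line is held in memory at once.
    """
    name = None
    length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
        elif name is not None:
            length += len(line)
    # Emit the final record.
    if name is not None:
        yield name, length


if __name__ == "__main__":
    # Hypothetical in-memory demo data standing in for a large file.
    demo = [">seq1 demo record", "ACGT", "AC", ">seq2", "GGG"]
    for record_id, n in fasta_lengths(demo):
        print(record_id, n)
```

For a real file, the same function could be called on an open handle, for example `with open(path) as fh: ...`, where `path` is whatever file you are working with.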

Fortran has a base in bioinformatics and, in fact, R itself is programmed in C and Fortran. Starting out in bioinformatics, though, I would not recommend that anybody start with Fortran, purely because coding in Fortran is absolutely not necessary to forge a career in bioinformatics.

As you mentioned your interest in single cell RNA-seq analyses, I cannot see past recommending that you start with R.

Edit: to give you an idea of my own career: I branched into bioinformatics from a wet lab and computer science background. Java and Visual Basic were my strongest languages. ~90% of what I now do is done in R. If I need to work with large text files, I have enough expertise in Bash / shell scripting that I never have to use Python.

Kevin

2.3 years ago
Mensur Dlakic ★ 20k

Some of my friends and professors suggested that I focus on FORTRAN.

I don't mean to sound harsh, but I wouldn't want to have friends like that :-)) While FORTRAN is still in limited use, and oddly enough even COBOL, you will not improve your job prospects in Bioinformatics or make it easier to use other tools by learning FORTRAN.

As to the other languages you mentioned, C and Java are definitely useful in bioinformatics. If you are developing tools for your own research, you could stop at those two languages, as you will likely be able to do most things. If you want to distribute your tools and hope to attract a wide audience, then it will probably come down to Python and R. I regularly use Python and rarely use R, but would still not hesitate to recommend R because many tools that exist in Python are also available in R. My empirical (and non-scientific) observation is that Python is used by more people in bioinformatics than R, though there are certainly areas where either one of them is better than the other. For your area of interest, my non-scientific observation is that R tools are more developed and therefore more widely used. That may spur you to learn R to match the existing audience, or to learn Python and fill the relative need in that area.

2.3 years ago
Juke34 ★ 7.2k

Nobody has mentioned Perl... This broke my heart ^^

Some people would contend that Perl is out there to break hearts ¯_(ツ)_/¯

We need bioinformatics people now more than ever. Let's not scare them away.

The kids don't appreciate the classics.

2.3 years ago

I am keen to know which language is most needed for developing tools.

Since your question is focused on tool development, some use Python to prototype or just get scripted work going, but toolkits that provide performance or memory efficiency improvements will almost always be written in C or C++. Python itself is written in C. Some portions of tools that lend themselves well to further optimization might be written in assembly, embedded in a larger C codebase.

If you are writing wrappers around other tools, then bash/Python/R are probably good languages to know. If you are writing core utilities, then C or C++ will probably give a greater return on investment. If you are writing tools to do molecular simulations or mass spectroscopy, then FORTRAN might be a more useful, if domain-specific, speciality, but it doesn't sound like that is your focus.
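As a rough illustration of what "writing wrappers around other tools" can look like in Python, the sketch below runs an external command and returns its standard output, raising an error if the command fails. The helper name `run_tool` is hypothetical, and the demo wraps the Python interpreter itself only so the example runs anywhere; in practice the command would be whatever aligner or quantifier you are wrapping.

```python
import subprocess
import sys
from typing import Sequence


def run_tool(command: Sequence[str]) -> str:
    """Run an external command and return its standard output as text.

    check=True makes a non-zero exit status raise CalledProcessError,
    so a failing tool cannot pass silently inside a pipeline.
    """
    result = subprocess.run(
        list(command),
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command: the current Python interpreter printing a message.
    output = run_tool([sys.executable, "-c", "print('hello from a wrapped tool')"])
    print(output.strip())
```

Passing the command as a list rather than a single shell string avoids quoting problems with file names that contain spaces.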

2.3 years ago
igor 12k

I am new to bioinformatics and very interested in developing a tool for single-cell RNA-seq analysis ... I am keen to know which language is most needed for developing tools.

If you already know you want to get into scRNA-seq analysis, you should learn about the tools that are available. That will tell you what kinds of tools already exist.

You should use some of those tools. That will give you an idea of what exactly those tools do and what improvements are possible.

You can then check the source code for those tools. That will tell you what languages people are using. There is probably a reason why the authors of those tools picked that language. It may not be a very good reason, but they have more experience than you, so you should take their advice and use that language yourself.
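One quick, admittedly crude way to see which languages a tool's source code uses is to count file extensions in a local copy of its repository. The sketch below assumes you have already downloaded or cloned the source; the function name `count_extensions` is made up for illustration, and extensions are only a rough proxy for language.

```python
from collections import Counter
from pathlib import Path


def count_extensions(root: str) -> Counter:
    """Count files by lowercased extension under `root`.

    Hidden files and directories (such as .git) are skipped.
    """
    root_path = Path(root)
    counts: Counter = Counter()
    for path in root_path.rglob("*"):
        relative_parts = path.relative_to(root_path).parts
        if any(part.startswith(".") for part in relative_parts):
            continue  # ignore hidden files and anything inside hidden directories
        if path.is_file() and path.suffix:
            counts[path.suffix.lower()] += 1
    return counts


if __name__ == "__main__":
    # Summarise the current directory; point this at a cloned repository instead.
    for extension, n in count_extensions(".").most_common(5):
        print(extension, n)
```

A tree dominated by `.r` and `.rd` files suggests an R package, while many `.c`, `.cpp` or `.h` files point to compiled code underneath.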