Forum: Will Python Take The Place Of R?
Medhat wrote:

Day after day, more statistics packages are added to the Python language (RPy, Statsmodels, StatPy, etc.), making it more capable and robust in the field of statistics. But will it take the place of R?

By the way, this is not a comparison or a geeky R-vs-Python debate. What I am asking is whether the investment in statistics packages for Python will make Python, on its own and without R, sufficient for bioinformatics needs. Also, should we invest more time in statistics packages for Python, or, as some answers here suggest, will we never get the perfect combination of "scripting + statistics language" in Python?


The post is old, but today I found these articles, which also help a lot:

Choosing R or Python for data analysis? An infographic

Python overtakes R, becomes the leader in Data Science, Machine Learning platforms

Question posted 7.1 years ago by Medhat (edited 2.6 years ago by jared.andrews07)

[R] as a language has its own syntax. Since many people like R's syntax, I doubt that Python will replace [R].

Reply by Zev.Kronenberg, 7.1 years ago

OK, but the new generation who did not learn R will not need to learn both R and Python; I think they will just need to learn Python.

Reply by Medhat, 7.1 years ago

Python is great, but its functionality (and the functionality of other scripting languages) does not completely overlap [R]. Learn both. If you hate R, learn Octave, Matlab, or Maple. Or you could implement and validate complicated statistical algorithms yourself.

Reply by Zev.Kronenberg, 7.1 years ago

I think the last part is bad advice. Do not implement statistical algorithms yourself, not even the ones you think are reasonably straightforward. The existing libraries have been tested far more extensively than anything you write most likely ever will be, i.e. they will have few or no bugs.

Reply by Michael Schubert, 7.1 years ago

Some statistical algorithms are simple. It is sometimes a good idea to reimplement them, to better understand the pitfalls and to drop unnecessary dependencies. In addition, most 3rd-party libraries are written by fellow programmers no better than you and me. They make mistakes, and those mistakes can persist for years because we all take the libraries for granted. My favorite example is matrix multiplication in Ruby and a few other programming languages (e.g. Clay). It is widely known that the second matrix should be transposed for better cache efficiency, but in the Ruby library the developer uses the much slower method. I would think such an obvious flaw should have been identified much earlier, but it is still there because no one bothers to read the source code (EDIT: or because no one implements matrix multiplication these days and so never learns the transpose trick). Except for a small fraction of really high-quality or widely used libraries, most others are not trustworthy.
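A minimal pure-Python sketch of the "transpose the second matrix" trick described above. The function names are illustrative and not taken from Ruby or any other library. Note that Python lists of lists hold pointers, so the cache benefit is small here; what the sketch shows is the access pattern, which matters in C or any language with contiguous row-major arrays.

```python
def matmul_naive(a, b):
    """Textbook triple loop: the inner loop walks DOWN a column of b (strided access)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * b[k][j]  # b[k][j] jumps to a new row every step
            c[i][j] = s
    return c


def matmul_transposed(a, b):
    """Transpose b once, then every dot product reads two rows sequentially."""
    bt = [list(col) for col in zip(*b)]  # one O(m*p) pass to build b^T
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_transposed(a, b))  # [[19, 22], [43, 50]]
```

Both functions compute the same product; only the memory access order of the inner loop differs.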

Reply by lh3, 7.1 years ago

My personal confession here is that every time I try to implement a statistical method and read more about it, I realize that I do not know how to do it correctly. Or what "correct" actually means, or whether I should even worry about it. What I mean is that even the simplest task can become a little scary if one wants to generalize. An example: what is the right way to compute something as simple as a sum?
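To make the "right way to compute a sum" question concrete: a naive running sum silently drops low-order bits when values of very different magnitude are added, while compensated summation keeps track of them. The sketch below uses Neumaier's variant of Kahan summation (an editorial choice; the thread does not name an algorithm) and compares it with the standard library's `math.fsum`, which tracks multiple partial sums to avoid loss of precision. The helper names are hypothetical.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation, same as the built-in sum()."""
    total = 0.0
    for v in values:
        total += v
    return total


def neumaier_sum(values):
    """Compensated (Kahan-Babuska/Neumaier) summation."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        t = total + v
        if abs(total) >= abs(v):
            compensation += (total - t) + v  # low bits of v were lost
        else:
            compensation += (v - t) + total  # low bits of total were lost
        total = t
    return total + compensation


data = [1.0, 1e100, 1.0, -1e100]
print(naive_sum(data))     # 0.0  (both 1.0s are swallowed by 1e100)
print(neumaier_sum(data))  # 2.0
print(math.fsum(data))     # 2.0
```

Even a tiny case such as ten copies of 0.1 shows the issue: the naive loop gives 0.9999999999999999, while `math.fsum` returns 1.0.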

Reply by Istvan Albert, 7.1 years ago

The more we implement, the clearer we are about correctness and the more likely we can apply similar techniques to complex practical problems. For those who work on method development, reimplementing simple methods is very important. In addition, if you do not know how to implement sum correctly, many others will not know either; some of them may not even be aware of numerical stability.

Reply by lh3, 7.1 years ago

I was being cynical :-). Michael Schubert is correct.

Reply by Zev.Kronenberg, 7.1 years ago

I really hate R's syntax, but as long as Python doesn't have an equivalent of ggplot2 to create nice plots, I will stick with R for most of the data visualization tasks.

Reply by Giovanni M Dall'Olio, 7.1 years ago

The KDNuggets site is so ugly it is surprising one can read an entire article there - anyway, its poll is biased and not of much value, I would think.

And are those charts from the linked article Excel charts??

Reply by h.mon, 2.6 years ago

:D for sure it is Excel

Reply by Medhat, 2.6 years ago

Excel is the future of bioinformatics

Reply by russhh, 2.6 years ago

R is principally utilized for measurable examination while Python gives a progressively broad way to deal with information science. R and Python are conditions of the workmanship as far as programming language situated towards information science. Learning them two is, obviously, the perfect arrangement.

Reply by manjulam.besant, 10 months ago

What is measurable examination?

Reply by ATpoint, 10 months ago
Istvan Albert (University Park, USA) wrote:

No, nothing of sorts could ever happen.

  1. R has more advanced statistical functionality than Python will ever have - the packages that you list implement a tiny subset of what already exists in R
  2. R has better visualization capabilities than Python will ever have
  3. R has better cross-platform compatibility than Python will ever have
  4. R has better automated package installation than Python has (and likely will ever have)
  5. The userbase for R as a statistics language is gigantic compared to the number of users that use Python for data analysis

The downside of R is that it is both eclectic and byzantine.

Python is a generic programming language and it is great at that. But it is not a data analysis platform nor are the lead developers focusing on addressing the issues above.

And I am saying this as someone who uses Python almost exclusively for data analysis and most of my work.

Answer by Istvan Albert, 7.1 years ago

1b. Bioconductor:

Reply by Alastair Kerr, 7.1 years ago

+1 I agree completely with the list, I consider R a must-know language for any bioinformatician.

Reply by JC, 7.1 years ago

This post is bad, and the fact that it has so many upvotes is bad. 

First, right off the bat: when someone makes sweeping statements about the future ("will ever have"), it's a bad sign. It's almost impossible to predict what the future will hold. There's no fundamental reason why Python couldn't be better in every category listed within 10 years; unlikely, perhaps, but possible.

Second, this person has a completely different definition of "Python" in the context of "Python vs R" than... well, the rest of the world. From several comments in the original post and in follow-ups, they clearly don't regard pandas, numpy, etc. as part of Python for the purposes of this comparison. This makes no sense, but I won't even go there. The question everyone is interested in is whether the Python numerical/statistical "stack", if you prefer the term, can replace R. So this really seems to be answering a different question.

Third, some of their statements are highly subjective. If "statistical functionality" means exclusively algorithms produced by the statistics community, then the statement is closer to being true. But R is not the language of choice in machine learning, electrical engineering, physics, etc. departments. All these fields make substantive contributions to statistical functionality; in the case of ML, I would argue their contributions to practical statistics are comparable to those of statistics itself. Yeah, R has great visualization. Quick question: how can I zoom in and out of a plot I make in R? Oh right, it's pointlessly difficult... something that's trivially easy in Python and Matlab.

Some reasons Python might take over R:

  • No answer from R to IPython, which is probably the best tool around for combining code, graphics, and explanation
  • Far better grid computing functionality through IPython, and also through sockets and shareable data structures built into Python's standard library
  • Far better ability to bind with other languages. My personal favorite is the incredible power of Boost.Python.
  • Rpy2 lets you call any R package directly from Python, possibly largely negating R's only real advantage (its larger library)
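The bullet about binding with other languages cites Boost.Python, which needs a C++ build step. As a lighter-weight illustration of the same idea (a swapped-in technique, not the one named in the post), the standard-library `ctypes` module can call a shared C library directly with no compilation. This sketch assumes a Unix-like system (Linux or macOS) where `ctypes.util.find_library("c")` can locate the C standard library; the `c_strlen` wrapper is a hypothetical helper.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library (assumes Linux or macOS).
# If find_library returns None, CDLL(None) loads the running process's own
# symbols, which on Linux still include libc's functions.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C prototype: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(seq: str) -> int:
    """Length of an ASCII string, as computed by C's strlen."""
    return libc.strlen(seq.encode("ascii"))


print(c_strlen("ACGTACGT"))  # 8
```

For larger C or C++ code bases, tools such as Cython, SWIG, or Boost.Python generate this kind of glue automatically instead of declaring each prototype by hand.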

While I think it is unlikely that Python will replace R anytime soon (R is just too entrenched in the stats community), I really hope it does. R is a messy, garbage programming language and I'd just as soon never touch it again.

Reply by quicknir, 6.0 years ago

As Yogi Berra said: It is difficult to make predictions, especially about the future. Thus when talking about the future one needs to look in a realistic context and not an absolute one. 

I for one (and I assume many others) interpret the original question of Python replacing R in terms of a reasonable human timeframe, one that would affect an individual's work and career in bioinformatics! Say 10 years, which is a very long time, especially in bioinformatics, where The Dog Years Of Bioinformatics apply.

Just consider that Python 3.0 was released in 2008, and six years later its penetration into science (and in general) is still extremely low; most specialized software may never (within the time frame above) be ported to it.

Python 2.7 is already more popular than R in absolute terms, and it is more useful than R in countless application domains. Plus, I fully agree that R is an old and crufty language with a flawed design. Still, when it comes to biological data analysis, where statistics are an essential requirement, it won't be replaced by Python within our careers.

Reply by Istvan Albert, 6.0 years ago

If I may add my testimony from 3 years later: in the last 3 years, I completely switched from Python 2 to Python 3, learned a bit of R, used it for a few months, and, given how painful it was to do some programming with it, finally decided to learn pandas and rpy2 to avoid dealing with R as much as possible. I would say that the bigger problem is that I still need to use the R interactive interpreter to experiment with bioinformatics packages before managing to run them from rpy2, and also when I want to get help on those packages. I would be much happier if Python versions of them were available. If I had time to learn new languages, I would rather try Scala and Julia than invest time in R.

Reply by blaise.li__biostars, 2.7 years ago

+1. R is not going anywhere anytime soon.

Reply by Chris Miller, 7.1 years ago

1 and 5 are obvious points, but 2-4 are debatable. R has better native support for stats plots, but that does not mean that all visualisations are better. Mind elaborating on the platform compatibility? IMO, setuptools is far more powerful than R's built-in software installer, but R's repositories are more integrated and centralised. In your comment you do not count numpy as python but Bioconductor is a point for R. I'm not saying that R is not the more obvious choice being a DSL and given the sheer number of packages available (it is!) but the answer seems a bit biased. I would also like to see an example for the "100 vs 1 line", I don't think that's true.

Reply by Michael Schubert, 7.1 years ago

From personal experience of teaching Python programming to beginners: installing something as simple as matplotlib causes endless trouble for over half the audience. Some library or some dependency is always missing. Ironically, the only platform where installation is seamless is Windows, but even there one needs to make sure to download the library labeled with exactly the same version number as the installed Python, plus make sure to download the right binary version (AMD vs Intel). And when it fails, it means that this person has no visualization capability whatsoever.

The equivalent in R is install.packages("foo") and that's it, works the same way on all platforms.

There is a good blog post by Titus Brown that also noted that Python software installation is, and I quote, "a total disaster", even though they are using software pre-packaged by a company. The catch is that when you are good at it and notice problems early on, it all seems to operate like clockwork.

Reply by Istvan Albert, 7.1 years ago

I wouldn't call matplotlib a simple package to install.

Also, in my experience, I have had some trouble with R packages that require external dependencies.

Reply by Tg310, 7.1 years ago

I am not as well versed in R as in Python. I can see that the available packages make R indispensable as a statistical language. However, is there anything native to R that makes it ideal for data/statistics? If we are just comparing the R and Python languages, are there syntactical advantages to R?

Reply by Damian Kao, 7.1 years ago

R has a large number of syntactical constructs that one can use to build incredibly powerful (and often maddeningly difficult to understand) constructs. Using the "apply", "by" and "sweep" combined with the automatic function broadcasting allows one to write one liners that do things that would take hundreds of lines of Python to implement. You can do similar things with NumPy or Pandas but note how that is not Python anymore, one would need to learn the NumPy/Pandas specific terminology that is radically different from Python and not all that different from R.
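As a small illustration of the gap described above: in R, `sweep(m, 2, colMeans(m))` centers every column of a matrix in one line, and NumPy broadcasting does the same with `m - m.mean(axis=0)`. In plain Python, without NumPy, the same operation needs explicit comprehensions. This is only a simple case (the gap grows with more complex constructs), and the helper names below are hypothetical.

```python
def column_means(matrix):
    """Mean of each column of a list-of-rows matrix."""
    n_rows = len(matrix)
    return [sum(col) / n_rows for col in zip(*matrix)]


def sweep_columns(matrix, stats):
    """Subtract stats[j] from every entry in column j (like R's sweep(m, 2, stats))."""
    return [[x - s for x, s in zip(row, stats)] for row in matrix]


m = [[1.0, 10.0],
     [3.0, 20.0],
     [5.0, 30.0]]
centered = sweep_columns(m, column_means(m))
print(centered)  # [[-2.0, -10.0], [0.0, 0.0], [2.0, 10.0]]
```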

For an example of R at its best, see the plyr package.
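plyr is built around the split-apply-combine pattern: for example, `ddply(df, .(sample), summarise, m = mean(expr))` splits a data frame by `sample`, applies `mean` to each group, and combines the results into a new table. A rough plain-Python sketch of the same pattern follows; the function and column names are made up for illustration. With pandas, the equivalent would be `df.groupby("sample")["expr"].mean()`.

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(records, key, value, func):
    """Group records by `key`, apply `func` to each group's `value`s, return a dict."""
    groups = defaultdict(list)
    for rec in records:                              # split
        groups[rec[key]].append(rec[value])
    return {k: func(vs) for k, vs in groups.items()}  # apply + combine


records = [
    {"sample": "A", "expr": 2.0},
    {"sample": "B", "expr": 5.0},
    {"sample": "A", "expr": 4.0},
]
print(split_apply_combine(records, "sample", "expr", mean))  # {'A': 3.0, 'B': 5.0}
```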

Reply by Istvan Albert, 7.1 years ago

linear algebra: syntax for multiplying matrices. Although I am sure there is a python lib for this.
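On the matrix-multiplication syntax point: R uses the `%*%` operator, and since Python 3.5 (PEP 465) Python has a dedicated `@` operator that any class can support by defining `__matmul__`; NumPy arrays implement it. The toy `Matrix` class below is hypothetical and exists only to show how the operator hooks in; real code would use NumPy.

```python
class Matrix:
    """Minimal row-major matrix, for illustration only (not a real library)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # Called for `self @ other`; pair each row of self with each column of other.
        cols = list(zip(*other.rows))
        return Matrix([[sum(x * y for x, y in zip(row, col)) for col in cols]
                       for row in self.rows])


a = Matrix([[1, 2], [3, 4]])
b = Matrix([[5, 6], [7, 8]])
print((a @ b).rows)  # [[19, 22], [43, 50]]
```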

Reply by Zev.Kronenberg, 7.1 years ago

Well, never say never ever:

Reply by Roman Valls Guimerà, 6.2 years ago (edited 6 months ago by RamRS)

Excellent points Istvan.

Reply by R. Taylor Raborn, 5.1 years ago

I think 2, 3 and 4 are plainly false, and 5 is questionable.

Reply by Endre Bakken Stovner, 4.8 years ago
Malachi Griffith (Washington University School of Medicine, St. Louis, USA) wrote:

Many statistics packages (and elaborate pipelines, APIs, etc.) are also still being added to R regularly.

Over the past ten years I have watched R grow from something only statistics professors used to a fairly ubiquitous tool used by bioinformaticians, biologists, clinicians, etc. It seems to be under active development; R 3.0.0 was just released. It has a GUI version that works nicely on Mac or Windows, and the text-based session works on Mac or Linux/Unix. I have encountered surprisingly few cross-platform issues over the years.

With the addition of 'Rscript' a while back, it is now trivial to automate R analyses (including figure generation) in a pipeline. The barrier to entry as either an R developer or R user seems to be lower than ever, although R maintains a well deserved reputation for having a steep learning curve.

For all these reasons, I don't see R going away anytime soon...

Answer by Malachi Griffith, 7.1 years ago
bluewoodtree wrote:

Python (with NumPy, SciPy, and StatPy) already has a big share among data analysis software. You can find almost equivalent functionality to Matlab/Octave and R; however, those Python tools are still somewhat in their infancy. I mean, R and Matlab/Octave have existed for decades, were originally geared toward data analysis, and of course have developed over the years to become even better. Python's data analysis capabilities are quite new, and it might take a while until they are on the same level, or even better.

But I am very optimistic that Python will evolve to be one of the best data analysis packages one day. The Python community is very enthusiastic, creative, and productive, and in my opinion it is just a matter of time. However, I think R and Matlab/Octave will never cease to exist. They will find their niche, just like Fortran & Co.

Answer by bluewoodtree, 7.1 years ago
Xingyu Yang wrote:
Search Bioconductor. I don't think Python has any chance of taking its place.
Answer by Xingyu Yang, 6.2 years ago
vaskin90 (Milan, Italy) wrote:

If you implement a new bioinformatics/biostatistics algorithm, I think Python gives much more flexibility in programming. It is easier to implement those algorithms in Python, since it is a general-purpose language with a nice syntax and lots of useful language constructs. R is rather tied to tabular data manipulation, but its base of statistical algorithms is really impressive.

So people often use a combination of R and Python (like here), using Python for algorithm implementation and input/output manipulation, and R for plotting, running statistics, or Bioconductor packages.

Answer by vaskin90, 7.1 years ago

totally agree +1

Reply by Medhat, 6.2 years ago
Carlos Borroto (Washington Metropolitan Area) wrote:

I believe Python will take over. It won't be easy, as there is a lot of convincing to do and many algorithms to port. The not-so-secret weapons Python has are pandas and the IPython Notebook.

Take a look at this video introduction and see if you agree with me.

10-minute tour of pandas

Feel free to follow along using the available notebook viewer.

Answer by Carlos Borroto, 6.2 years ago

Really, I am using both now. Sometimes R becomes annoying, but my conclusion is that they will live side by side :)

Reply by Medhat, 6.2 years ago (edited 6 months ago by RamRS)

You have been proven very much correct

Reply by Kevin Blighe, 2.7 years ago
pld (United States) wrote:

I am not sure if anyone else feels this way, but I have always had a serious issue with R. I realize Python was purpose-built with the concept of "least surprise", but R seems to have been designed with the concept of "world's least consistent language". I've never had such a wrestling match getting on top of the syntax, especially with Bioconductor, where it seems like every package is a separate beast with its own syntax. At least to me, it seems like I get one Bioconductor package down and that knowledge serves no purpose in figuring out the next one. Vanilla R is one thing, but it seems like the groups and people who implemented their respective R packages made no effort to implement any sort of uniformity or consistency. I've seen some of the R code some statisticians write, and it would make even a CS101 student cringe.

If anything it is a case of the typical traditionalism seen in science that slows the adoption and migration to more effective methods. R is a big ol' kludge that people keep using because the last group used it.

I could be wrong but I've never had this experience with anything from C to Perl to Haskell.

Python's scientific computing (message passing, statistics, numerical methods, etc.) is constantly growing, but it's still a long way from matching the level of features that R offers. SciPy and NumPy are great, but it seems like growth there has slowed. They're also fairly bare-bones for doing some of the more complex stuff. NumPy is very fast; I'm really happy to see the use of vectorized matrix and vector operations. IMO BioPython is an overweight mess, but that is more about gripes with overzealous OOP.

Answer by pld, 6.2 years ago

Bioconductor is an "organic" product that is a bit like a field tended by hundreds of gardeners, each with their own plot, going on their merry way. Some plots flourish; many others are abandoned or produce incorrect or inconsistent results, and there is little indication when that happens.

Knowing R/Bioconductor is more about knowing what does not work and steering away from it. If I try to rely mostly on R, after a week or so I get mad: it just does not fit my brain, and the inconsistencies and lost opportunities irk me to no end. My defense mechanism is to use as little R as possible and do all programming in Python. Once the data is in a simple matrix-like structure, things work well.

Reply by Istvan Albert, 6.2 years ago (edited 6 months ago by RamRS)
Chris Fields (University of Illinois Urbana-Champaign) wrote:

I don't think any one language will ever completely dominate. There are too many people with too many varying opinions on what makes a programming language good or bad. It's akin to always choosing French over German cuisine, Buffalo Trace over Bulleit, vi over emacs, etc. It's a personal, almost artistic choice.

And really, why would you want to know only one language anyway, particularly in this field, which has a lot of Python, Perl, R, Java, JS, C, C++, Ruby, etc. floating around? Sure, one may have a favorite, but I would find it incredibly boring if there were only one choice.

Answer by Chris Fields, 6.2 years ago (edited 6 months ago by RamRS)
Ann (Concord NC USA) wrote:

No, because of what Istvan said.

And also my own experience:

I used to use python all the time for just about everything. But then I got much better at R programming after reading lots of Bioconductor vignettes and also Bioconductor Case Studies. Once I learned how to use named vectors and lists like dictionaries, I pretty much stopped using python for anything except scripting. And then I learned how to use bash shell scripting more effectively and stopped using python even for that. That said, whenever I want to write a simple command line tool, I always write it in python because I can count on python being available on most systems.

However, now that I've heard about the IPython Notebook, I might start using Python more often.

Answer by Ann, 6.2 years ago (edited 6 months ago by RamRS)
R. Taylor Raborn (Tempe, AZ | Biodesign Institute at ASU) wrote:

No- at least not for the foreseeable future. I echo the sentiments and good points of Ann, Istvan and Malachi and many others on this thread.

The chief reason is the existing ecosystem in R, not just for data analysis but also for statistically informed analysis of biological datasets. Python can't match what R has in place, which includes numerous packages that are being produced daily. The best example of this is the Bioconductor project, which is extremely well maintained and offers a number of biologically informed data structures.

The second reason, which hasn't been touched on as much here, is R's capability for data visualization. R is unquestionably superior to Python in dataviz, thanks to packages like lattice and especially ggplot2. I don't see R losing its lead to Python in this area either.

Answer by R. Taylor Raborn, 5.1 years ago
Kevin Blighe wrote:

I disagree with the entire concept of even comparing these two. They both have advantages in different areas. A skilled and experienced analyst will know the situations in which each is best applied. I will even throw Java, Perl, and C/C++ into the fray here; all excel in certain areas, and the ideal situation is to have skills in all of them, which only comes with years of experience.

Answer by Kevin Blighe, 2.7 years ago

That's all very well*, but it's hardly going to keep the flamewars burning

* and I completely agree

Reply by russhh, 2.6 years ago

I'm going to throw a whole lot of fuel into this forest fire here -- even Excel has its place in a bioinformatician's toolbelt.

Reply by RamRS, 2.6 years ago

If you use it, be very careful:

When it isn't irreversibly corrupting datasets, Microsoft software also pollutes text files with line endings that trip up casual users of command-line tools.
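A minimal sketch for cleaning such files before handing them to command-line tools: it converts Windows (CRLF) and old Mac (CR) line endings to Unix LF. The function name and sample data are illustrative. Tools like `dos2unix` do the same job, and Python's `open()` in text mode already translates these endings on read when `newline` is left at its default of `None`.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF (Windows) and lone CR (classic Mac) line endings to LF."""
    # Replace CRLF first so it is not turned into two newlines by the second step.
    return text.replace("\r\n", "\n").replace("\r", "\n")


raw = "gene\tcount\r\nTP53\t12\r\nBRCA1\t7\r\n"
clean = normalize_newlines(raw)
print(clean.split("\n"))  # ['gene\tcount', 'TP53\t12', 'BRCA1\t7', '']
```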

Reply by Alex Reynolds, 2.6 years ago

One of my fav articles to point to, too. Bioinformatics is the knowledge of how to use various tools combined with the experience of when not to use them.

Reply by RamRS, 2.6 years ago
jared.andrews07 (St. Louis, MO) wrote:

Well, years after the original question, I think this is worth a revisit. While I agree with Kevin's answer fully, it's interesting to see how Python has progressed in recent years to catch up with R on the statistical and visualization side of things. As others have mentioned, numpy, pandas, and scipy give Python a huge amount of flexibility in terms of data manipulation and statistical analysis. It's true that it doesn't have many of the purely 'omics-based packages that R does, but more and more are being ported to Python.

Really, the languages complement each other very nicely, in my opinion. Data munging and handling common file formats is easy in python, particularly with pysam, pybedtools, pyvcf, and pandas, and Rpy2 allows you to access those powerful R stats/modeling packages from within python.

I feel confident in saying Python matches R's visualization capabilities at this point in time. I've never had a moment where I felt I had to go to R to create the figure I want. Seaborn and other newer libraries make it very easy to create high-quality, interactive figures without having to fiddle much (if at all) with matplotlib parameters. Couple these with IPython and you've got some really interesting ways to explain and interactively wade through your data.

Python package installation has also come a long way with the advent of pip, Anaconda, and Bioconda. Similar to R, nearly every Python package is a one-line install. This seemed to be a big complaint for many people 4 years ago, but it's been largely resolved now.

I don't see R going anywhere due to all of the packages made specifically to handle analysis of sequencing experiments, network interactions, etc. Python can do those things perfectly well, but why reinvent the wheel when you can pull it out and stick it on your own car whenever you need?

Overall, I feel Python has established itself as an important player in bioinformatics for years to come. Part of this is due to its incredibly easy-to-pick-up syntax, general flexibility, and extremely active developer base. I personally hate R as a language, but there's no denying its status as the backbone of statistical analysis in the bioinformatics community. Of course, that doesn't mean we have to interact with it any more than is necessary, or that Python won't continue to make advances to help bridge the gap.

Answer by jared.andrews07, 2.6 years ago

I still think it's a matter of personal choice. No matter how much Python may "progress", so long as it uses the horrible white-space sensitive syntax, it will remain a horror to stubborn old school C-style programmers such as myself. Unless there is something worth the switch, python would not be my first choice for anything even moderately heavy-weight.

Reply by RamRS, 2.6 years ago

As a Java developer, I found it horrible at first sight without semicolons and curly braces, but after a while I adapted (though I still don't like the idea of significant whitespace), and I was impressed by the productivity level.

Reply by Medhat, 2.6 years ago

I mean, I can program in Python alright. I just prefer R because I don't have to jump through hoops to unwrap a dataset.

Reply by RamRS, 2.6 years ago

Oh, it definitely is. No reason to limit the toolkit. It's funny you mention the syntax, as I find bracket-based syntax infinitely more annoying visually and while writing. I get the advantages, but it just causes headaches for me.

I actually agree with your last point; I just find Python a lot simpler to write. You can actually get the best of both worlds if you wrap C libraries and use Cython to speed up the intensive tasks. R does the same thing with Rcpp. Things like the Boost.Python library and SWIG make it easy to wrap existing C code with Python; see pysam/pybedtools. Plus, pandas/numpy do quite well performance-wise. Again, it just depends on preference. I know that learning how to do such things in Python will take much less time than trying to learn the idiosyncrasies of C/Java, so it's what I tend to stick with.

Reply by jared.andrews07, 2.6 years ago

True - ease of learning is what gives the edge to Python. It's the reason why I have it on my tool belt too - learning it felt like a minimal effort thing. I am not against switching to Python, I just don't see the exclusive need for it, and I cannot actively find reasons to switch to it because of the barrier I mentioned.

Reply by RamRS, 2.6 years ago

Powered by Biostar version 2.3.0