Blog: Evolution of Biostars
24
9 months ago by
venu6.3k
Germany
venu6.3k wrote:

Hi,

I have been planning to do this small data science project & finally got some time to actually do it.

I scraped Biostars data (question title, day asked & associated tags) from when the site went live until today (2009 to 23/02/2019) and asked some questions of it. Here is a brief overview:

1. Which programming language is most frequently used in Bioinformatics

Based on the tags used with the questions, I looked at R, Python & Perl. The result is obvious, and the major contribution goes to the Bioconductor project.


enter image description here
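For readers who want to try something similar, here is a minimal sketch of how questions per language tag could be counted per year. It is not the code behind the plot above: the sample records, the `LANGUAGES` set and the function name are all made up for illustration, and it assumes the scraped data has already been reduced to (year, tags) pairs.

```python
from collections import Counter, defaultdict

# Hypothetical sample records: (year asked, tags on the question).
# Real input would be the scraped Biostars questions.
QUESTIONS = [
    (2010, ["perl", "blast"]),
    (2010, ["R", "microarray"]),
    (2018, ["R", "rna-seq", "deseq2"]),
    (2018, ["python", "biopython"]),
    (2018, ["rna-seq"]),
]

# Language tags of interest (assumed spelling, compared in lower case).
LANGUAGES = {"r", "python", "perl"}


def language_counts_by_year(questions):
    """Return {year: Counter({language: number of questions with that tag})}."""
    counts = defaultdict(Counter)
    for year, tags in questions:
        # Lower-case and de-duplicate so "R" and "r" count once per question.
        for lang in {tag.lower() for tag in tags} & LANGUAGES:
            counts[year][lang] += 1
    return dict(counts)


counts = language_counts_by_year(QUESTIONS)
for year in sorted(counts):
    print(year, dict(counts[year]))
```

A question tagged with two languages is counted once for each, which seems the natural reading of "tags used with the questions".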

2. Number of questions per year

This should, in a way, reflect the number of new researchers getting into bioinformatics / bioinformatics becoming an essential component of life science research.

enter link description here

3. Frequency of tags

This was already discussed here a few days back (I'm just posting it because the data is very new, but the results are mostly similar to that thread).

enter image description here

I will write a blog post with some additional analysis & share the code :)

P.S: Mods, if this thread doesn't fit into category Blog, feel free to change / suggest appropriate category.

blog biostars • 1.5k views
ADD COMMENTlink modified 9 months ago by JC9.1k • written 9 months ago by venu6.3k
1

Nice! I feel like the first graph might also be interesting to see in relative terms, by looking at whether the fraction of each programming language changes over the years.

ADD REPLYlink written 9 months ago by WouterDeCoster42k

Thanks Wouter. Do you mean the fraction of change compared to the previous year, or something else? This is a type of analysis I thought of but haven't done yet.

ADD REPLYlink written 9 months ago by venu6.3k
2

I'm not sure what I mean, but somehow you should take into account the popularity of Biostars. You could state that Perl always has about the same absolute number of questions, although in relative terms it lost massively.
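One way to read this suggestion is to divide each language's yearly count by the total number of questions asked that year. The sketch below assumes that interpretation; the numbers are invented to illustrate the point (they are not measured from Biostars), and the function name is hypothetical.

```python
def language_fractions(lang_counts, total_questions):
    """Divide each language's yearly count by all questions asked that year."""
    fractions = {}
    for year, total in total_questions.items():
        if total == 0:
            continue  # no questions that year, so no fraction to report
        yearly = lang_counts.get(year, {})
        fractions[year] = {lang: n / total for lang, n in yearly.items()}
    return fractions


# Invented numbers: Perl keeps the same absolute count while the site grows.
LANG_COUNTS = {
    2010: {"perl": 50, "r": 50},
    2018: {"perl": 50, "r": 450},
}
TOTAL_QUESTIONS = {2010: 500, 2018: 5000}

fractions = language_fractions(LANG_COUNTS, TOTAL_QUESTIONS)
for year in sorted(fractions):
    print(year, fractions[year])
```

With these made-up inputs the absolute Perl count is identical in both years, yet its share of all questions falls by a factor of ten, which is the effect a relative plot would expose.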

ADD REPLYlink written 9 months ago by WouterDeCoster42k
1

Nice venu.

It would be good to see the distribution of the Tools flag across years or across applications.

ADD REPLYlink modified 9 months ago • written 9 months ago by lakhujanivijay4.6k

Thanks Vijay. It may make sense to get an overview of Tools per year, to check how frequently new tools are being developed. But what do you mean by "with applications"? Each tool is flagged with more than one tag, so it's extremely complicated to assign a one-word application to each tool programmatically.

ADD REPLYlink written 9 months ago by venu6.3k

By "application" I mean RNA-seq, WGS, whole transcriptome, etc. But I agree that it will be chaotic, as you mentioned.

ADD REPLYlink written 9 months ago by lakhujanivijay4.6k
1

Strange how the number of questions became 'saturated' from 2016. What could be the cause of that? Had the field and everything that it encompasses already matured?

ADD REPLYlink written 9 months ago by Kevin Blighe52k
3

I'd guess many basic questions have been asked over the previous years and answered very well, so with a simple Google search the first hits land on Biostars threads. Also, many tool developers are documenting their tools with clean examples and responding to user queries. It might be saturated in that sense, but the applications of the field are widespread and growing?

This progress might be one of the reasons but it's just my opinion.

ADD REPLYlink written 9 months ago by venu6.3k
2

It would likely require more in-depth analysis, but the number of truly unique topics/questions will show an opposite trend to the plot in #2. As @venu said, a large majority of questions likely have some pointers/answer(s) that already exist on Biostars or elsewhere.

ADD REPLYlink modified 9 months ago • written 9 months ago by genomax75k

Nice! This is "Evolution of Biostars" rather than history :-)

If you have the data parsed out, can you perhaps make animated/interactive gifs for the word cloud that walk through the top 100/50 terms for each year?

ADD REPLYlink written 9 months ago by genomax75k

Aw, your title makes more sense.

Yes, I will try to make per-year frequency of tags (a good idea to see troubled topics per year :p).
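A per-year tag frequency table could be built along these lines. This is only a sketch under the assumption that the scraped data is available as (year, tags) pairs; the sample records and the function name are hypothetical, and the per-year lists it returns could then feed a word cloud or an animation.

```python
from collections import Counter, defaultdict

# Hypothetical sample records: (year asked, tags on the question).
QUESTIONS = [
    (2010, ["blast", "perl"]),
    (2010, ["blast", "alignment"]),
    (2018, ["rna-seq", "R", "deseq2"]),
    (2018, ["rna-seq", "R"]),
    (2018, ["rna-seq", "alignment"]),
]


def top_tags_per_year(questions, n=50):
    """Return {year: [(tag, count), ...]} with the n most frequent tags per year."""
    per_year = defaultdict(Counter)
    for year, tags in questions:
        # Count each tag at most once per question, ignoring case.
        per_year[year].update({tag.lower() for tag in tags})
    return {year: per_year[year].most_common(n) for year in sorted(per_year)}


top = top_tags_per_year(QUESTIONS, n=2)
for year, tags in top.items():
    print(year, tags)
```

Tags with equal counts come out in no particular order here, so a real analysis may want an explicit tie-break (for example, alphabetical).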

ADD REPLYlink written 9 months ago by venu6.3k

Python has certainly gained popularity over Perl, but R dominates the tool ecosystem pyramid!

ADD REPLYlink written 9 months ago by lakhujanivijay4.6k
2

Not necessarily: we can conclude that people are most puzzled about R ;-)

ADD REPLYlink written 9 months ago by WouterDeCoster42k

Yeah. HaHa HaHa HaHa ;)

ADD REPLYlink written 9 months ago by lakhujanivijay4.6k
1

Since the most frequent tag is RNA-Seq and the programming language is R, my guess is that a lot of people are confused about how to run DESeq/edgeR :)

ADD REPLYlink modified 9 months ago • written 9 months ago by grant.hovhannisyan1.8k
3
9 months ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:

Here are some traffic data over the last five years.

  • 11 million users
  • 57 million pageviews

PS. The total number of posts (including comments/answers) per year would also be an interesting plot to make.

enter image description here

ADD COMMENTlink modified 9 months ago • written 9 months ago by Istvan Albert ♦♦ 81k

11 million...? That is a lot! The population of the Republic of Ireland is ~4 million.

ADD REPLYlink modified 9 months ago • written 9 months ago by Kevin Blighe52k

Turns out if the traffic were a "country", we'd be the 83rd most populous country, right between Greece and Bolivia.

ADD REPLYlink written 9 months ago by Istvan Albert ♦♦ 81k

Definitely the most popular general bioinformatics website on Earth!

ADD REPLYlink written 9 months ago by Kevin Blighe52k
2
9 months ago by
MutationalMeltdown30 wrote:

The data is interesting but does it answer the question "Which programming language is most frequently used in Bioinformatics"? Other possible factors:

  • R might be popular on this forum, other languages on other forums: for example, there are 628 co-tagged R and bioinformatics questions on Stack Overflow compared to 822 co-tagged Python and bioinformatics questions
  • Users may not tag the language the tool is written in: for example, bowtie2 is written mostly in C++ but it seems people don't use the language tag when asking a question
  • Method developers may not ask in the context of bioinformatics: developers for bioinformatics might not use Biostars or even tag bioinformatics in their questions; they may phrase their questions to be about the algorithmic/programming problem and ask on Stack Overflow or elsewhere
  • People using R might need more help: R is a language that is perhaps accessible to people coming from a non-programming background, so perhaps people ask more questions. To quote Mick Watson from Twitter: "There are no [Stack Overflow] questions on Perl because every Perl programmer is 50+ and knows what they're doing"
ADD COMMENTlink modified 9 months ago • written 9 months ago by MutationalMeltdown30

Devil is in the details, it seems. If only the peer review process and university ranking systems teased out the respective biases as you have done here.

ADD REPLYlink written 9 months ago by Kevin Blighe52k
2
9 months ago by
JC9.1k
Mexico
JC9.1k wrote:

Perl enthusiast here.

I know R and Python, but I'm always more productive with Perl. In general, for Python and R I need to Google (Stack Overflow, Biostars, Reddit, ...) how to do some things, but with Perl I rarely look for help.

Perl is more natural for me, and since text processing is the main task for which I generally need a script, Perl is the best.

ADD COMMENTlink written 9 months ago by JC9.1k

I enjoy Perl too. I already follow you on GitHub and hope to share some Perl scripts for bioinformatics pipelines, @JC.

ADD REPLYlink written 9 months ago by Shicheng Guo7.8k
1
9 months ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:

Made a chart with total posts (question + answer + comment) for each year.

enter image description here

ADD COMMENTlink written 9 months ago by Istvan Albert ♦♦ 81k

Actually, the title should be "New Posts per year".

ADD REPLYlink written 9 months ago by Istvan Albert ♦♦ 81k