Question: Bioinformatics word cloud to use in classes
4 weeks ago by
gramarga50
gramarga50 wrote:

Hi everyone,

Whenever I teach an introductory class about Bioinformatics, I like to use this word cloud - I feel it gives a quick glimpse of the field. However, it is outdated (from 2011).

So I want to create an up-to-date figure. I understand I can manually copy the tag counts from the tag content and use only the first few pages, because the word counts quickly drop below 15 or so.

However, ideally I would like to have the data for each year separately, to show how topics change over time. Unfortunately, I am a new user (first post here) and do not have the privileges to download the database as shown in the blog post above.

Would anyone with access be able / willing to fetch and share these data?

Thank you very much!

modified 4 weeks ago by Istvan Albert ♦♦ 79k • written 4 weeks ago by gramarga50

Nice application! Tags are very often used incorrectly, though. An alternative approach could be to use the titles/abstracts of recent (bioinformatics) papers, although filtering those terms to extract the bioinformatics terminology obviously requires some more work.

written 4 weeks ago by WouterDeCoster37k

Well, using paper title/abstracts would certainly be useful, although in a different way.

For my teaching purposes, I actually want to use community-based information, in the sense that it reflects user needs. I expect my students to face many of the questions that other users experience. For instance, note that the tag software error has been used 1838 times - I would then address the value of resources such as Biostars. It also shows trends in language use (e.g., noticeable drop in perl, increasing importance of python and especially R).

Some noise due to incorrect tag usage should not be a big deal. In any case, I will filter for only the most used ones, so the signal will still be there.
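The per-year view and the "most used tags only" filter described here could be sketched roughly as follows. This is a minimal illustration, not the site's export format: the `(year, tags)` records, the function names, and the sample posts are all made up for the example.

```python
from collections import Counter, defaultdict

def tag_counts_by_year(posts):
    """Count tag usage per year.

    `posts` is an iterable of (year, list_of_tags) pairs; this record
    layout is an assumption made for the sketch.
    """
    by_year = defaultdict(Counter)
    for year, tags in posts:
        by_year[year].update(tags)
    return by_year

def top_tags(counter, n=3):
    """Keep only the n most used tags, so rare (noisy) tags drop out."""
    return counter.most_common(n)

# Made-up example posts, only to show the shape of the input.
posts = [
    (2011, ["perl", "blast"]),
    (2011, ["perl"]),
    (2019, ["r", "python"]),
    (2019, ["r"]),
]
by_year = tag_counts_by_year(posts)
for year in sorted(by_year):
    print(year, top_tags(by_year[year], n=2))
```

Comparing the top tags of each year side by side is one simple way to show shifts such as the perl/python/R trend mentioned above.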

Thanks for the input!

modified 4 weeks ago • written 4 weeks ago by gramarga50

Gha, funny that you pick the example of software error. Because it is used so frequently it will become the first suggestion users get when making a new post, leading it to be used more and more. Often it's actually a user error :)

written 4 weeks ago by WouterDeCoster37k

Even better!

Nice to know this, because in classes students show similar behavior. They often jump the gun and call anything a software error, when it is usually just a typo. This will make a good example!

written 4 weeks ago by gramarga50

Here is a list of tags with counts:

http://data.biostarhandbook.com/data/biostar-tags.txt
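A file like this can be turned into word cloud input with a few lines of Python. The sketch below assumes one tag per line followed by a tab and its count; the actual layout of `biostar-tags.txt` is not shown in this thread, so adjust the parsing if it differs. The cutoff of 15 comes from the question above; the function names are illustrative.

```python
from typing import Dict

def parse_tag_counts(text: str) -> Dict[str, int]:
    """Parse lines of the ASSUMED form '<tag><TAB><count>' into a dict."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last tab so tags that contain spaces stay intact.
        tag, _, number = line.rpartition("\t")
        if not tag or not number.isdigit():
            continue  # skip headers or malformed lines
        counts[tag] = int(number)
    return counts

def keep_frequent(counts: Dict[str, int], min_count: int = 15) -> Dict[str, int]:
    """Drop tags used fewer than min_count times."""
    return {tag: c for tag, c in counts.items() if c >= min_count}

# Tiny inline sample; only the 'software error' count is quoted from the
# thread, the other line is a made-up placeholder.
sample = "software error\t1838\nsome-rare-tag\t3\n"
print(keep_frequent(parse_tag_counts(sample)))
```

To run it on the real data, download the file first and pass its contents to `parse_tag_counts`.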

written 4 weeks ago by Istvan Albert ♦♦ 79k

cc Istvan Albert, Devon Ryan and Pierre Lindenbaum

written 4 weeks ago by jrj.healey11k
4 weeks ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

Instead of using Biostar, use PubMed and the MeSH terms. Here is an example using my tools http://lindenb.github.io/jvarkit/PubmedDump.html and http://lindenb.github.io/jvarkit/XsltStream.html

written 4 weeks ago by Pierre Lindenbaum118k

jiz this is very anthropocentric .. ;)

written 4 weeks ago by Nicolas Rosewick7.4k

Pierre, thanks for pointing this out!

Please, see my reply above about wanting to reflect community questions.

written 4 weeks ago by gramarga50

The blog post What is bioinformatics about? also has a recipe for creating word clouds from abstracts. At the moment, though, the blog appears to be down for me.

written 4 weeks ago by h.mon24k

What tool creates the cloud itself? It has cool-looking styling; what parameters does it need to look like that? Now that I have played a bit with word clouds, I think that figuring out the right styling is a separate challenge of its own.

written 4 weeks ago by Istvan Albert ♦♦ 79k

after googling: https://wordart.com/

written 4 weeks ago by Pierre Lindenbaum118k

Love the standalone "High".

written 4 weeks ago by cschu1811.5k
4 weeks ago by
Istvan Albert ♦♦ 79k
University Park, USA
Istvan Albert ♦♦ 79k wrote:

Here is an image I made from the data at

http://data.biostarhandbook.com/data/biostar-tags.txt

[Image: word cloud made from the tag counts]

written 4 weeks ago by Istvan Albert ♦♦ 79k

I used these data and made a figure to look like the one from the original post. RNA-seq pretty much overwhelms everything else.

[Image: word cloud in the original blog style]

Same data with different scaling.

[Image: different scale and style]
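To illustrate why scaling matters when one tag dominates, here is a small sketch that maps counts to font sizes either linearly or after a log transform. It is not the procedure used for the figures above; the function name, the size range, and the example counts are made up.

```python
import math

def font_sizes(counts, min_size=10.0, max_size=80.0, transform=None):
    """Map counts to font sizes between min_size and max_size.

    `transform` is applied to each count first (identity = linear scaling).
    """
    if transform is None:
        transform = lambda x: x
    values = {tag: transform(c) for tag, c in counts.items()}
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    sizes = {}
    for tag, v in values.items():
        # If all counts are equal, give every tag the maximum size.
        frac = (v - lo) / span if span else 1.0
        sizes[tag] = min_size + frac * (max_size - min_size)
    return sizes

# Made-up counts with one dominant tag.
counts = {"big": 1000, "mid": 100, "small": 10}
linear = font_sizes(counts)
logged = font_sizes(counts, transform=math.log10)
print(linear)
print(logged)
```

With linear scaling the middle tag ends up almost as small as the rarest one, while the log transform spreads the sizes more evenly.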

modified 4 weeks ago • written 4 weeks ago by gramarga50

Perhaps consider working your process up in a github repo or blog post? Others might like to reuse this approach in future :)

written 4 weeks ago by jrj.healey11k

Sure! Would you mind pointing me to a GitHub example with good practices?

written 4 weeks ago by gramarga50

Looks like R is missing, even though it is the second most common tag, probably because it is one letter long.
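One way a single-letter tag can vanish is a tokenizer pattern that requires at least two characters per word. Which tool or setting was used for the figure is not stated here, so the snippet below only illustrates the general effect with Python's `re` module and a made-up input string.

```python
import re

text = "R python R perl"

# A pattern that needs two or more word characters silently drops "R".
two_plus = re.findall(r"\w\w+", text)

# Allowing one or more characters keeps it.
one_plus = re.findall(r"\w+", text)

print(two_plus)  # ['python', 'perl']
print(one_plus)  # ['R', 'python', 'R', 'perl']
```

If the word cloud tool accepts precomputed frequencies instead of raw text, passing the tag counts directly may sidestep the tokenizer altogether.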

written 4 weeks ago by Istvan Albert ♦♦ 79k

Another word cloud, this time based on the words in the 1000 most highly voted post titles and made with the https://wordart.com service:

[Image: word cloud of words from the 1000 most highly voted post titles]
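Counting words in post titles could look roughly like the sketch below. It is an assumption about the general approach, not the code used for this image; the stop-word list, the token pattern, and the example titles are invented for illustration.

```python
import re
from collections import Counter

# A deliberately tiny, made-up stop-word list; a real one would be longer.
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "for", "and",
              "how", "from", "with", "is", "on"}

def title_word_counts(titles, stop_words=STOP_WORDS):
    """Count words across titles, ignoring case and stop words."""
    counts = Counter()
    for title in titles:
        # Keep hyphenated terms such as 'rna-seq' together.
        for word in re.findall(r"[a-z0-9][a-z0-9\-]*", title.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts

# Made-up titles, only to show the input and output shapes.
titles = ["How to align RNA-seq reads", "RNA-seq analysis in R"]
print(title_word_counts(titles).most_common(3))
```

The resulting word/count pairs can then be pasted into, or uploaded to, a word cloud service.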

modified 4 weeks ago • written 4 weeks ago by Istvan Albert ♦♦ 79k