Question: Bioinformatics word cloud to use in classes
gramarga50 wrote:

Hi everyone,

Whenever I teach an introductory class about Bioinformatics, I like to use this word cloud - I feel it gives a quick glimpse of the field. However, it is outdated (from 2011).

So I want to create an up-to-date figure. I understand I can manually copy the tag counts from the tag content and use only the first few pages, because the word counts quickly drop below 15 or so.

However, ideally I would like to have the data for each year separately, to show how topics change over time. Unfortunately, I am a new user (first post here) and do not have the privileges to download the database as shown in the blog post above.

Would anyone with access be able / willing to fetch and share these data?

Thank you very much!

modified 6 months ago by Istvan Albert ♦♦ 81k • written 6 months ago by gramarga50

Nice application! Tags are very often used incorrectly though. An alternative approach could be to use the title/abstract of recent (bioinformatics) papers? Although filtering those terms obviously requires some more work to get the bioinformatics terminology out.

written 6 months ago by WouterDeCoster40k

Well, using paper titles/abstracts would certainly be useful, although in a different way.

For my teaching purposes, I actually want to use community-based information, in the sense that it reflects user needs. I expect my students to face many of the questions that other users experience. For instance, note that the tag software error has been used 1838 times - I would then address the value of resources such as Biostars. It also shows trends in language use (e.g., noticeable drop in perl, increasing importance of python and especially R).

Some noise due to incorrect tag usage should not be a big deal. In any case, I will filter for only the most used ones, so the signal will still be there.

Thanks for the input!
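
The per-year view described above could be sketched roughly as follows. This is only an illustration: the `posts` records are hypothetical stand-ins for a database export (which this thread does not show), and `tag_counts_by_year` and `top_tags` are names made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical records: (year, tags attached to one post). Real data would
# come from a database export, which is not shown in this thread.
posts = [
    (2011, ["perl", "blast"]),
    (2011, ["perl", "alignment"]),
    (2018, ["R", "rna-seq"]),
    (2018, ["python", "rna-seq"]),
    (2018, ["rna-seq", "software error"]),
]

def tag_counts_by_year(records):
    """Return {year: Counter mapping tag -> number of posts using it}."""
    by_year = defaultdict(Counter)
    for year, tags in records:
        # set() so a tag repeated within one post is counted once
        by_year[year].update(set(tags))
    return dict(by_year)

def top_tags(counter, min_count=1, limit=10):
    """Keep the most used tags, dropping those below min_count."""
    return [(tag, n) for tag, n in counter.most_common(limit) if n >= min_count]

counts = tag_counts_by_year(posts)
for year in sorted(counts):
    print(year, top_tags(counts[year], min_count=2))
```

Filtering with `min_count` is what keeps the occasional misused tag from showing up in the figure, while one cloud per year makes shifts such as perl versus python or R visible.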

modified 6 months ago • written 6 months ago by gramarga50

Gha, funny that you pick the example of software error. Because it is used so frequently it will become the first suggestion users get when making a new post, leading it to be used more and more. Often it's actually a user error :)

written 6 months ago by WouterDeCoster40k

Even better!

Nice to know this, because in classes students show similar behavior. They often jump the gun and call anything a software error, when it is usually just a typo. This will make a good example!

written 6 months ago by gramarga50

Here is a list of tags with counts:

http://data.biostarhandbook.com/data/biostar-tags.txt
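
For anyone who wants to filter such a list before plotting, here is a minimal sketch. The layout of the file is not shown in this thread, so the "tag, tab, count" format below is an assumption, and `read_tag_counts` is a hypothetical helper. The cutoff of 15 echoes the threshold mentioned in the question.

```python
import io

# Hypothetical excerpt. The real layout of biostar-tags.txt is not shown in
# this thread; "tag<TAB>count" is an assumption. Only the 1838 figure for
# "software error" is quoted in the thread, the other counts are made up.
SAMPLE = "rna-seq\t5000\nR\t3000\nsoftware error\t1838\nrare-tag\t3\n"

def read_tag_counts(handle, min_count=15):
    """Return {tag: count} for tags used at least min_count times."""
    counts = {}
    for line in handle:
        # Split once from the right so tags containing spaces stay whole
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        tag, count = parts[0], int(parts[1])
        if count >= min_count:
            counts[tag] = count
    return counts

tag_counts = read_tag_counts(io.StringIO(SAMPLE))
for tag, count in sorted(tag_counts.items(), key=lambda item: -item[1]):
    print(f"{count}\t{tag}")
```

With a downloaded copy of the file, `open("biostar-tags.txt")` could be passed in place of the `io.StringIO` object, provided the format assumption holds.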

written 6 months ago by Istvan Albert ♦♦ 81k

cc Istvan Albert, Devon Ryan and Pierre Lindenbaum

written 6 months ago by Joe14k
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum122k wrote:

Instead of using Biostars, use PubMed and the MeSH terms. Here is an example using my tools http://lindenb.github.io/jvarkit/PubmedDump.html and http://lindenb.github.io/jvarkit/XsltStream.html
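
As a rough illustration of the MeSH idea without the jvarkit tools, the descriptor names can be counted from PubMed XML with the Python standard library. This is not how the tools above work; the sample records are made up, and `mesh_counts` is a hypothetical helper. Only the `MeshHeading`/`DescriptorName` element names come from the PubMed XML format.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny hand-written sample in the PubMed XML layout; the records are made up.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation>
    <MeshHeadingList>
      <MeshHeading><DescriptorName>Computational Biology</DescriptorName></MeshHeading>
      <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
    </MeshHeadingList>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation>
    <MeshHeadingList>
      <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
      <MeshHeading><DescriptorName>Sequence Alignment</DescriptorName></MeshHeading>
    </MeshHeadingList>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""

def mesh_counts(xml_text):
    """Count how often each MeSH descriptor name appears across all articles."""
    root = ET.fromstring(xml_text)
    return Counter(node.text for node in root.iter("DescriptorName"))

for term, n in mesh_counts(SAMPLE).most_common():
    print(n, term)
```

The resulting counts can be fed to a word cloud tool in the same way as tag counts.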

written 6 months ago by Pierre Lindenbaum122k

Jeez, this is very anthropocentric... ;)

written 6 months ago by Nicolas Rosewick8.1k

Pierre, thanks for pointing this out!

Please, see my reply above about wanting to reflect community questions.

written 6 months ago by gramarga50

The blog post "What is bioinformatics about?" also has a recipe for creating word clouds from abstracts. At the moment, the blog is down for me, though.

written 6 months ago by h.mon27k

What tool creates the cloud itself? It has cool-looking styling. What parameters does it need to look like that? Now that I have played a bit with word clouds, I think that figuring out the right styling is a separate challenge of its own.

written 6 months ago by Istvan Albert ♦♦ 81k

After googling: https://wordart.com/

written 6 months ago by Pierre Lindenbaum122k

Love the standalone "High".

written 6 months ago by cschu1811.8k
University Park, USA
Istvan Albert ♦♦ 81k wrote:

Here is an image I made from the data at

http://data.biostarhandbook.com/data/biostar-tags.txt

[Image: word cloud made from the tag counts]

written 6 months ago by Istvan Albert ♦♦ 81k

I used these data and made a figure to look like the one from the original post. RNA-seq pretty much overwhelms everything else.

[Image: Word cloud in the original blog style]

Same data with different scaling.

[Image: Different scale and style]
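
To illustrate why the scaling matters, here is a small sketch comparing linear and logarithmic mapping of counts to font sizes. It is not the method used for the figures above; the counts are hypothetical and `font_sizes` is a name made up for the example.

```python
import math

def font_sizes(counts, scale="linear", min_size=10, max_size=100):
    """Map positive tag counts to font sizes between min_size and max_size."""
    transform = math.log if scale == "log" else float
    values = {tag: transform(n) for tag, n in counts.items()}
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all counts are equal
    return {
        tag: round(min_size + (v - lo) / span * (max_size - min_size))
        for tag, v in values.items()
    }

# Hypothetical counts with one dominant tag
example = {"rna-seq": 10000, "alignment": 1000, "blast": 100}
linear = font_sizes(example, scale="linear")
log_scaled = font_sizes(example, scale="log")
print(linear)
print(log_scaled)
```

With linear scaling the dominant tag pushes the mid-range tags down to nearly the minimum size, whereas the log transform spreads them out more evenly.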

modified 6 months ago • written 6 months ago by gramarga50

Perhaps consider writing up your process in a GitHub repo or blog post? Others might like to reuse this approach in the future :)

written 6 months ago by Joe14k

Sure! Would you mind pointing me to a GitHub example with good practices?

written 6 months ago by gramarga50

Looks like R is missing, the second most common tag, probably because it is one letter long.
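
A possible mechanism, sketched under the assumption that the tool tokenizes with a pattern requiring at least two characters (the tool used for the figure is not stated here, so this is a guess). The patterns and the `keep` helper are made up for the illustration.

```python
import re

tags = ["rna-seq", "R", "alignment", "R", "python"]

# Pattern A needs two or more characters, so "R" never matches.
two_plus = re.compile(r"\w[\w'-]+")
# Pattern B also accepts a single word character.
one_plus = re.compile(r"\w[\w'-]*")

def keep(pattern, words):
    """Return the words that the token pattern accepts in full."""
    return [w for w in words if pattern.fullmatch(w)]

print(keep(two_plus, tags))
print(keep(one_plus, tags))
```

If the tool exposes its token pattern or a minimum word length setting, relaxing it should bring one-letter tags back.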

written 6 months ago by Istvan Albert ♦♦ 81k

Another word cloud, this time based on the words in the titles of the 1000 most highly voted posts, made using the https://wordart.com service:

[Image: word cloud of post title words]
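
Counting title words for such a cloud could look roughly like this. It is a sketch only: the titles and the tiny stop-word list are hypothetical, and `title_word_counts` is a name invented for the example.

```python
import re
from collections import Counter

# Hypothetical titles and a deliberately tiny stop-word list
titles = [
    "How to convert BAM to FASTQ",
    "What is the best aligner for RNA-seq",
    "Convert FASTQ to FASTA in R",
]
STOP_WORDS = {"a", "an", "the", "to", "how", "for", "in", "of", "what", "is"}

def title_word_counts(titles, stop_words=STOP_WORDS):
    """Count lowercased words across titles, skipping stop words."""
    counts = Counter()
    for title in titles:
        # Keep hyphenated terms such as "RNA-seq" as one token
        for word in re.findall(r"[A-Za-z0-9][A-Za-z0-9'-]*", title):
            word = word.lower()
            if word not in stop_words:
                counts[word] += 1
    return counts

word_counts = title_word_counts(titles)
print(word_counts.most_common(5))
```

The word and count pairs can then be pasted or imported into whichever word cloud tool is used for the styling.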

modified 6 months ago • written 6 months ago by Istvan Albert ♦♦ 81k