Question

Where/How To Assess Which Bioinformatics Tools/Databases Are Most Used/Accessed?

8


12.5 years ago

aidan-budd 1.9k

Dear BioStar,

Do any of you know of a resource/document/article that attempts to assess how much a range of different bioinformatics tools/data resources/databases are "used"?

I could, of course, estimate this using e.g. Web of Science to count the citations of a set of articles (e.g. those matching "bioinformatics tool" or "database" in the title, or all those in the NAR database and/or webserver issue). But I was wondering whether someone had already done something like this, perhaps using a different metric than article citations, which have several problems (for example, how to deal with resources described in more than one publication, and the fact that tools could be heavily used without the articles being cited).

I'm asking because several times recently, while writing, it would have been useful to write something like "we provide for our users XXXXX, one of the most accessed/cited/etc. bioinformatics tools" with either a reference or some indication of how this assessment of most-accessed was obtained.

I'm aware that there are several issues that make this a tricky question (what is "bioinformatics"? what constitutes "use" of a resource? etc.), but I thought/hoped that maybe someone in the BioStar community might have wanted something like this, in which case, it'd be great to hear where they got it from/how they solved this problem.

Thanks and best wishes,
Aidan

PS this is my first question to BioStar. Surprised to find that doing this makes me feel strangely nervous! :)

database • 6.5k views

ADD COMMENT • link updated 2.0 years ago by Ram 45k • written 12.5 years ago by aidan-budd 1.9k

1


Thanks for posting this query. I had a similar question, given that there are so many tools and web services in bioinformatics. Maybe one could list them under tools by topic on Biostars, and people could vote and also include each tool's number of citations. Then one could assess the usage. I recently came across a similar kind of poll in Google Docs for the most popular DFT functional.

ADD REPLY • link 12.5 years ago by Pappu ★ 2.1k

1


Perhaps an "ideal" solution would be to have, "somewhere", a community-maintained list of bioinformatics tools/resources, curated by the creators of the resources and linked by them to the publications whose citations should count towards each resource; then use Google Scholar to provide the sum total of all the citations of these articles for each resource. But that sounds like a lot of effort for a question of only passing interest; setting something like that up in such a way that lots of people would get involved sounds like a lot of work and/or tricky/impossible...
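A minimal sketch of the summing step, assuming such a curated list already existed. The resource names, publication labels and citation counts are all made up for illustration; in practice the counts would have to be looked up in a citation index.

```python
# Hypothetical curated list: resource -> publications describing it.
# Every name and number here is invented for illustration only.
resource_publications = {
    "ToolA": ["ToolA 2005 paper", "ToolA 2010 update"],
    "DatabaseB": ["DatabaseB 2008 paper"],
}

# Hypothetical citation counts per publication (e.g. copied by hand
# from a citation index).
citation_counts = {
    "ToolA 2005 paper": 120,
    "ToolA 2010 update": 45,
    "DatabaseB 2008 paper": 150,
}


def total_citations(resource_publications, citation_counts):
    """Sum citations over every publication linked to each resource."""
    return {
        resource: sum(citation_counts.get(pub, 0) for pub in pubs)
        for resource, pubs in resource_publications.items()
    }


def rank_resources(totals):
    """Return (resource, total) pairs, most cited first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


totals = total_citations(resource_publications, citation_counts)
ranking = rank_resources(totals)
for resource, total in ranking:
    print(resource, total)
```

Summing over publications handles resources described in more than one article, but it would still count twice any paper that cites both articles for the same resource.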

ADD REPLY • link 12.5 years ago by aidan-budd 1.9k

score 4 · Answer 1 · 2013-01-07

4


12.5 years ago

William ★ 5.4k

For mappers there is this overview from the EBI:

http://wwwdev.ebi.ac.uk/fg/hts_mappers/

Something like this could also be done (or might already have been done?) for tools like genome browsers and SNP / CNV / SV callers.

The citation metric has its problems, but it is at least a usable metric. I would welcome other / better metrics.

ADD COMMENT • link 12.5 years ago by William ★ 5.4k

0


Thanks for that, William. Have had a look at the link and, indeed, this is just the kind of thing I was looking for/thinking of. Might contact some friends at the EBI to see if they're aware of any more of this kind of thing out there. Thanks for taking the time to post your answer.

ADD REPLY • link 12.5 years ago by aidan-budd 1.9k

score 3 · Answer 2 · 2013-01-07

3


12.5 years ago

cdsouthan ★ 1.9k

While it is a historical snapshot (2009), you might find this ELIXIR Bioinformatics User Survey Report (by Sandrine Palcy and Antoine de Daruvar) of interest: http://www.elixir-europe.org/prep/sites/elixir-europe.org.prep/files/documents/reports/elixir_usersurvey_finalreport.pdf

The parallel ELIXIR Database Provider Survey also provides usage assessment as reported by the sources (Graham Cameron and Chris Southan) http://www.elixir-europe.org/prep/sites/elixir-europe.org.prep/files/documents/reports/elixir_strategy_for_data_resources_report.pdf

A detailed appendix to this report can be found at http://figshare.com/articles/ElIXIR_Database_Provider_Survey_Report_Appendix_/106310

ADD COMMENT • link 12.5 years ago by cdsouthan ★ 1.9k

0


Thanks a lot for those links, Christopher. I'd forgotten about those surveys, and a quick skim through them suggests there may well be the kind of thing I'm looking for there.

ADD REPLY • link 12.5 years ago by aidan-budd 1.9k

0


Indeed, might also be relevant for the other question I asked today, about the demographics of bioinformaticians... Are Lots Of Bioinformaticians Embedded In Wet Labs? Any Ideas On Sources Of Demographic Data On Bioinformaticians?

ADD REPLY • link 12.5 years ago by aidan-budd 1.9k

1


Aidan, it's good that those efforts get re-accessed. As you would guess, these big themes (real impact vs citations vs user stats, what comparative access stats really mean, database redundancy, funding persistence, ROI, users' cognizance of the resource landscape, data QC, etc.) can be much chewed over, but not here I guess.

ADD REPLY • link 12.5 years ago by cdsouthan ★ 1.9k

score 3 · Answer 3 · 2013-01-07

3


12.5 years ago

Sean Davis 27k

Bioconductor maintains download stats for each package:

http://bioconductor.org/packages/stats/bioc/Biobase.html

Paying attention to the distinct IP address column is probably the safest way to go. These download stats ignore the Bioconductor mirrors.
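As a rough sketch of how such per-package figures could be summarised once saved as a tab-separated table: the numbers below are made up, and the column names are assumptions for this example rather than a guaranteed match for the real stats files.

```python
import csv
import io

# Made-up monthly figures in a tab-separated layout. The column names are
# assumptions for this sketch, not necessarily those of the real files.
SAMPLE_STATS = (
    "Year\tMonth\tNb_of_distinct_IPs\tNb_of_downloads\n"
    "2012\tOct\t100\t250\n"
    "2012\tNov\t120\t300\n"
    "2012\tDec\t80\t160\n"
)


def summarise(tsv_text):
    """Total the distinct-IP and download columns over all rows."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    distinct_ips = 0
    downloads = 0
    for row in reader:
        distinct_ips += int(row["Nb_of_distinct_IPs"])
        downloads += int(row["Nb_of_downloads"])
    return {"distinct_ips": distinct_ips, "downloads": downloads}


summary = summarise(SAMPLE_STATS)
print(summary)
```

Note that adding up monthly distinct-IP counts counts an address once for every month in which it appears, so the total is an upper bound on the number of distinct addresses over the whole period.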

ADD COMMENT • link 12.5 years ago by Sean Davis 27k

0


Most academic and other large institutions have a small external IP range - going by unique IP only will probably exclude more true downloads than duplicates (though I can't immediately think of a way of quantifying whether that's true).
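A toy illustration of the concern, using an invented download log. The user names are hypothetical and would not be visible in a real server log, which is exactly why the effect is hard to quantify; the addresses come from ranges reserved for documentation.

```python
# Invented download log of (user, external IP) pairs. In a real log only
# the IP would be known; the user column exists here purely to show what
# the IP-based count hides.
download_log = [
    ("alice", "192.0.2.1"),
    ("alice", "192.0.2.1"),    # repeat download by the same person
    ("bob", "192.0.2.1"),      # different person, same institutional IP
    ("carol", "192.0.2.1"),
    ("dave", "198.51.100.7"),
]

total_downloads = len(download_log)
unique_ips = len({ip for _, ip in download_log})
unique_users = len({user for user, _ in download_log})

print(total_downloads, unique_ips, unique_users)
```

In this made-up case the raw count overstates the number of users by one, while the distinct-IP count understates it by two.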

ADD REPLY • link 12.5 years ago by Richard Smith-Unna ▴ 140

score 2 · Answer 4 · 2013-01-07

2


12.5 years ago

Pierre Lindenbaum 166k

An idea: you could search for the projects tagged 'bioinformatics' on GitHub/SF/Google (e.g. https://code.google.com/hosting/search?q=label%3abioinformatics ) and then get the number of downloads for each project (e.g. https://code.google.com/p/psicquic/downloads/list?can=2&q=&sort=&groupby=&colspec=Filename+Summary+DownloadCount ). Of course, it will not give you the number of "checkouts" of each project.

ADD COMMENT • link 12.5 years ago by Pierre Lindenbaum 166k

2


I've implemented my idea here: http://figshare.com/articles/Number_of_downloads_for_each_project_tagged_with_Bioinformatics_on_code_google_com/106321

ADD REPLY • link 12.5 years ago by Pierre Lindenbaum 166k

0


Pierre, that's superb! Thanks so much for taking the time to do that. Great to have a way of looking at this that doesn't depend on citations. Very much appreciated.

ADD REPLY • link 12.5 years ago by aidan-budd 1.9k

0


Thanks a lot for that suggestion, Pierre; I really like the idea of using a non-citation-based metric like this. Curious (but not curious enough to go out and analyse it... :) ) about how well different kinds of metrics would correlate with each other.
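For anyone who did want to check, one option would be a rank correlation between two metrics over the same set of tools. The sketch below computes Spearman's coefficient in plain Python; the citation and download figures are invented, and Spearman is simply one reasonable choice, not something proposed in this thread.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented figures for four hypothetical tools, in the same order.
citations = [165, 150, 40, 10]
downloads = [9000, 12000, 3000, 500]

rho = spearman(citations, downloads)
print(round(rho, 3))
```

A rank-based measure seems a sensible default here because citation and download counts tend to be heavily skewed, so a few very popular tools would otherwise dominate the result.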

ADD REPLY • link 12.5 years ago by aidan-budd 1.9k