Question: Where/How To Assess Which Bioinformatics Tools/Databases Are Most Used/Accessed?
8
6.1 years ago by
aidan-budd1.9k
Germany
aidan-budd1.9k wrote:

Dear BioStar,

Do any of you know of a resource/document/article that attempts to assess how much a range of different bioinformatics tools/data resources/databases are "used"?

I could, of course, estimate this myself, e.g. by using Web of Science to count the citations of a set of articles (those matching "bioinformatics tool" or "database" in the title, or all those in the NAR database and/or webserver issues). But I was wondering whether someone had already done something like this, perhaps using a different metric than article citations, which have several problems (for example, how to deal with resources described in more than one publication, and the fact that tools can be heavily used without their articles being cited).

I'm asking because, several times recently while writing, it would have been useful to write something like "we provide for our users XXXXX, one of the most accessed/cited/etc. bioinformatics tools", with either a reference or some indication of how this assessment of "most accessed" was obtained.

I'm aware that there are several issues that make this a tricky question (what is "bioinformatics"? what constitutes "use" of a resource? etc.), but I thought/hoped that maybe someone in the BioStar community might have wanted something like this, in which case, it'd be great to hear where they got it from/how they solved this problem.

Thanks and best wishes,

Aidan

PS this is my first question to BioStar. Surprised to find that doing this makes me feel strangely nervous! :)

database tool • 3.6k views
ADD COMMENTlink modified 6.1 years ago by Sean Davis25k • written 6.1 years ago by aidan-budd1.9k
1

Thanks for posting this query. I had a similar question, given that there are so many tools and web services in bioinformatics. Maybe one could list them by topic on BioStar, so that people can vote on them, and also include the number of citations for each. Then one could assess the usage. I recently came across a similar kind of poll in Google Docs for the most popular DFT functional.

ADD REPLYlink modified 6.1 years ago • written 6.1 years ago by Pappu1.9k
1

Perhaps an "ideal" solution would be to have, "somewhere", a community-maintained list of bioinformatics tools/resources, curated by the creators of the resources and linked by them to the publications whose citations should count towards each resource; then use Google Scholar to provide the sum total of the citations of these articles for each resource. But that sounds like a lot of effort for addressing a question of only passing interest; setting something like that up in such a way that lots of people would get involved sounds like a lot of work and/or tricky/impossible...
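
To make the summing step concrete, here is a minimal Python sketch. It assumes the curated resource-to-publication mapping and the per-article citation counts already exist; every name and number below is a made-up placeholder, and fetching real counts (from Google Scholar or anywhere else) is deliberately left out.

```python
# Hypothetical curated mapping: resource -> the publications describing it.
# All identifiers and counts are illustrative placeholders, not real data.
resource_to_papers = {
    "ToolA": ["paperA1", "paperA2"],   # a tool described in two articles
    "DatabaseB": ["paperB1"],
}
citations_per_paper = {"paperA1": 120, "paperA2": 30, "paperB1": 95}


def total_citations(resource_to_papers, citations_per_paper):
    """Sum the citations of every publication linked to each resource."""
    totals = {}
    for resource, papers in resource_to_papers.items():
        # set() guards against the same article being listed twice;
        # articles with no known count contribute zero.
        totals[resource] = sum(citations_per_paper.get(p, 0) for p in set(papers))
    return totals


totals = total_citations(resource_to_papers, citations_per_paper)
# Rank resources from most to least cited.
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

This only handles the "more than one publication per resource" problem; it does nothing about tools that are heavily used but rarely cited.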

ADD REPLYlink written 6.1 years ago by aidan-budd1.9k
4
6.1 years ago by
William4.4k
Europe
William4.4k wrote:

For Mappers there is this overview from EBI:

http://wwwdev.ebi.ac.uk/fg/hts_mappers/

Something like this could also be done (or might already have been done?) for tools like genome browsers and SNP/CNV/SV callers.

The citation metric has its problems, but at least it is a usable metric. I would welcome other/better metrics.

ADD COMMENTlink written 6.1 years ago by William4.4k

Thanks for that, William. Have had a look at the link and, indeed, this is just the kind of thing I was looking for/thinking of. Might contact some friends at the EBI to see if they're aware of any more of this kind of thing out there. Thanks for taking the time to post your answer.

ADD REPLYlink written 6.1 years ago by aidan-budd1.9k
3
6.1 years ago by
cdsouthan1.8k
cdsouthan1.8k wrote:

While it is a historical snapshot (2009), you might find this ELIXIR Bioinformatics User Survey Report (by Sandrine Palcy and Antoine de Daruvar) of interest: http://www.elixir-europe.org/prep/sites/elixir-europe.org.prep/files/documents/reports/elixir_usersurvey_finalreport.pdf

The parallel ELIXIR Database Provider Survey also provides usage assessment as reported by the sources (Graham Cameron and Chris Southan) http://www.elixir-europe.org/prep/sites/elixir-europe.org.prep/files/documents/reports/elixir_strategy_for_data_resources_report.pdf

A detailed appendix to this report can be found at http://figshare.com/articles/ElIXIR_Database_Provider_Survey_Report_Appendix_/106310

ADD COMMENTlink modified 6.1 years ago • written 6.1 years ago by cdsouthan1.8k

Thanks a lot for those links, Christopher. I'd forgotten about those surveys, and a quick skim through them suggests there may well be the kind of thing I'm looking for there.

ADD REPLYlink written 6.1 years ago by aidan-budd1.9k

Indeed, might also be relevant for the other question I asked today, about the demographics of bioinformaticians: "Are Lots Of Bioinformaticians Embedded In Wet Labs? Any Ideas On Sources Of Demographic Data On Bioinformaticians?"

ADD REPLYlink written 6.1 years ago by aidan-budd1.9k
1

Aidan, it's good that those efforts get re-accessed. As you would guess, these big themes (real impact vs. citations vs. user stats, what comparative access stats really mean, database redundancy, funding persistence, ROI, users' cognizance of the resource landscape, data QC, etc.) can be much chewed over, but not here, I guess.

ADD REPLYlink modified 6.1 years ago • written 6.1 years ago by cdsouthan1.8k
3
6.1 years ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

Bioconductor maintains download stats for each package:

http://bioconductor.org/packages/stats/bioc/Biobase.html

Paying attention to the distinct IP address column is probably the safest way to go. These download stats ignore the Bioconductor mirrors.
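
As a rough illustration of comparing the two columns, here is a small Python sketch that totals distinct IPs and raw downloads from a tab-separated table. The column names and the sample rows are assumptions made up for this example; check them against whatever the stats page actually provides before relying on it.

```python
import csv
import io

# Placeholder table: column names and numbers are assumptions, not real stats.
SAMPLE = (
    "Year\tMonth\tDistinct_IPs\tDownloads\n"
    "2013\tJan\t100\t250\n"
    "2013\tFeb\t80\t160\n"
)


def summarize(tsv_text):
    """Total the per-month distinct-IP and download columns of a TSV table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    ips = 0
    downloads = 0
    for row in reader:
        ips += int(row["Distinct_IPs"])
        downloads += int(row["Downloads"])
    # Caveat: an IP seen in two different months is counted twice here,
    # so the summed figure is an upper bound on truly distinct IPs.
    ratio = downloads / ips if ips else None
    return {"distinct_ips": ips, "downloads": downloads, "downloads_per_ip": ratio}


summary = summarize(SAMPLE)
```

A high downloads-per-IP ratio hints at repeated downloads from the same address, which is the kind of inflation the distinct IP column is meant to damp.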

ADD COMMENTlink written 6.1 years ago by Sean Davis25k

Most academic and other large institutions have a small external IP range - going by unique IP only will probably exclude more true downloads than duplicates (though I can't immediately think of a way of quantifying whether that's true).

ADD REPLYlink written 6.1 years ago by Richard Smith-Unna130
2
6.1 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum116k wrote:

An idea: you could search for the projects tagged "bioinformatics" on github/SF/google (e.g. https://code.google.com/hosting/search?q=label%3abioinformatics ) and then get the number of downloads for each project (e.g. https://code.google.com/p/psicquic/downloads/list?can=2&q=&sort=&groupby=&colspec=Filename+Summary+DownloadCount ). Of course, it will not give you the number of "checkouts" of each project.

ADD COMMENTlink written 6.1 years ago by Pierre Lindenbaum116k
2

I've implemented my idea here: http://figshare.com/articles/Number_of_downloads_for_each_project_tagged_with_Bioinformatics_on_code_google_com/106321

ADD REPLYlink written 6.1 years ago by Pierre Lindenbaum116k

Pierre, that's superb! Thanks so much for taking the time to do that. Great to have a way of looking at this that doesn't depend on citations. Very much appreciated.

ADD REPLYlink written 6.1 years ago by aidan-budd1.9k

Thanks a lot for that suggestion, Pierre; I very much like the idea of using a non-citation-based metric like this. Curious (but not curious enough to go out and analyse it... :) ) about how well different kinds of metric would correlate with each other.
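
For anyone who is curious enough: one simple way to compare two metrics is a Spearman rank correlation over the same set of tools. The sketch below is plain Python with no external libraries; the choice of Spearman is my assumption (it compares rankings rather than raw values, which seems appropriate for counts on very different scales), and the example lists are made-up placeholders rather than real citation or download numbers.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder values for four hypothetical tools, in the same order.
citations = [500, 120, 40, 40]
downloads = [9000, 3000, 100, 800]
rho = spearman(citations, downloads)
```

A value of rho near 1 would mean the two metrics rank the tools in nearly the same order; a value near 0 would mean they disagree about which tools are "most used".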

ADD REPLYlink written 6.1 years ago by aidan-budd1.9k