Question

deep search for software citations

9.5 years ago

stnava • 0

I've been trying for some time to track how ANTs (Advanced Normalization Tools) and ITK (the Insight Toolkit, http://itk.org/, https://github.com/InsightSoftwareConsortium/ITK) are cited and/or used in publications. It turns out to be fairly tricky: citations may point to different "source" academic papers, to different websites (SourceForge, PICSL, GitHub, NITRC, NeuroDebian), or simply mention the software by name. Another issue is that other software is built on ITK and ANTs, so one might need to mine for these dependencies as well, or even for software that clones from our GitHub repos. Some time ago I started this Google Scholar profile:

http://scholar.google.com/citations?user=ox-mhOkAAAAJ&hl=en

which is useful but not ideal due to the lack of specificity/controllability.

Do you guys know of any resources that deal with this type of problem?

software citation search • 2.5k views

updated 3.1 years ago by Ram 43k • written 9.5 years ago by stnava • 0

If I developed a serious tool, I would publish it as a preprint or, better, as a journal paper. Then you have a stable source for others to cite, and it shows up in Google Scholar. I frequently write a preprint/paper just as documentation, forcing myself to clarify messy details that I would otherwise overlook.

updated 3.1 years ago by Ram 43k • written 9.5 years ago by lh3 33k

Ram · Answer 1 · 2014-10-27

9.5 years ago

satrajit.ghosh • 0

One effort towards this is the Resource Identification Initiative. However, as far as I know, versioning has not been completely dealt with.

https://www.force11.org/Resource_identification_initiative

In addition to the above effort there are a couple of other options:

To include the complete provenance, which would include software provenance.
To use a versioned URI/IRI:
Use linked data platform best practices:

http://www.w3.org/TR/ldp-primer/

https://dvcs.w3.org/hg/ldpwg/raw-file/default/ldp-bp/ldp-bp.html

However, any of the above solutions may require some commitment to persistence by some entity.

updated 3.1 years ago by Ram 43k • written 9.5 years ago by satrajit.ghosh • 0

Another possibility that already exists, at least for software on repositories like GitHub, is the ability to create a DOI for a specific release. However, most users don't know how to get at a specific version of the code:

https://guides.github.com/activities/citable-code/

updated 3.1 years ago by Ram 43k • written 9.5 years ago by satrajit.ghosh • 0

Ram · Answer 2 · 2014-10-27

Asking people to add full provenance is great if you are an omnipotent force like NIH (which by the way still has a lot of trouble getting people to put their grant numbers into papers in a consistent way), but for the rest of us mortals all we can really dream for is an identifier, if we make it super easy for people to do it and apply pressure at just the right time (during publication).

That is my two cents, for what it's worth, but I am biased in that I have already had 174 people (confirmed) add identifiers to their papers.

http://scholar.google.com/scholar?scisbd=2&q=RRID&hl=en&as_sdt=1,5&as_vis=1

By the way, my other two cents:

ANTS is a really crappy name for a project, for obvious "search for ants and see what you get back" reasons. ImageJ does not appear randomly in papers, which is a good thing. Python, remarkably enough, follows a bimodal distribution of journal titles. ;-)
ANTS is RRID:nlx_75959. Try your lovely new identifier in Google Scholar; I get back two papers that used ANTS (methods section attribution).
We have created a pipeline with some fun tools that help to answer this question and have processed the OA literature (so far). ANTS does come back in a few papers. This is from URL mentions in the methods only; we have a tool that will give you the option of curating mentions from names in the next 1-2 months. We ran a set of learning algorithms on this against the info we get back from publisher APIs for ModelDB and were able to go from a precision of 40% to 98%, so once this is public, I would love for you to test it:
Available now (see column called mentioned in literature): https://www.neuinfo.org/mynif/search.php?t=indexable&nif=nlx_144509-1&q=%22ANTS+-+Advanced+Normalization+ToolS%22&filter=
Adding all alternate URLs, synonyms, abbreviations, and other info to the catalog representation will allow our crawlers to find mentions of your tool more effectively. We do some things that are smart, but algorithmic approaches are never going to be as good as someone interested in the answer.
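
To make the mention-mining idea above a bit more concrete, here is a minimal Python sketch that scans a piece of methods text for RRID strings and for known aliases or URLs of a tool. The regular expression, the alias list, and the function names are assumptions made up for this illustration; this is not the pipeline described in the answer, and real RRIDs may not all fit the assumed pattern.

```python
import re

# Assumed pattern: "RRID:" followed by a prefix, an underscore, and an
# alphanumeric tail (e.g. RRID:nlx_75959). Adjust for other conventions.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9_]+)")

# Illustrative alias list for ITK, built from the name and URLs in the question.
ITK_ALIASES = [
    "Insight Toolkit",
    "itk.org",
    "github.com/InsightSoftwareConsortium/ITK",
]


def find_rrids(text):
    """Return distinct RRID identifiers in text, in order of first appearance."""
    seen = []
    for match in RRID_PATTERN.finditer(text):
        rrid = match.group(1)
        if rrid not in seen:
            seen.append(rrid)
    return seen


def find_alias_mentions(text, aliases):
    """Return the aliases that occur in text (case-insensitive substring match)."""
    lowered = text.lower()
    return [alias for alias in aliases if alias.lower() in lowered]
```

Plain substring matching is deliberately naive here. A short name such as "ANTS" would need case-sensitive or context-aware matching to avoid hits on the insect, which is the naming problem raised in the first point above.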

Ram · Answer 3 · 2014-10-28

For RRIDs, I personally would like to use a simple versioning strategy not unlike a GenBank identifier, where you append the version after the numeric ID. That way, you can query across all versions with the root number, but still reference the specific version for reproducibility purposes.
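
As a sketch of how such a scheme could behave, the Python below splits an identifier into a root and an optional version, assuming a GenBank-like `root.version` suffix (for example, a hypothetical `nlx_75959.2`). The dot-suffix format, the `VersionedId` type, and the function names are assumptions for illustration only; RRIDs do not define this format.

```python
from typing import List, NamedTuple, Optional


class VersionedId(NamedTuple):
    root: str               # identifier shared by all versions
    version: Optional[int]  # None when no version suffix is present


def parse_versioned_id(identifier: str) -> VersionedId:
    """Split 'root.N' into (root, N); identifiers without '.N' have version None."""
    root, sep, suffix = identifier.rpartition(".")
    if sep and root and suffix.isascii() and suffix.isdigit():
        return VersionedId(root, int(suffix))
    return VersionedId(identifier, None)


def all_versions_of(root: str, identifiers: List[str]) -> List[str]:
    """Select every identifier whose root matches, regardless of version."""
    return [i for i in identifiers if parse_versioned_id(i).root == root]
```

With this split, a search on the root collects citations of every version, while the full string still pins the exact release used in a given paper.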

RRIDs could still be resolved with a URI or a DOI, and could be the subject of a set of triples that can aid discovery and attribution. See the NIH Software Discovery Index report for requirements analysis - we are looking for comments there:

http://softwarediscoveryindex.org/report/

For example, data types operated upon, etc.