Question

Is it necessary to make Python packages installable, or is a Jupyter notebook enough?

3.1 years ago

saqlain ▴ 90

Hi,

I have developed a few Python packages/methods for handling genomics data, e.g., gene cell filtering, normalization, dimensionality reduction, etc. I am aware that it is excellent practice to make them installable through "pip install". Is it necessary to make the packages installable, or is a Jupyter notebook with a sound README file enough, and is that acceptable in the scientific world? Thanks in anticipation.

genomics Analysis Python packages Data • 1.4k views

updated 3.1 years ago by Mensur Dlakic ★ 28k • written 3.1 years ago by saqlain ▴ 90

Accessibility and usability are top considerations when people (I am referring to non-programmers/bench scientists) look for software to help them analyze data. Anything you can do to make installation and use easier will bring your software to more of them. If you have developed something that is not currently available elsewhere, then take the extra step to make your code more accessible.

Comment written 3.1 years ago by GenoMax 147k

score 3 · Accepted Answer · 2021-10-06

"I have developed a few python packages/methods for handling genomics data, e.g., gene cell filtering, normalization, dimensionality reduction etc."

It depends on what exactly you mean by this statement. Did you truly develop new methods and write packages for all these tasks? That seems unlikely given the number of tasks, but maybe that is the case. If so, I suggest you create a complete package and make it as easy to install as possible.

Or did you write scripts and/or develop pipelines that use existing packages to perform the tasks you listed? If this is the case, a notebook and/or a plain Python script could be enough.
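Even for a plain script, it helps to keep each step in a small documented function, so the same file can be imported from a notebook now and moved into a package later with little change. Here is a minimal sketch of what I mean; the function names, the data layout (gene name mapped to a list of per-cell counts) and the threshold are made up for illustration and are not taken from your code.

```python
"""Gene filtering helpers (hypothetical example, names are illustrative)."""


def cells_detected(values):
    """Count the cells in which a gene has a non-zero count."""
    return sum(1 for v in values if v > 0)


def filter_genes(counts, min_cells=3):
    """Keep genes detected in at least `min_cells` cells.

    counts: dict mapping gene name -> list of per-cell counts (assumed layout).
    Returns a new dict with the same layout; the input is not modified.
    """
    return {
        gene: values
        for gene, values in counts.items()
        if cells_detected(values) >= min_cells
    }


if __name__ == "__main__":
    # Tiny made-up matrix: gene -> counts across four cells.
    demo = {
        "geneA": [0, 2, 5, 1],
        "geneB": [0, 0, 1, 0],
        "geneC": [3, 0, 4, 0],
    }
    # Runs only when the file is executed directly, not when it is imported.
    print(sorted(filter_genes(demo, min_cells=2)))
```

Because the demo sits under the `__main__` guard, importing the file from a notebook only brings in the functions and runs nothing, which is most of what a package would give you at this stage.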

Either way, good documentation is essential for others to adopt your methodology, especially in competitive fields where many applications are already available.

I often tell my students a story about SAM and HMMer, which back in the 90s and early in this century were pretty much the only two packages for biological applications of hidden Markov models. In my opinion SAM was the better package, but it was closed source, not updated often, and not well documented. HMMer was the opposite in all regards, and today it is the gold standard for all HMM applications, while most people have not even heard of SAM. I am not saying that making your software open source and well documented will guarantee immortality (there are reasons beyond those why HMMer succeeded and SAM didn't), but it will make a difference in how your scientific contribution is measured in the end.