Forum: Suggest common programming tasks for biologists who are novice programmers
3.6 years ago
katieirenec ▴ 40

Hi Biostars community,

tl;dr: What programming tasks do biologists who are novice programmers often need to learn or look for tutorials about? What goals are they frequently trying to achieve that require programming?

I'm a recent PhD in computer science education. Specifically, I research and design new ways for novices to learn programming. I also was a CS-MCB double major in undergrad, with a focus on bioinformatics.

I have just started a postdoc where I am developing a purpose-first approach to learning programming. This approach supports learners as they learn common code patterns in a domain that they care about. Instead of starting with a language's syntax and semantics, which can be demotivating and a bit of a slog, learners start by assembling and tailoring common code patterns in my interface. I have evaluated this system for web scraping tasks, with positive results.

Now I am exploring new domains in which to implement this approach. In my experience, molecular biologists sometimes need to use a little code for particular tasks (e.g. analysis of results from a certain program, specific types of tree generation). However, it's been too many years since I did this myself, and I can't recall exactly which tasks those were. Plus, new GUIs may now handle what used to be done with code.

What specific coding tasks do you think biologists often need?

education • 1.9k views

There is a good website which contains a variety of programming tasks for biologists. Check it out: https://www.programmingforbiologists.org/exercises/

3.6 years ago
predeus ★ 2.0k

The most useful skill is text parsing, summarisation, and re-formatting. Knowing how to convert one text-readable format into another or summarise/extract values from text file is very, very useful.
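As a small illustration of that kind of parsing and re-formatting, here is a minimal Python sketch that reads FASTA text and summarises each record as a tab-separated row of ID, length and GC fraction. It assumes plain FASTA with the record ID as the first word of each header line; the example sequences are made up.

```python
def fasta_to_table(lines):
    """Summarise FASTA records as (id, length, GC fraction) tuples."""
    records = []
    name, seq_parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(seq_parts)))
            # Keep only the first word of the header as the record ID
            name, seq_parts = (line[1:].split() or [""])[0], []
        else:
            seq_parts.append(line.upper())
    if name is not None:
        records.append((name, "".join(seq_parts)))

    rows = []
    for rid, seq in records:
        gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
        rows.append((rid, len(seq), round(gc, 3)))
    return rows


def to_tsv(rows):
    """Re-format the summary rows as tab-separated text with a header."""
    header = "id\tlength\tgc"
    return "\n".join([header] + ["\t".join(map(str, row)) for row in rows])


EXAMPLE = """>seq1 some description
ACGT
GGCC
>seq2
ATAT
"""

if __name__ == "__main__":
    print(to_tsv(fasta_to_table(EXAMPLE.splitlines())))
```

The same read, summarise, write pattern carries over to most text formats; only the parsing function changes.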

I absolutely agree with this. Parsing analysis output (especially data held in spreadsheets) is helpful in every niche. Some AI tools may also help interpret values stored in an unusual format.

3.6 years ago

All the things that @predeus and @Mensur said. In particular, slicing and dicing datasets and plotting comparisons of the results (e.g. plotting the difference in gene length between up- and down-regulated genes), and regex (e.g. extracting the middle part of all these sample IDs).
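For the regex part, a minimal Python sketch, assuming hypothetical sample IDs shaped like `<project>_<tissue>_rep<N>`, where the "middle part" is everything between the first underscore and the replicate suffix:

```python
import re

# Hypothetical sample IDs of the form <project>_<tissue>_rep<N>
SAMPLE_IDS = ["PRJ01_liver_rep1", "PRJ01_kidney_rep2", "PRJ02_brain_rep1"]

# Capture everything between the first underscore and "_rep<digits>" at the end
PATTERN = re.compile(r"^[^_]+_(.+)_rep\d+$")


def middle_part(sample_id):
    """Return the middle field of an ID, or None if the ID doesn't fit the pattern."""
    match = PATTERN.match(sample_id)
    return match.group(1) if match else None


if __name__ == "__main__":
    print([middle_part(s) for s in SAMPLE_IDS])
    # → ['liver', 'kidney', 'brain']
```

Returning `None` for IDs that don't match makes malformed names easy to spot instead of silently producing a wrong label.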

Plus:

  • Table joining and text pattern matching - e.g. converting between different types of gene id.
  • Manipulating gene models (get me the locations of all the first introns, or find me single exon genes, or find me transcripts that only differ by a retained intron).
  • Doing anything you'd do in a spreadsheet, but better - e.g. analyzing qPCR data.
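Converting between gene ID types is usually a left join between a results table and a two-column mapping table. A minimal sketch in plain Python, assuming tab-separated text with an `ensembl_id` column in both tables; the log2 fold changes are invented for illustration, and `ENSG00000999999` is a hypothetical stand-in for an ID with no mapping. In pandas the same step is `DataFrame.merge(..., how="left")`.

```python
import csv
import io

# Mapping table (in practice exported from an annotation resource)
MAPPING_TSV = """ensembl_id\tsymbol
ENSG00000141510\tTP53
ENSG00000012048\tBRCA1
"""

# Hypothetical results table keyed by Ensembl gene ID
RESULTS_TSV = """ensembl_id\tlog2fc
ENSG00000141510\t1.5
ENSG00000012048\t-0.8
ENSG00000999999\t2.1
"""


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def left_join(left, right, key):
    """Add the columns of `right` to every row of `left`, matching on `key`."""
    lookup = {row[key]: row for row in right}
    extra_cols = [col for col in (right[0] if right else {}) if col != key]
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        merged = dict(row)
        for col in extra_cols:
            merged[col] = match.get(col)  # None when the ID has no mapping
        joined.append(merged)
    return joined


results = left_join(read_tsv(RESULTS_TSV), read_tsv(MAPPING_TSV), "ensembl_id")

if __name__ == "__main__":
    for row in results:
        print(row["ensembl_id"], row["symbol"], row["log2fc"])
```

A left join keeps unmapped genes (with an empty symbol) rather than dropping them, which is usually what you want when checking how many IDs failed to convert.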

Manipulating gene models (get me the locations of all the first introns, or find me single exon genes, or find me transcripts that only differ by a retained intron).

What tools do you use for this? I know of R/Bioconductor packages designed for this, but is there something else?

The Ensembl Perl API, if working with reference genomes.

pysam has objects for manipulating GTF/GFF/Bed files.
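For small queries the same logic can also be written in plain Python without any library. A minimal sketch, assuming a well-formed GTF in which every exon line carries a `transcript_id` attribute; the records below are made up, and coordinates follow GTF's 1-based inclusive convention.

```python
import re
from collections import defaultdict

# Hypothetical GTF records: 9 tab-separated columns, attributes in column 9
GTF_LINES = [
    'chr1\tsrc\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t300\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t900\t1200\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
    'chr1\tsrc\texon\t2000\t2100\t.\t-\t.\tgene_id "G3"; transcript_id "T3";',
    'chr1\tsrc\texon\t2300\t2400\t.\t-\t.\tgene_id "G3"; transcript_id "T3";',
]

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_by_transcript(lines):
    """Group exon records by transcript: {tid: [(chrom, start, end, strand), ...]}."""
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            exons[match.group(1)].append(
                (fields[0], int(fields[3]), int(fields[4]), fields[6]))
    return exons


def single_exon_transcripts(exons):
    """IDs of transcripts that have exactly one exon."""
    return sorted(tid for tid, ex in exons.items() if len(ex) == 1)


def first_introns(exons):
    """First intron (chrom, start, end, strand) of each multi-exon transcript."""
    introns = {}
    for tid, ex in exons.items():
        if len(ex) < 2:
            continue
        chrom, strand = ex[0][0], ex[0][3]
        coords = sorted((start, end) for _, start, end, _ in ex)
        if strand == "-":
            # On the minus strand the first exon has the highest coordinates
            (_, e2), (s1, _) = coords[-2], coords[-1]
            introns[tid] = (chrom, e2 + 1, s1 - 1, strand)
        else:
            (_, e1), (s2, _) = coords[0], coords[1]
            introns[tid] = (chrom, e1 + 1, s2 - 1, strand)
    return introns


if __name__ == "__main__":
    exons = exons_by_transcript(GTF_LINES)
    print(single_exon_transcripts(exons))  # → ['T2']
    print(first_introns(exons))
    # → {'T1': ('chr1', 201, 299, '+'), 'T3': ('chr1', 2101, 2299, '-')}
```

Grouping exons by transcript once and then answering several questions from that structure keeps each query to a few lines.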

3.6 years ago
Mensur Dlakic ★ 28k

What I see more and more lately is people writing only the glue part of various pipelines (see here for a random example of such a pipeline). These pipelines have a large number of external requirements for packages that are already available; the authors only put them in the proper order and ensure that the output of each program is compatible with the input of the next. In fact, one of the most popular prokaryotic genome annotation packages, prokka, is mostly a driver that runs a number of other software packages and formats their outputs into a neat final product.
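A minimal sketch of that glue pattern in Python: run each step as an external command, stop on a non-zero exit status, and check that the expected output file exists and is non-empty before the next step consumes it. The two "tools" here are hypothetical stand-ins (tiny Python one-liners) so the example runs anywhere; in a real pipeline they would be the actual programs being chained.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_step(cmd, expected_output):
    """Run one external command; fail loudly if it errors or leaves no output."""
    subprocess.run(cmd, check=True)  # raises CalledProcessError on non-zero exit
    out = Path(expected_output)
    if not out.exists() or out.stat().st_size == 0:
        raise RuntimeError(f"step {cmd!r} did not produce {out}")
    return out


def demo_pipeline():
    """Two hypothetical steps: write an unsorted gene list, then sort it."""
    with tempfile.TemporaryDirectory() as workdir:
        raw = Path(workdir) / "step1.txt"
        final = Path(workdir) / "step2.txt"
        # Step 1 (stand-in for a real tool): write an unsorted list
        run_step([sys.executable, "-c",
                  f"open({str(raw)!r}, 'w').write('geneB\\ngeneA\\n')"], raw)
        # Step 2 reads step 1's output, so the formats must agree
        run_step([sys.executable, "-c",
                  f"lines = open({str(raw)!r}).read().split(); "
                  f"open({str(final)!r}, 'w').write('\\n'.join(sorted(lines)) + '\\n')"],
                 final)
        return final.read_text()


if __name__ == "__main__":
    print(demo_pipeline(), end="")
```

Checking outputs between steps is most of what glue code does; it turns a silent failure halfway through the pipeline into an immediate, named error.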

3.6 years ago
Mark ★ 1.6k

I have two suggestions.

The first: learn to manually install software and its dependencies. Conda and other package managers have made installing software a breeze; however, there are caveats and issues if you rely too much on package managers.

The second: most bio programmers start off writing scripts. Learn how to convert a script into a CLI that takes inputs, flags and an output directory. It will make the script more stable, make you think like a programmer and make your scripts more user-friendly. Also, self-documentation!
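A minimal sketch of such a CLI using Python's standard `argparse` module. The task (counting FASTA records above a length cutoff), the script name `count_seqs.py` and the flag names are all hypothetical; the point is the structure: positional inputs, optional flags with defaults, an output directory, and a `--help` message generated for free.

```python
import argparse
from pathlib import Path


def build_parser():
    """Define the command-line interface (also produces the --help text)."""
    parser = argparse.ArgumentParser(
        description="Count FASTA records per file (hypothetical example CLI).")
    parser.add_argument("inputs", nargs="+", type=Path, help="input FASTA file(s)")
    parser.add_argument("-o", "--outdir", type=Path, default=Path("results"),
                        help="output directory (default: %(default)s)")
    parser.add_argument("--min-length", type=int, default=0,
                        help="ignore sequences shorter than this")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print per-file counts")
    return parser


def count_sequences(path, min_length):
    """Count FASTA records whose sequence length is at least min_length."""
    count, length, in_record = 0, 0, False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if in_record and length >= min_length:
                    count += 1
                in_record, length = True, 0
            else:
                length += len(line)
    if in_record and length >= min_length:
        count += 1
    return count


def main(argv=None):
    args = build_parser().parse_args(argv)
    args.outdir.mkdir(parents=True, exist_ok=True)
    summary = args.outdir / "counts.tsv"
    with open(summary, "w") as out:
        out.write("file\tn_sequences\n")
        for path in args.inputs:
            n = count_sequences(path, args.min_length)
            if args.verbose:
                print(f"{path}: {n} sequences")
            out.write(f"{path}\t{n}\n")
    return summary


if __name__ == "__main__":
    main()
```

Run it as `python count_seqs.py sample1.fa sample2.fa -o results --min-length 100`; `python count_seqs.py --help` prints the self-documentation. Accepting `argv` in `main` also makes the script easy to test from Python.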

learn to manually install software and their dependencies

Not sure I understand this one. These days you should be installing pretty much everything possible from conda into containers. You just have to learn to deal with the conda caveats. It's much easier and more reproducible than manual installation methods, for the same result.

I should have added: Not every piece of software you will want to use is installable via package managers.
