Forum: What's more clear, loops or functions?
4
1
3.1 years ago
DriesB ▴ 90

Hello all,

I wanted to discuss a question with you that I've been asking myself during a project. In this project, I had to use read.table() on a great number of files, which were provided to us by multiple research groups.

As the data was provided in multiple formats that differed significantly, I wrote multiple 'reading scripts'. This allowed me to experiment with both scripts based upon looping and scripts based upon functions.

Loop-based scripts consisted mostly of one big loop, which performed many alterations on one measurement at a time. Function-based scripts took entire lists or arrays of measurements and applied functions to these using the apply family of functions. This meant that a small alteration was performed on a large number of measurements at a time.
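
The two styles described above can be sketched on toy data. Python is used here only to match the other code in this thread; in R the second style would correspond to lapply or sapply calls. The data, the helper name double, and the three steps are invented for illustration and are not taken from the actual reading scripts.

```python
# Hypothetical raw measurements, as strings read from a file.
raw = [" 1.5", "2.0 ", " 3.25 "]


def double(value):
    # Hypothetical rescaling step.
    return value * 2


# Loop-based: every step is applied to one measurement before moving on.
loop_result = []
for item in raw:
    cleaned = item.strip()             # step 1: remove whitespace
    value = float(cleaned)             # step 2: convert to a number
    loop_result.append(double(value))  # step 3: rescale

# Function-based: each step is applied to all measurements at once.
cleaned_all = list(map(str.strip, raw))
values_all = list(map(float, cleaned_all))
mapped_result = list(map(double, values_all))

print(loop_result == mapped_result)
```

Both variants produce the same list; the difference is only whether the reader follows one measurement through all steps or one step across all measurements.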

Now, comparing these two types of scripts, I feel like the loop-based ones are more intuitive to read than the function-based ones. You see what happens to a measurement, step by step. This is in contrast to the function-based scripts, which are repeatedly broken up by apply. This might be relevant for sharing the scripts with colleagues who are not bioinformaticians.

I would like to note that I know that functions and the apply family are supposed to run faster than loops, but this is not as important for me as readability. One thing which could be a perk of using functions is the possibility of debugging in RStudio.

I would like to hear your views!

R coding style Forum • 758 views
0

Could you clarify what is meant by "loops" and "functions"?

0

Imperative R is gross! :)

0

If the speed penalty isn't an issue, take readability (if it's that high a priority).

If the speed is necessary, document/comment your code more thoroughly.

2
3.1 years ago
Ram 35k

Code by itself is seldom clear beyond a point. As long as comprehensive code comments are added while the code is written, things should be fine.

I always write the comment first, code after. I then revisit the comment after the code is done and ensure it is still relevant. The comment should be at a high enough level that readers don't need to understand too many of the specifics, but specific enough that the salient points are highlighted.

Also, a top-level comment that explains the workflow and offers a quick summary helps.

I would prefer apply, with a code comment on what the apply exactly does - shouldn't be too challenging.

P.S.: This is not strictly bioinformatics, but it's close enough to pass the filter, I think.

2
3.1 years ago
zx8754 10k

I prefer *apply 99% of the time. One of the reasons is that they keep the environment clean of intermediate objects.

For example, importing data using data.table is so basic that I don't think the line below even needs or deserves a comment:

library(data.table)
myData <- rbindlist(lapply(list.files(), fread))

0

Yes, that's true! But readability still has the highest priority for me.

2
3.1 years ago

My personal view is that using apply-type functions is actually the cleaner way and is also easier to read. For instance, there is no need to declare any extra 'dummy' variables that get filled during the looping. A simple way of reading tables and assembling those into a single data.frame would be

# Read every file, skipping the first 5 lines of each
tables <- lapply(list.of.file.names, read.delim, skip = 5)
# Stack the resulting data frames row-wise into a single data.frame
tables <- Reduce(rbind, tables)


I believe it cannot get much more readable than this.

EDIT: and RamRS' point is of course true too; proper annotation would anyway help people that read your code to understand what is going on.

2
3.1 years ago

The problem is best formulated in terms of programming paradigms as a choice between

• imperative (procedural) programming that uses constructs such as if, for, while versus
• functional (declarative) programming that has functions such as map, filter, reduce (using Python as language)

By and large, for beginners it is much simpler to think one step at a time and work the problem out that way. To make a new list with numbers larger than 5, an imperative solution would be:

y = []
for x in range(10):
    if x > 5:
        y.append(x)


Contrast this to the "elegance" and "simplicity" of a functional construct

def choose(x):
    return x > 5

# In Python 3, filter() returns a lazy iterator; list() turns it into a list
y = list(filter(choose, range(10)))


Note just how many fewer "moving" parts there are. As the program grows, this pays off many times over. I put "simplicity" in quotes as it is a conditional simplicity: if you understand what filter does and how it works, then it is super simple; otherwise, not so much.
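
As a small check of the comparison above, the following sketch runs both versions side by side and confirms that they select the same numbers. It assumes Python 3, where filter returns a lazy iterator, so the result is wrapped in list(); the names select_with_loop and select_with_filter are made up for this illustration.

```python
def choose(x):
    """Return True for the values we want to keep."""
    return x > 5


def select_with_loop(values):
    # Imperative version: build the result one element at a time.
    kept = []
    for x in values:
        if choose(x):
            kept.append(x)
    return kept


def select_with_filter(values):
    # Functional version: filter() is lazy in Python 3, so materialize it.
    return list(filter(choose, values))


loop_result = select_with_loop(range(10))
filter_result = select_with_filter(range(10))
print(loop_result == filter_result)
```

The loop version needs a result variable, a loop variable, and an explicit append; the filter version only needs the predicate and the input.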

Functional programs typically exhibit a much tighter and better design because they ask the programmer to define the actions before these take place, which leads to a better isolation of the steps. This has far-reaching consequences: my programs with functional constructs end up more resilient to bugs and errors.
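
One possible reading of "defining the actions before they take place" is sketched below: each step is a small named function, and the pipeline only wires them together with filter, map, and reduce. The functions is_valid, to_percent, and add, as well as the sample values, are hypothetical and only serve the illustration.

```python
from functools import reduce


def is_valid(measurement):
    # Hypothetical rule: negative readings are treated as invalid.
    return measurement >= 0


def to_percent(measurement):
    # Hypothetical transformation: convert a fraction to a percentage.
    return measurement * 100


def add(total, value):
    # Combine two numbers; used by reduce() to sum the stream.
    return total + value


raw = [0.5, -1.0, 0.25, 0.75]

valid = filter(is_valid, raw)        # step 1: keep valid readings
percents = map(to_percent, valid)    # step 2: transform each reading
total = reduce(add, percents, 0.0)   # step 3: combine into one number
print(total)
```

Because each step is isolated, it can be tested or replaced on its own without touching the others.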

On the downside, it is possible to overdo the "functional" aspect and use overly high levels of abstraction that make programs harder to understand (or maybe I am just not skilled enough to comprehend them). As with everything, there is a balance. Even though I love the concepts, I find it very difficult to understand purely functional programs where no imperative constructs are used.

If functional programming is too much abstraction, it is just as fine to use "imperative" constructs, but everyone should add a little bit of functional spice to their programs.

0

I did think along these lines, but my point was: Most of the functional part is accomplished using anonymous functions/lambdas/whatever which would make it more difficult to manage over time. Sure, defining a reusable function is good but it is seldom done in R. You'll see this a lot more often: vec %>% sapply(function(item) { x = gsub(x=item, ...); do_something(x); }) %>% ... than a named function that is defined before the sapply call.
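
The difference between the two habits can be sketched in Python, the language used in the answer above; the R case with sapply is analogous. The identifiers and the helper strip_version are invented for this example.

```python
import re


def strip_version(gene_id):
    """Remove a trailing version suffix such as '.12' from an identifier."""
    return re.sub(r"\.\d+$", "", gene_id)


# Hypothetical identifiers, some with a version suffix.
ids = ["geneA.1", "geneB.12", "geneC"]

# Anonymous version: the logic lives inside the call and cannot be
# reused or tested on its own.
inline = list(map(lambda item: re.sub(r"\.\d+$", "", item), ids))

# Named version: the same step, defined once before it is applied.
named = list(map(strip_version, ids))
print(inline == named)
```

Both calls give the same result, but only the named function carries a docstring and can be reused elsewhere.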

0

Thank you for explaining why functional programming is usually to be preferred over imperative programming!