Question

Is it possible to access the code of a bioinformatics article?

0

3 months ago

sil_bioinfo ▴ 40

Hello,

I am sorry if this question is very silly, but when I read a bioinformatics paper (about machine learning algorithms or RNA-Seq analysis, for example), the authors usually list the R/Python packages they used and briefly describe the workflow. However, if you want to see the code, you can't find it (or a reference to it) anywhere. Is this normal? In biology, for example, you have to describe your protocols and techniques in detail in case someone wants to reproduce them or modify them and test them with their own samples.

Sorry if it's a very silly question, I was just curious. If the code is available, where do you look for it? I thought maybe on the GitHub of the paper's first author.

Thank you in advance!

publication • 905 views

ADD COMMENT • link updated 3 months ago by Zhenyu Zhang ★ 1.2k • written 3 months ago by sil_bioinfo ▴ 40

0

Can you provide examples of papers where you did not find this information? In many papers there is a "data availability" section that may also include information about code. This information may also be provided in the supplementary materials section, which can be found online.

I will probably get flak for saying this but plain bulk RNAseq analysis is mature enough at this point that even without exact code you should be able to reasonably reproduce the analysis as long as you stay with the same genome build and use appropriate aligners/programs.

ADD REPLY • link 3 months ago by GenoMax 141k

0

Normally under "data availability" there are only links to datasets (raw data and/or results data, not code). And in the "Supplementary material" section I almost always see tables, but no code. And I find this very often, that's why I was asking. For example, I am now reading this paper: https://www.frontiersin.org/articles/10.3389/fcimb.2024.1285493/full

ADD REPLY • link 3 months ago by sil_bioinfo ▴ 40

score 2 · Answer 1 · 2024-01-24

2

3 months ago

WouterDeCoster 47k

I expect they will either provide a link to the code, probably on GitHub, or they won't share it at all. That is indeed a massive problem for reproducibility. You can always try asking the authors by emailing them, but your success will vary.
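If you have the article's full text as plain text, the search for a code link can be sketched roughly as below. This is only a minimal illustration: the list of hosts is an assumption (not exhaustive), the function name `find_code_links` is made up for this example, and the URLs in the usage text are hypothetical placeholders.

```python
import re

# Hosts commonly used for sharing research code.
# This list is an assumption for illustration, not an exhaustive registry.
CODE_HOSTS = ("github.com", "gitlab.com", "bitbucket.org", "zenodo.org")

# Match http(s) URLs up to whitespace or a closing quote/bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def find_code_links(article_text):
    """Return URLs whose host is in CODE_HOSTS, in order, without duplicates."""
    links = []
    for url in URL_PATTERN.findall(article_text):
        url = url.rstrip(".,;")  # drop trailing sentence punctuation
        host = url.split("/")[2].lower()  # "https://host/path" -> "host"
        if host.startswith("www."):
            host = host[4:]
        if host in CODE_HOSTS and url not in links:
            links.append(url)
    return links


# Hypothetical example text; these URLs are placeholders, not real repositories.
text = (
    "Code is at https://github.com/example-lab/example-pipeline. "
    "Data: https://www.ncbi.nlm.nih.gov/sra and "
    "(https://zenodo.org/records/0000000)."
)
print(find_code_links(text))
```

An empty result only means no link to one of these hosts appears in the text you searched; the code may still be in a supplementary file or available from the authors on request.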

ADD COMMENT • link 3 months ago by WouterDeCoster 47k

0

And in case they do not share it, when they want to publish the article, how do they prove to the reviewers and to the journal where they want to publish that this method works and that it gives these results? Just curious, thank you!

ADD REPLY • link 3 months ago by sil_bioinfo ▴ 40

2

Scientific publishing is based on trust to a large extent. We generally don't question if an experiment described in the paper was actually done nor do we try to reproduce the actual experiment since it would not be practical.

That said, some journals have specific data/code requirements as a condition of acceptance (for example, you may need to share non-public SRA data with reviewers).

PLOS: https://journals.plos.org/ploscompbiol/s/code-availability
Bioinformatics: https://academic.oup.com/bioinformatics/pages/instructions_for_authors

Frontiers (paper you linked) seems to have no specific policy for code availability.

ADD REPLY • link 3 months ago by GenoMax 141k

1

geno i am confused as to whether you are saying:

it is based on trust right now, de facto
it must be based on trust
it should be based on trust
we generally don't question ... (and shouldn't!)
we generally don't question ... (but this is terrible, and we should)

depending on which of those or numerous other options you mean, i potentially either agree or disagree. that's about all i can say.

i will take a firm position, though, nevertheless: there are plenty of scientists who do question methods, experiments, and results. i can't count the number of times I've seen someone pick up a paper, glance at it, look at figure 1, frown, then dismiss it and put it back on the desk. what just happened there? well - it happened quickly, but in about 60 seconds that scientist determined they don't think that paper is reliable enough to keep reading. please don't tell me that doesn't happen - it certainly happens and it's based (in some cases) on assessment of methods used.

if you do not open source your code, you're preventing scientists from being able to accurately evaluate your work. even if that is common, it's a terrible practice and should stop.

another thing. The scientists I know who do regularly question methods, experiments, and results, are typically very good. i might even say most of the best scientists i know absolutely question methods and inferences drawn from them, usually pretty incisively and shrewdly.

for those reasons, i think simply saying "it is based on trust to a large extent" misses the mark. that statement is true some of the time, but it dismisses the views of people who would rather it not be this way and ignores the actions of scientists who DO open source, and so on.

those reasons, sil_bioinfo, persuade me that scientists who neither open source code nor respond to requests for it are the lowest common denominator in the field of bioinformatics. they may be very skilled, of course, but they are the worst members of the community.

finally, starting from a very different point (that of the taxpayer), i also don't think hiding/hoarding the code and data paid for with public money is fair to the public.

ADD REPLY • link 3 months ago by LauferVA 4.2k

0

I am a biologist and I recently started in bioinformatics (master's degree), and from this side I was wondering about this when I read some bioinformatics papers and there were no links to see the code.

In biology, you have to put everything in detail, both the materials and the methods, not so much to be able to reproduce your work but so that anyone who is doing similar research can try out your method in case it is useful for their research. As it is already published, if you use it, you obviously have to reference it. And then if you create something super cool that can be used by everybody, there are patents and so on.

So, I was wondering if it was the same in bioinformatics. I understand that any code that you use that is not your own creation and you take it from somewhere, you have to reference it. If you use any software that has already been created you obviously have to reference it as well. So I didn't understand why in some of the papers I read there were no references to the code anywhere...

And as I said, my question is not so much to reproduce the work of a paper with the same data... However, I would find this useful for reviewers, at least in my opinion. Because by not checking these things, it could be the case (I hope not) that they manipulated the results to show what they were interested in... and by not having code to test it with, how would you know if that method is reliable or not? I don't know about the industry, but when it comes to academia, there is an obsession with publishing articles on anything, no matter how small, incredible...

I don't know if I'm explaining myself well, sorry in advance!

ADD REPLY • link 3 months ago by sil_bioinfo ▴ 40

0

Oh okay, I did not know that. I was just curious about this. Thank you!

ADD REPLY • link 3 months ago by sil_bioinfo ▴ 40

score 2 · Answer 2 · 2024-01-24

2

3 months ago

Jeremy Leipzig 22k

The "open data, open code" issue has been studied quite a bit. If you're interested in learning more about this, the "case studies" section at Awesome Reproducible Research lists several censuses that attempt to measure the frequency of open code, such as Boudreau et al. and Hrynaszkiewicz et al.

ADD COMMENT • link 3 months ago by Jeremy Leipzig 22k

score 1 · Answer 3 · 2024-01-24

1

3 months ago

Zhenyu Zhang ★ 1.2k

We provide all code in a git repository in our publications.

Also, all the journals I have helped review for require code to be public. So I think the field is moving in the right direction.

The caveat is that it's often still challenging to reproduce the work even if you have all the scripts. But at least it's a good step forward.

ADD COMMENT • link 3 months ago by Zhenyu Zhang ★ 1.2k

0

Oh, good to know. Rather than trying to reproduce it, I was asking because sometimes, when you are doing research and reading papers to find out what has been done so far, you might find one that resembles your work. More than anything else, the code would serve as a guide for your research (like when you look up protocols and see that they have already been tried and tested and worked), not as something to copy.

ADD REPLY • link 3 months ago by sil_bioinfo ▴ 40

1

It's a good discussion. On my wish list is a database that records "these are the methods we have tried that did not work. Please don't waste your time on this path".

ADD REPLY • link 3 months ago by Zhenyu Zhang ★ 1.2k