Do you know of any open source bioinformatics machine learning projects?
8.5 years ago
dhbradshaw ▴ 130

For the past few days I've been trying to gather a list of interesting open source projects where tools from machine learning are applied to biological problems. Here's the (way too small) list so far.

https://github.com/dhbradshaw/ml-for-genomics/blob/master/README.md

It's been surprisingly hard to find a lot of material. So since Google is failing me, I thought I'd come here. What are the most interesting such projects that you know of?

Or let's remove the filter! Even if they're boring: What open source projects do you know of where machine learning techniques are applied to biological questions?

machine-learning • 4.3k views

Some machine learning functionality may be integrated into more generic frameworks, e.g. Biopython, BioJava, BioPerl, Cytoscape... Are you also interested in those?


Definitely.

I guess the strongest thing to do in a case like that would be to find the machine learning portions of the projects and figure out how to link to them rather than just link to the projects as a whole.


Since deep learning seems to be the hot topic of the moment, I'd point out the paper "Deep learning for regulatory genomics" (Nat Biotech 2015) and the references therein.

If you struggle to find material about ML applied to biology, it might simply be because ML is just a set of tools after all. So a project might be using ML without explicitly "advertising" it.

Thanks for putting together this list anyway!

By the way, shouldn't the many Bioconductor packages that rely on some variation of linear modelling (limma, edgeR, DESeq, ...) be included as well?
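To make the "variation of linear modelling" point concrete, here is a minimal sketch of the basic idea: fit one small linear model per gene and read off the treatment effect. It is a toy illustration only. The function name `fit_two_group_model`, the gene names, and the expression values are all made up, and this is plain ordinary least squares, not what those packages actually do (limma adds empirical Bayes moderation, and edgeR and DESeq fit negative binomial models to counts).

```python
# Toy illustration only: ordinary least squares, one gene at a time.
# This is NOT the method implemented by limma, edgeR, or DESeq.

def fit_two_group_model(expression, group):
    """Fit y = intercept + effect * group by ordinary least squares.

    expression: expression values for one gene, one per sample
    group: 0/1 indicators per sample (0 = control, 1 = treated)
    Returns (intercept, effect).
    """
    n = len(expression)
    mean_x = sum(group) / n
    mean_y = sum(expression) / n
    # Closed-form OLS solution for a single predictor plus intercept
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(group, expression))
    sxx = sum((x - mean_x) ** 2 for x in group)
    effect = sxy / sxx
    intercept = mean_y - effect * mean_x
    return intercept, effect


# Hypothetical log-expression values for two made-up genes
group = [0, 0, 0, 1, 1, 1]
genes = {
    "geneA": [5.0, 5.2, 4.8, 7.0, 7.2, 6.8],
    "geneB": [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],
}

# One independent fit per gene
effects = {name: fit_two_group_model(values, group) for name, values in genes.items()}

for name, (intercept, effect) in effects.items():
    print(f"{name}: baseline={intercept:.2f}, treatment effect={effect:.2f}")
```

With a 0/1 group indicator, the fitted effect is just the difference between the treated and control means, which is why these packages can look like "statistics" rather than "machine learning" even though the underlying model is a regression.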


Thanks for posting the paper!

I'm not yet well informed enough to make a judgement on edgeR, DESeq, etc. (Hence the project :-) ) Do you think they would strengthen or dilute the list?

8.5 years ago
rcasey ▴ 30

Here are two that I've played around with for my hobby:

  1. https://github.com/BauerLab/VariantSpark
  2. https://github.com/bigdatagenomics/adam

ADAM is probably the most elaborate and best funded (in terms of resources and commits) of the frameworks I've seen. Hope this helps.


Fantastic. This is exactly the kind of thing I'm looking for. Thanks!

8.5 years ago

Since you list CellProfiler, I'll add CellCognition, another bioimage informatics package, with an emphasis on time-lapse images.


I'll add it. Thank you!
