Question

Forum: You have 3 days of zero obligations and expansive resources. What are you building?



6 months ago

jared.andrews07 ★ 17k

A bit of a thought experiment.

You are ~~trapped on a desert island~~ relaxing at an all-inclusive resort when you're struck with the sudden inspiration to solve a problem. The resort also doubles as an AWS data/computing center, so guests have free access to nearly limitless computational power (CPUs, GPUs, you name it). Coincidentally, the world's largest science/bioinformatics/computing conference is happening at the resort at the same time, and all the attendees are bored stiff after sitting through 5 straight days of seminars and overindulging in the evenings.

So a veritable legion of whatever experts you need are on hand to help you in whatever capacity is needed. In fact, you've made good friends with them over the past week lounging around the pool discussing and planning solutions to your most pressing problems.

You've got 3 days left at the resort - what are you building or solving with your new friends? Software to deal with pain points in your day to day work? A critical gap in your field of interest? A risky new method to enable some type of analysis?

Bonus points for any detail you can provide about the problem, potential approaches, etc.

hypothetical thought-experiment • 990 views

5 months ago by jared.andrews07 ★ 17k



I had this happen some time ago, minus the unlimited computational power. I went swimming with sea turtles, and I'd do the same even with limitless resources.

6 months ago by Mensur Dlakic ★ 28k



The sea turtles near this resort are incidentally infamous for biting off digits of unsuspecting tourists. True menaces.

6 months ago by jared.andrews07 ★ 17k



Probably a bit late to add this idea, but if such extensive resources were available, it would make sense to use them to investigate just how "correct" the annotations of all the sequences available on NCBI and other repositories actually are.

5 months ago by Dunois ★ 2.6k

score 1 · Answer 1 · 2024-03-04

I would build tools that help wet-lab scientists with the pain points in their day-to-day work. Most of the time they don't know how we could help them, even though a small amount of work on our side can make a big difference. Some tools that would be helpful:

Annotation of genes in pathways, ontologies, diseases and other resources: most resources like GO, Reactome and WikiPathways are heavily biased, scientists don't know how to contribute to them, and we only use them without contributing back. An easy way to contribute would help everyone, but reviewing the contributions would take longer than the 72 hours.
Automating or helping with the tedious tasks they have: I once had a researcher ask me for the sample IDs a sequencing facility had provided, because she needed them for another form but couldn't copy them from the facility's website (and there were 59 samples with long names). Another example is ad-hoc websites where researchers can compare internal results (not yet on GEO or other public resources). Recently, a group spent quite some time designing a flow cytometry panel by hand, when a tool for that would probably have helped. These small tasks are what slow down most (academic) researchers.
Designing experiments that prevent batch effects: most tools are designed to account and correct for batch effects after the experiment is done, but with the right tools we could prevent them beforehand. A tool that helped prevent them would be helpful (if used ;)
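As a minimal sketch of one preventive measure, assuming the only variable to balance is the experimental condition: shuffle the samples within each condition and deal them round-robin into batches, so every condition is spread evenly across batches and condition is not confounded with batch. The names `assign_batches` and `batch_composition` and the `(sample_id, condition)` input format are hypothetical; a real design tool would likely need to balance more covariates than this.

```python
import random
from collections import Counter, defaultdict


def assign_batches(samples, n_batches, seed=0):
    """Hypothetical helper: stratified round-robin allocation of samples to batches.

    samples: list of (sample_id, condition) tuples.
    Returns a dict mapping sample_id -> batch index (0 .. n_batches - 1).
    """
    rng = random.Random(seed)
    by_condition = defaultdict(list)
    for sample_id, condition in samples:
        by_condition[condition].append(sample_id)

    assignment = {}
    position = 0  # keep the round-robin going across conditions so batch sizes stay balanced
    for condition in sorted(by_condition):
        ids = list(by_condition[condition])
        rng.shuffle(ids)  # randomize order within each condition
        for sample_id in ids:
            assignment[sample_id] = position % n_batches
            position += 1
    return assignment


def batch_composition(assignment, samples):
    """Count how many samples of each condition ended up in each batch."""
    composition = defaultdict(Counter)
    for sample_id, condition in samples:
        composition[assignment[sample_id]][condition] += 1
    return dict(composition)


if __name__ == "__main__":
    samples = [(f"S{i:02d}", "control" if i < 6 else "treated") for i in range(12)]
    assignment = assign_batches(samples, n_batches=3)
    for batch, counts in sorted(batch_composition(assignment, samples).items()):
        print(batch, dict(counts))
```

With 6 control and 6 treated samples in 3 batches, each batch receives 2 of each, which is the point: any batch effect then hits both conditions equally instead of masquerading as a treatment effect.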

score 1 · Answer 2 · 2024-03-04

I would build a platform aimed at extracting models from distributed datasets without requiring data or metadata to be transferred over the platform; i.e., it operates on local data with local compute, and only communicates API commands, parameter updates, or bulk statistics (which can be suitably perturbed to protect privacy). A user could:

Register data + metadata with the platform, sharing it to be used as training or validation
Identify registered datasets of interest (e.g., fMRI data with a neurological disease phenotype)
Request a uniform pre-processing of identified datasets
Fit a statistical model leveraging all datasets
Evaluate the model on their local dataset (or any dataset registered as "Public") and interrogate the resulting model
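A minimal sketch of the "bulk statistics" idea for the simplest possible model, ordinary least squares, assuming each site can compute on its own data: every site shares only its summed statistics X^T X and X^T y (optionally with added Gaussian noise), and a coordinator solves the pooled normal equations. Without noise, this recovers exactly the fit you would get by pooling the raw data. The function names are hypothetical, and the noise is illustrative only: it is not calibrated to any formal differential-privacy guarantee, and this is not a zero-knowledge protocol.

```python
import numpy as np


def local_statistics(X, y, noise_scale=0.0, rng=None):
    """Runs at each site: only these aggregates leave the site, never X or y."""
    rng = rng if rng is not None else np.random.default_rng()
    xtx = X.T @ X
    xty = X.T @ y
    if noise_scale > 0:
        # Illustrative perturbation, not a calibrated privacy mechanism.
        e = rng.normal(0.0, noise_scale, size=xtx.shape)
        xtx = xtx + (e + e.T) / 2  # keep the Gram matrix symmetric
        xty = xty + rng.normal(0.0, noise_scale, size=xty.shape)
    return xtx, xty


def fit_federated(site_stats):
    """Runs at the coordinator: sum the site statistics and solve the normal equations."""
    xtx = sum(s[0] for s in site_stats)
    xty = sum(s[1] for s in site_stats)
    return np.linalg.solve(xtx, xty)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=300)
    sites = [(X[i::3], y[i::3]) for i in range(3)]  # three "silos"
    beta = fit_federated([local_statistics(Xs, ys) for Xs, ys in sites])
    print(beta)
```

Linear regression is chosen because its sufficient statistics are additive across sites; more complex models would need iterative parameter updates (e.g. federated averaging) rather than a single round of summaries.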

Basically: data silos definitely exist, but you don't have to de-silo all the data to benefit from "all" of it; just enable the results of pre-specified, privacy-preserving (in a zero-knowledge sense) computations to be shared. Throw in a tit-for-tat mechanism (the more you share, the more computations you can run) and you might have a way to encourage a weird sort of cooperation.
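The tit-for-tat mechanism could be as simple as a ledger in which each registered dataset raises a user's computation quota. A minimal sketch follows; the class name and the `base_quota` and `per_dataset` values are hypothetical assumptions, not part of any existing platform.

```python
from dataclasses import dataclass, field


@dataclass
class ContributionLedger:
    """Hypothetical tit-for-tat ledger: compute quota grows with the data a user shares."""

    base_quota: int = 1   # computations allowed with nothing shared (assumed value)
    per_dataset: int = 5  # extra computations per registered dataset (assumed value)
    shared: dict = field(default_factory=dict)
    used: dict = field(default_factory=dict)

    def register_dataset(self, user: str) -> None:
        """Record that `user` shared one more dataset with the platform."""
        self.shared[user] = self.shared.get(user, 0) + 1

    def quota(self, user: str) -> int:
        """Total computations `user` may run, given how much they have shared."""
        return self.base_quota + self.per_dataset * self.shared.get(user, 0)

    def request_computation(self, user: str) -> bool:
        """Grant the request if the user has quota left, and record the usage."""
        if self.used.get(user, 0) >= self.quota(user):
            return False
        self.used[user] = self.used.get(user, 0) + 1
        return True


if __name__ == "__main__":
    ledger = ContributionLedger()
    ledger.register_dataset("alice")
    ledger.register_dataset("alice")
    print(ledger.quota("alice"), ledger.quota("bob"))
```

A linear reward is the simplest choice; a real system would probably also weight datasets by size or quality so that users can't game the quota with trivial uploads.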

Edit: But first I would invent a way to fit a year's worth of time into a single day.