I would love to hear folks' candid thoughts on the biggest challenges / opportunities in the field of bioinformatics. I plan on using this as a starting point for some research, so any books, articles, videos etc. would be much appreciated :)
In no particular order:
- Standardisation of methods used in clinical practice (may very well be region and country-specific)
- Certification of who can call themselves a bioinformatician
- Data curation
- Increasing compute capacity (we are already reaching limits with large single-cell datasets)
- Training of new bioinformaticians
- Creating approved tools that adhere to standards mandated by regulatory agencies (e.g. FDA) so that they can be used by regular users (think physicians). These tools need to produce results/reports that make sense to their respective users.
- Creating workflow/pipeline tools that can be used/understood by people who are not programmers
- Making cloud computing accessible to ordinary users.
Edit: I should note that the problems described by @Ian are in the domain of computational biologists/statisticians. My list is from the perspective of an applied bioinformatician.
As was alluded to above, it's difficult to say what the biggest open problems in bioinformatics are because of the position that bioinformatics occupies as an enabler of other things. Thus many of the big open problems in bioinformatics are about infrastructure and don't require the skills we normally think of as bioinformatics skills (computer science, statistics, biological knowledge); they are actually informatics problems and social problems (see @Kevin Blighe's and @GenoMax's answers). These are actually proper bio-informatics problems, but they are not the sort of problem that many people coming into bioinformatics want to solve (perhaps that is why they are still unsolved).
The other categories of problem are not bioinformatics problems, but biology problems that need bioinformaticians' solutions.
I don't know about the most important, but here are some things I'd like to see tackled in 2021, from my perspective as someone interested in transcriptomics and gene regulation:
- Proper statistical models, with theoretical as well as empirical justification, for cross-technique comparison (e.g. comparing whole-transcript scRNA-seq to UMI-tagged scRNA-seq, or either of those to bulk RNA-seq, but in general any two datasets generated from negative binomial processes with unknown systematic and random biases).
- In a similar vein: routine extraction of biological parameters from single-cell data beyond just cell-type identity/differentiation state/lineage. E.g. I'd love to see algorithms that use measurements of differential variability in single-cell data to infer the structure and mechanisms of the regulation that is happening.
- A perennial favorite that I don't think is yet fully solved: identification of functionally relevant non-coding mutations (in both non-transcribed sequence and transcribed but non-coding sequence). An under-explored avenue that I see here is the use of the large human variation datasets (e.g. gnomAD) to explore within-species constraint in non-coding space.
- Joint estimation of expression, genotype and genotype:expression interactions (allelic imbalance) from RNA-seq data, including the use of replicates (both within-individual and between-individual).
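To make the allelic imbalance point a bit more concrete, here is a minimal sketch of the naive per-site baseline that a joint model would need to improve on: an exact two-sided binomial test of reference vs. alternate read counts at a single heterozygous site. The function name `allelic_imbalance_p` and the example counts are made up for illustration. It assumes the genotype is already known, ignores overdispersion and reference mapping bias, and does not use replicates at all, which is exactly what is missing.

```python
from math import comb


def allelic_imbalance_p(ref_count: int, alt_count: int, p_ref: float = 0.5) -> float:
    """Exact two-sided binomial p-value for reads at one heterozygous site.

    Null hypothesis: each read comes from the reference allele with
    probability p_ref (0.5 means no allelic imbalance and no mapping bias).
    """
    n = ref_count + alt_count
    if n == 0:
        return 1.0  # no reads, no evidence either way
    # Probability of every possible reference-read count under the null
    pmf = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_count]
    # Sum the outcomes that are at most as likely as the observed one;
    # the small tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


print(allelic_imbalance_p(10, 0))  # 0.001953125
print(allelic_imbalance_p(5, 5))   # balanced counts, p-value close to 1
```

In practice read counts are overdispersed relative to the binomial, so a test like this tends to be anti-conservative; a beta-binomial or a full joint model over replicates would be the more defensible choice.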