The rate limiting factor of genomics and really any of the types of analysis common to bioinformatics (e.g. transcriptomics, pathway analysis) has never really been the lack of computational resources.
Yes I know that improvements in computing power have allowed more and more data to be handled and in doing so has allowed new methods to be developed, old ones to be improved and the scope and scale of projects to be grown. However, Big Data is just a great marketing term to repackage old methods and technology, with some sprinkling of new tech onto it.
Faacebook, Google, and so on actually deal with truly large volumes of data. They deal with enormous live data streams and dump them into enormous stockpiles of data. They need to do analysis on this type of data which as OP says, typical HPC systems designed mostly for numerical simulations don't work.
Biological sciences, medical sciences and in turn their computational and informatic subsets don't have this volume of data nor do they have any of the dynamic nature seen in true "Big Data" technologies. Maybe one day I'll be able to get live streams of transcript expression levels from cells, but not for a while. Something people seem to forget is that although sequencing is getting cheaper, it is enormously expensive when you start to think experimentally. Sure $300 usd per sample is great for mammalian RNA-Seq but if I want to compare 4 experimental conditions with 3 replicates each all over a five point time series, it becomes 18k, which isn't an enormous amount of money by any means, but the point is that it gets expensive rapidly.
Still, not much has changed in the past 5 or even 10 years with what I can do with that data. Great, RNA-Seq or deep sequencing gives me bigger and more precise datasets, so now what? I can do GWAS and I get a list of variants with p-values. I do whatever DE analysis suits me and I get a list of genes with fold changes and p-values. Awesome, now what? Okay let's annotate it with GO/KEGG/etc and now I have a list of functions with p-values. No matter how many different ways I try and cut the cake I still get a list of crap with p-values and maybe some other metric hanging from them.
The rate limiting factor has never been the computational power, and is more infrequently a result of not having enough data, the problem has and still is that no matter how much data is generated or how much cleaner/precise/etc the data is, I still can't do a whole lot of anything with it because the ability to turn these piles of data into information is feeble at best. I get a list of genes that has the usual suspects, which are boring because they've been studied inside and out. Or I get a list of genes with novel items in it and I'm stuck spending a few nights on pubmed scratching my head trying to figure out what is going on.
GO and the like are more or less the same story, all I get is a coarse picture of my data with added noise. Great, "cellular component" that is really useful, or here's a detailed pathway for a process that has nothing to do with the biology I'm looking at (unless it does but that's after a nigh on pubmed). What is worse is that all of these things are massively biased to the people who annotated them to begin with. As soon as you want to look at anything but cancer in mice or humans, you're going to end up scratching your head trying to figure out how some human neuroblastoma crap has anything to do with your virus infected primary cells.
This problem is only going to get worse. I can do MS or a Kinome array along with my RNA-Seq, awesome I can tell which transcripts are there and what the phosphorylation states are. That sounds like a great way to integrate that data to get a really detailed picture of the activation status of pathways on a global scale. Except here we are again, there's zero data on what protein x being phosphorylated means unless its one of the few really well characterized proteins (which are almost all network hubs).
I could keep going, but I think too many people in the computational world who love the latest tech buzzwords and current industry people like to do is throw rocks from their ivory towers. No matter how you process the data, if its a Raspberry Pi or some facebook inspired trendy hadoop/hive/etc system doing the analysis, all that gets spit out is a big ol' list of shit with p-values. Whether it be the latest Illumina tech or the hottest MS approach, all you get is a list of p-values that you dump into your pathway enrichment tool of choice, crap out a few heatmaps and clustering diagrams and call it a day. We can tell people that pathway X was enriched, and genes A, B, C were upregulated or had SNPs or whatever, but our ability to generate biological insight ends there. Its up/down, on/off, has SNPs or doesn't, but chances are that is about as much insight that can be provided. If you have further insight it chances are you're dealing with the usual handful of genes or you're lucky enough to have a ton of ground work already done.
The raw data, the contextual data, the completion flat interaction networks and the amount of contextual/dynamic annotations for networks and pathways are still poor, incomplete and highly biased to whoever put them there. This is where people forget that a good bioinformatician is a good biologist, or at least should be. I do about 50/50 wet and dry work, so maybe I have a warped perspective, but ivory towers are still a huge problem in this field.
Until Facebook talent can stare into the cytoplasm of my cells, tissues and compartments and hear them whisper their secrets, I'll continue to have searing eye pain every time I hear someone say that our problem is that we're failing to keep up with the cool kids and whatever trend they're on.
Also OP, don't forget that there's plenty of things that benefit from traditional HPCs. If anything I think that this will only increase with time as physical and dynamical simulations become more important tools for analysis. Computational biology isn't just GWAS and genomics.