Proud to come here to BioStars to announce that we're offering 30x whole genome sequencing for $3,490! This seems like the kind of place where like-minded people have also really, really wanted to get their own genomes sequenced. Well, at least my co-founder and I have really wanted to do it. Recode just ran a story on us!
Personal genome sequencing can be framed in terms of "helping science" or "learning about yourself", but it is undeniable that the vast majority of the information related to genomes targets human health and related matters. So it may be immaterial how one labels the product if it can mostly be used in one way.
It is a bit like selling a product with health implications that have not actually been verified. There are plenty of such products, and it seems the only requirement is to label them "This product has not been approved by the FDA". I wonder whether any or all genome sequences or analyses should be labeled as such, and whether that would in fact satisfy regulatory bodies. These are uncharted waters.
After all, just as Uber disrupts a heavily regulated taxi industry, would it be possible that bioinformaticians could disrupt the medical industry? Or that an answer on Biostars on interpreting data could be construed as an essential component of a diagnosis? Stranger things have happened.
On Genohub I already see two packages offering 35x coverage on HiSeqX10 for $1800 (one commercial in Seoul, one academic in Australia). I think that with the rollout of the X10, in half a year it should not be a problem to find similar packages reliably. (Maybe these two exist for now just to fill unused capacity.) And with bulk orders it should be even easier to find labs with a package at this price.
Your approach of storing only the variants is pretty clever. But what's your contingency plan for when the reference you're using becomes obsolete? There are plenty of options; I'm just curious to see your thinking.
Are you really planning to ONLY store variants? I don't think that is a good idea. While people have had the dream of a graph-based representation like the one you describe for years, I have yet to see a convincing and practical solution for it (despite many smart people working on it). It is harder than it seems. But, even more importantly, by not keeping the raw reads you lose the ability to take advantage of new alignment and variant calling methods. The cost of storing a 30X whole genome's worth of raw unaligned reads (~80GB compressed) is relatively negligible.
Of course we will keep the BAM files in "cold storage" as static files. But the system for querying them live has to be far more efficient, and that's what will be variant based.
The initial version will probably just be layers of variants on the reference, but I'm hoping to quickly replace that as I think it leads to unnecessary bias in the way we interpret the data (oh, this is *not normal*).
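To make "layers of variants on the reference" concrete, here is a minimal sketch of how a sample's sequence could be rebuilt from a reference plus a stored variant list. It is only an illustration under stated assumptions: the `Variant` tuple (1-based position, reference allele, alternate allele, loosely VCF-style) and the `apply_variants` function are hypothetical names, not the actual storage format discussed in this thread.

```python
from typing import List, Tuple

# Hypothetical record: (1-based position, reference allele, alternate allele).
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Rebuild a sample sequence by overlaying non-overlapping variants on a reference."""
    pieces = []
    cursor = 0  # 0-based index of the next reference base not yet copied
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        pieces.append(reference[cursor:start])  # unchanged stretch before the variant
        pieces.append(alt)                      # the sample's allele
        cursor = start + len(ref)               # skip the replaced reference bases
    pieces.append(reference[cursor:])           # unchanged tail
    return "".join(pieces)


reference = "ACGTACGTAC"
variants = [(2, "C", "T"), (5, "AC", "A"), (9, "A", "AGG")]  # SNV, deletion, insertion
print(apply_variants(reference, variants))  # ATGTAGTAGGC
```

Note that every stored variant is only meaningful relative to one specific reference, which is exactly why the choice of reference shapes how the data gets interpreted.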
My plan is to implement a full graph approach that has no real reference, where each genome is just a path through the graph of all of our stored genomes.
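As a rough illustration of the "each genome is a path" idea, here is a toy sketch in which nodes hold short sequence segments and a genome is stored as an ordered list of node ids. The `GenomeGraph` class and its methods are hypothetical and ignore the hard parts (building the graph from reads, large rearrangements, indexing at scale); it is not the planned implementation.

```python
from typing import Dict, List, Set, Tuple


class GenomeGraph:
    """Toy sequence graph: nodes hold DNA segments, genomes are paths of node ids."""

    def __init__(self) -> None:
        self.nodes: Dict[int, str] = {}
        self.edges: Set[Tuple[int, int]] = set()
        self.paths: Dict[str, List[int]] = {}

    def add_node(self, node_id: int, segment: str) -> None:
        self.nodes[node_id] = segment

    def add_genome(self, name: str, node_ids: List[int]) -> None:
        missing = [n for n in node_ids if n not in self.nodes]
        if missing:
            raise KeyError(f"unknown node ids: {missing}")
        # Record an edge for each consecutive pair of nodes on the path.
        self.edges.update(zip(node_ids, node_ids[1:]))
        self.paths[name] = list(node_ids)

    def sequence(self, name: str) -> str:
        """Spell out a genome by walking its path."""
        return "".join(self.nodes[n] for n in self.paths[name])

    def private_nodes(self, name: str) -> Set[int]:
        """Nodes visited by this genome and by no other stored genome."""
        others: Set[int] = set()
        for other, path in self.paths.items():
            if other != name:
                others.update(path)
        return set(self.paths[name]) - others


graph = GenomeGraph()
for node_id, segment in [(1, "ACG"), (2, "T"), (3, "C"), (4, "GGA")]:
    graph.add_node(node_id, segment)
graph.add_genome("genome_a", [1, 2, 4])
graph.add_genome("genome_b", [1, 3, 4])

print(graph.sequence("genome_a"))       # ACGTGGA
print(graph.sequence("genome_b"))       # ACGCGGA
print(graph.private_nodes("genome_a"))  # {2}
```

In this layout neither genome is privileged: both are simply paths, and a difference shows up as a node that one path visits and the other does not.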
I like it! Thanks for the explanation.
I think you probably should have done some trial runs and cooked up some demos using your own sequence before you launched this Kickstarter.
Actually, that was part of the issue... we couldn't find a lab that would work with 2 samples! They needed us to do the legal legwork, etc. Plus, sample collection was an issue. We're hoping to get the "group discount" and pay into the bulk purchase ourselves.