What is currently (2021) the best user-friendly (visual and interactive) VCF/BCF mining tool for VCF/BCF files similar in size to, or even larger than, the 1000 Genomes VCF?
I guess most organizations do not have a visual and interactive VCF mining tool, but use either:
- A website front end plus a batch query system back end: submit your query and wait a few minutes to hours to get the results back. Maybe you get no results, too many results, or the wrong results. And then you repeat.
- A (junior) bioinformatician who runs one or a few queries on the command line every time a biologist without Linux/programming experience has a question.
I already asked this question around 5 years ago, and I wonder what the situation is now.
So the scale is: 100M+ variants, 1000+ samples, a compressed BCF of 500 GB+ and an uncompressed VCF of several TB or more.
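To make that scale concrete, here is a back-of-the-envelope calculation in Python. The 2-bit-per-genotype packing (4 states: hom-ref, het, hom-alt, missing, as in PLINK .bed files) and the 5 s / 60 s time budgets are assumptions for illustration, not properties of any particular tool.

```python
# Back-of-the-envelope sizing for the dataset described above.
# Assumption: diploid, biallelic genotypes bit-packed at 2 bits per call.

N_VARIANTS = 100_000_000   # 100M+ variants
N_SAMPLES = 1_000          # 1000+ samples
BCF_BYTES = 500 * 10**9    # 500 GB compressed BCF


def genotype_calls(variants: int, samples: int) -> int:
    """Number of cells in the variant x sample genotype matrix."""
    return variants * samples


def packed_bytes(calls: int, bits_per_call: int = 2) -> int:
    """Bytes needed if every call is bit-packed (rounded up to whole bytes)."""
    return (calls * bits_per_call + 7) // 8


def required_throughput(total_bytes: int, seconds: float) -> float:
    """Bytes per second needed to read total_bytes within the time budget."""
    return total_bytes / seconds


calls = genotype_calls(N_VARIANTS, N_SAMPLES)
print(f"genotype calls: {calls:.1e}")
print(f"2-bit packed matrix: {packed_bytes(calls) / 1e9:.0f} GB")
for budget in (5, 60):
    gbps = required_throughput(BCF_BYTES, budget) / 1e9
    print(f"full BCF scan in {budget:>2} s needs ~{gbps:.1f} GB/s")
```

Even a perfectly packed matrix is about 25 GB, and re-reading the 500 GB BCF within one minute would need roughly 8 GB/s, which suggests an interactive tool cannot simply re-scan the BCF for every query.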
One requirement is that it should support all the kinds of filtering that bcftools view does:
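A minimal Python sketch of the two core operations behind that kind of filtering: an include predicate over site-level fields (roughly what `bcftools view -i 'QUAL>=30 && INFO/DP>=10'` expresses) and a sample subset (as with `-s`) that yields a genotype matrix. The `Variant` class, `view` and `high_quality` are hypothetical names for illustration; a real tool would parse the full bcftools expression language and stream records from the BCF instead of holding them in a list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Variant:
    """Hypothetical in-memory record: a few VCF columns plus per-sample GT strings."""
    chrom: str
    pos: int
    qual: float
    info: Dict[str, float] = field(default_factory=dict)
    gt: Dict[str, str] = field(default_factory=dict)  # sample name -> "0/1", "1|1", ...


Predicate = Callable[[Variant], bool]


def view(variants: List[Variant], include: Predicate, samples: List[str]) -> List[List[str]]:
    """Keep variants where `include` is true, then subset the genotype columns.

    Returns one row per kept variant and one column per requested sample;
    samples without a call are shown as "./.".
    """
    rows = []
    for v in variants:
        if include(v):
            rows.append([v.gt.get(s, "./.") for s in samples])
    return rows


def high_quality(v: Variant) -> bool:
    """Roughly QUAL>=30 && INFO/DP>=10; in this sketch a missing DP counts as failing."""
    return v.qual >= 30 and v.info.get("DP", 0) >= 10
```

The predicate is an ordinary function, so swapping in a different filter does not change `view` itself; only the expression changes.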
But bcftools does not meet the interactive and visual requirements: it is only interactive for small VCF files, or when you use the tabix index to restrict the query to a small region.
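Region queries stay fast because an index lets the tool jump straight to the records in the requested interval instead of scanning the whole file. The Python sketch below illustrates the idea but swaps the real mechanism (tabix/CSI indexes use a binning scheme with offsets into BGZF-compressed blocks) for a plain sorted list plus binary search; the record layout and `RegionIndex` are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical toy record layout: (chrom, pos, variant_id).
Record = Tuple[str, int, str]


class RegionIndex:
    """Sorted positions per chromosome: a simplified stand-in for a tabix/CSI index.

    Everything lives in memory and is located by binary search, unlike a real
    on-disk index over compressed blocks.
    """

    def __init__(self, records: List[Record]) -> None:
        self._by_chrom: Dict[str, List[Record]] = {}
        for rec in sorted(records, key=lambda r: (r[0], r[1])):
            self._by_chrom.setdefault(rec[0], []).append(rec)
        # Parallel lists of plain ints so bisect can search them directly.
        self._positions: Dict[str, List[int]] = {
            chrom: [r[1] for r in recs] for chrom, recs in self._by_chrom.items()
        }

    def query(self, chrom: str, start: int, end: int) -> List[Record]:
        """Records with start <= pos <= end, i.e. an inclusive chrom:start-end region."""
        positions = self._positions.get(chrom, [])
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_right(positions, end)
        return self._by_chrom.get(chrom, [])[lo:hi]


idx = RegionIndex([("chr1", 500, "a"), ("chr1", 1500, "b"), ("chr2", 1200, "c"), ("chr1", 1000, "d")])
print(idx.query("chr1", 1000, 2000))
```

A query costs O(log n) plus the number of hits, which is why a small region is interactive. A filter on a genotype or INFO value across the whole genome has no such positional shortcut and falls back to a full scan.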
Another requirement is that the filtering is visual and interactive, as with a small genotype matrix in Excel (I know, a bad idea, but at least Excel is interactive, visual and biologist-friendly).
By interactive I mean that a filter criterion can be adjusted and you get back the updated genotype matrix in near real time (a few seconds to 1 minute). The tool should stay interactive even for complex queries where all 100M+ variants for all 1000+ samples have to be scanned.
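One way to get that kind of responsiveness is to pay for the full scan once and then answer threshold changes from a precomputed, sorted summary. The Python sketch below assumes alternate-allele frequency is the adjustable criterion; `alt_allele_frequency` and `AfIndex` are hypothetical names, and the approach only helps for criteria known in advance, not for arbitrary ad-hoc expressions.

```python
import bisect
from typing import List


def alt_allele_frequency(gts: List[str]) -> float:
    """Fraction of called alleles that are non-reference; '.' alleles are skipped."""
    alleles = [a for gt in gts for a in gt.replace("|", "/").split("/") if a != "."]
    if not alleles:
        return 0.0
    return sum(a != "0" for a in alleles) / len(alleles)


class AfIndex:
    """Variant indices sorted by AF, computed once after the full genotype scan.

    Moving an AF threshold (e.g. a slider in a GUI) then needs only a binary
    search instead of a rescan of all variants and genotypes.
    """

    def __init__(self, afs: List[float]) -> None:
        self._order = sorted(range(len(afs)), key=afs.__getitem__)
        self._sorted_afs = [afs[i] for i in self._order]

    def count_at_most(self, max_af: float) -> int:
        """Number of variants with AF <= max_af, in O(log n): cheap live feedback."""
        return bisect.bisect_right(self._sorted_afs, max_af)

    def at_most(self, max_af: float) -> List[int]:
        """Indices (back in original order) of variants with AF <= max_af."""
        return sorted(self._order[: self.count_at_most(max_af)])
```

The expensive part (computing AF from every genotype) happens once; each threshold change afterwards touches only the sorted summary.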
Does something like this already exist? If so, which tools?
I am mostly curious about open-source solutions, but also about any commercial ones.
See also this older question and its answers:
I am/was hoping that nowadays something like the following exists:
- A scalable database (cluster) (e.g. MongoDB/Spark) that stores the content of a large VCF/BCF: variants and genotypes
- bcftools-view-like domain code that runs the queries
- Results reported (full/paginated or summarized) in a website or a fat-client GUI.
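The last step, reporting paginated or summarized results, is what keeps a GUI responsive even when a query matches millions of variants: the front end only ever receives one page or an aggregate. A minimal Python sketch, assuming the back end has already returned the matching variant IDs and a hypothetical `{variant_id: {sample: GT}}` lookup; `paginate` and `carriers_per_sample` are illustrative names, and a real system would push both operations into the database rather than run them in Python.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def paginate(hit_ids: Sequence[int], page: int, page_size: int) -> Tuple[List[int], int]:
    """Return one page of hits (0-based page number) and the total number of pages."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    total_pages = (len(hit_ids) + page_size - 1) // page_size
    start = page * page_size
    return list(hit_ids[start:start + page_size]), total_pages


def carriers_per_sample(genotypes: Dict[int, Dict[str, str]], hit_ids: Sequence[int]) -> Counter:
    """Summary instead of the full matrix: per sample, how many hits carry a non-ref allele."""
    counts: Counter = Counter()
    for vid in hit_ids:
        for sample, gt in genotypes.get(vid, {}).items():
            alleles = gt.replace("|", "/").split("/")
            if any(a not in ("0", ".") for a in alleles):
                counts[sample] += 1
    return counts
```

A page of a few hundred rows or a per-sample summary is small enough to render like the Excel-style genotype matrix, regardless of how many variants matched.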