What type of database does gnomAD use?
14 months ago

Hello!

I'm fairly new to bioinformatics, and for my master's project I need to know which type of database gnomAD actually uses. I have done a lot of research, but I still don't fully understand it. I am aware that the gnomAD data is accessible via Google Cloud and can be explored using either Hail or BigQuery.

Note: To give some context, I have my own data and would like to build my own exome aggregation database. The idea is to replicate gnomAD, taking advantage of its open-source code, and eventually load my data into it.

But I'm attempting to understand some details first.

14 months ago

If you primarily want a variant warehouse that supports genomic-region and sample queries at biobank scale, look into TileDB-VCF. It offers Hail integrations for GWAS and other analyses, though you would still need to run your own Spark cluster to support Hail. Perhaps more important for your application, TileDB is a genuinely open-source database that can work in a variety of cloud and cluster installations.

14 months ago

I suppose there are several. Queries via Hail are very likely run against a MatrixTable, but the web application likely uses a different architecture (perhaps some graph database?). I would contact the team at the Broad Institute for a definitive answer and some suggestions.


I have my own data, so I would like to try to create my own type of exome aggregation database, and the idea was to try to replicate gnomAD and take advantage

IMHO, the gnomAD browser could be more complicated than you need if you just want to expose your VCF data.
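
If the main goal is per-variant aggregate statistics, the core step can be sketched in plain Python. This is a minimal sketch under stated assumptions, not how gnomAD itself is implemented: it assumes an uncompressed, tab-delimited VCF with biallelic sites and a GT field, and the names `aggregate_vcf_lines`, `VariantSummary`, and `EXAMPLE_VCF` are hypothetical, made up for illustration. It counts the alternate allele count (AC), the number of called alleles (AN), and the allele frequency (AF) per site.

```python
from collections import namedtuple

# Hypothetical record type for one aggregated site (not a gnomAD schema).
VariantSummary = namedtuple("VariantSummary", "chrom pos ref alt ac an af")


def aggregate_vcf_lines(lines):
    """Yield AC/AN/AF per site from VCF text lines.

    Assumes biallelic sites: only allele "1" is counted as alternate.
    Missing calls (".") are excluded from AN.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip meta lines and the column header
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        ac = an = 0
        for sample in fields[9:]:
            gt = sample.split(":")[gt_index]
            # Treat phased ("|") and unphased ("/") genotypes the same way.
            for allele in gt.replace("|", "/").split("/"):
                if allele == ".":
                    continue
                an += 1
                if allele == "1":
                    ac += 1
        af = ac / an if an else None
        yield VariantSummary(chrom, int(pos), ref, alt, ac, an, af)


# Tiny made-up example with three samples, for illustration only.
EXAMPLE_VCF = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3
1\t1000\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\t./.
1\t2000\t.\tC\tT\t.\tPASS\t.\tGT:DP\t0/0:12\t0|1:8\t0/0:15
"""

summaries = list(aggregate_vcf_lines(EXAMPLE_VCF.splitlines()))
for s in summaries:
    print(s.chrom, s.pos, s.ref, s.alt, s.ac, s.an, s.af)
```

The resulting rows could be loaded into any ordinary table keyed by chromosome and position. For real data, a dedicated VCF parser and proper handling of multiallelic sites would be needed.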
