Joint Calling for Large Germline WGS Cohort
9 weeks ago
j.k3096

Hello,

I am working with germline WGS data from a cohort of 2,700 patients. To study the germline variants in this cohort, I need to perform joint variant calling. I’ve started by creating a GenomicsDB (https://gatk.broadinstitute.org/hc/en-us/articles/360036883491-GenomicsDBImport) and plan to use GenotypeGVCFs afterward.

However, I am running into high memory usage during the GenomicsDB creation step. As a workaround, I’ve been adding the samples to the GenomicsDB in smaller batches (300–400 samples at a time).
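A minimal sketch of this batching workaround, assuming the gVCFs are listed in a GATK-style tab-separated sample map (`sample<TAB>path`). The first batch creates the workspace with `--genomicsdb-workspace-path`, and later batches append to it with `--genomicsdb-update-workspace-path`, which reuses the workspace's existing intervals. `--sample-name-map` and `--batch-size` (how many gVCF readers are open at once within a single run) are GenomicsDBImport arguments. The file names, memory setting and batch sizes are hypothetical placeholders to adjust for your cluster.

```python
from pathlib import Path


def read_sample_map(path):
    """Parse a GATK-style sample map: one 'sample<TAB>gvcf_path' per line."""
    pairs = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            name, gvcf = line.split("\t")
            pairs.append((name, gvcf))
    return pairs


def write_sample_map(path, batch):
    """Write one batch of (sample, gvcf_path) pairs back out as a sample map."""
    Path(path).write_text("".join(f"{n}\t{g}\n" for n, g in batch))


def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def import_commands(pairs, workspace, interval, batch_size=400,
                    reader_batch=50, java_mem="-Xmx32g"):
    """Build one GenomicsDBImport command per batch of samples.

    The first batch creates the workspace for `interval`; later batches
    append with --genomicsdb-update-workspace-path, which reuses the
    workspace's intervals, so -L is only passed the first time.
    Returns (sample_map_file, batch, argv) tuples.
    """
    commands = []
    for i, batch in enumerate(chunk(pairs, batch_size)):
        map_file = f"batch_{i:03d}.sample_map"  # hypothetical naming scheme
        argv = ["gatk", "--java-options", java_mem, "GenomicsDBImport"]
        if i == 0:
            argv += ["--genomicsdb-workspace-path", workspace, "-L", interval]
        else:
            argv += ["--genomicsdb-update-workspace-path", workspace]
        # --batch-size limits how many gVCFs are read concurrently in one run.
        argv += ["--sample-name-map", map_file,
                 "--batch-size", str(reader_batch)]
        commands.append((map_file, batch, argv))
    return commands
```

Each returned tuple would be used by writing `batch` to `map_file` with `write_sample_map` and then running `argv` (for example with `subprocess.run(argv, check=True)`). The batches update the same workspace, so they should run one after another rather than in parallel.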

If anyone here has worked with similarly large WGS cohorts or has experience with joint calling at this scale, I would greatly appreciate your recommendations and advice. I expect that subsequent steps such as GenotypeGVCFs will also be memory-intensive, so I am looking for ways to optimize resource usage.

One solution I’m considering is dividing the genome into smaller intervals, but I would be grateful for any alternative approaches or optimizations you might suggest.
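The interval idea can be sketched as a scatter step: split each contig into fixed-size windows, give each window its own GenomicsDB workspace, and run GenotypeGVCFs per window against `gendb://<workspace>`, so each job only holds one region in memory and jobs can run in parallel on a cluster. This is a sketch under assumptions: the window size, contig lengths, reference path, memory setting and workspace naming are hypothetical, and GATK's own `SplitIntervals` tool is an alternative for producing the intervals.

```python
def make_intervals(contig_lengths, window):
    """Split each contig into 1-based, inclusive windows of at most `window` bp."""
    intervals = []
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            end = min(start + window - 1, length)
            intervals.append(f"{contig}:{start}-{end}")
            start = end + 1
    return intervals


def workspace_name(interval):
    """Turn 'chr1:1-1000' into a filesystem-friendly name like 'chr1_1_1000'."""
    return interval.replace(":", "_").replace("-", "_")


def genotype_command(interval, ref="ref.fasta", java_mem="-Xmx16g"):
    """GenotypeGVCFs command reading the per-interval GenomicsDB workspace.

    Assumes the workspace for this interval was created beforehand with
    GenomicsDBImport (e.g. -L <interval>) under the name workspace_name().
    """
    ws = workspace_name(interval)
    return ["gatk", "--java-options", java_mem, "GenotypeGVCFs",
            "-R", ref, "-V", f"gendb://{ws}", "-L", interval,
            "-O", f"{ws}.vcf.gz"]
```

For example, `make_intervals({"chr1": 25, "chr2": 10}, 10)` gives `chr1:1-10`, `chr1:11-20`, `chr1:21-25` and `chr2:1-10`. The per-interval VCFs can afterwards be concatenated in genomic order, for instance with GATK `GatherVcfs` or `bcftools concat`.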

Thank you for your time and help!

Best regards,

Tags: NGS, RAM, cohort, Genomics, WGS

Not to be too pedantic, but joint genotyping solves a different problem (removing artefactual variants) from producing a population VCF that can be analyzed.

If you just want the latter, you can ingest individual VCFs (or gVCFs) into a TileDB-VCF dataset (the GenomicsDB you mention is built on an early version of TileDB).

You can then perform basic chr/pos/sample queries in Python using the TileDB-VCF open source library:
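A rough sketch of what ingestion and a chr/pos/sample query might look like with the `tiledbvcf` Python package, modeled on the examples in the TileDB-VCF README; treat the exact calls and attribute names as assumptions to check against the version you install. The dataset URI, VCF paths, regions and sample names are hypothetical. The package is imported inside the functions so the small region helper can be used on its own.

```python
def to_region(contig, start, end):
    """Format a 1-based, inclusive region string such as 'chr1:100-200'."""
    if start < 1 or end < start:
        raise ValueError(f"invalid region {contig}:{start}-{end}")
    return f"{contig}:{start}-{end}"


def ingest(uri, vcf_paths, create=True):
    """Create a TileDB-VCF dataset (first call only) and ingest VCF/gVCF files."""
    import tiledbvcf  # assumed API: Dataset, create_dataset, ingest_samples

    ds = tiledbvcf.Dataset(uri, mode="w")
    if create:
        ds.create_dataset()  # skip this when adding samples to an existing dataset
    ds.ingest_samples(sample_uris=vcf_paths)


def query_genotypes(uri, regions, samples):
    """Return a pandas DataFrame of genotypes for the given regions and samples."""
    import tiledbvcf

    ds = tiledbvcf.Dataset(uri, mode="r")
    return ds.read(
        attrs=["sample_name", "contig", "pos_start", "alleles", "fmt_GT"],
        regions=regions,
        samples=samples,
    )
```

A query might then look like `query_genotypes("cohort_dataset", [to_region("chr1", 100000, 200000)], ["sample_001"])`, with the dataset built beforehand by calling `ingest` on batches of files.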

https://github.com/TileDB-Inc/TileDB-VCF

... or switch to the commercial product if you need things like distributed queries, user-defined functions & task graphs, and access management:

https://www.tiledb.com/

(disclaimer: I am the product manager for TileDB-VCF)
