Question: GATK combine gvcf superslow
2.6 years ago, Picasa wrote:

Hi,

I am doing standard variant calling on a non-model organism.

I have 250 samples and followed the GATK Best Practices.

I have produced 250 g.vcf files with HaplotypeCaller. The next step is to combine those g.vcf files and produce a single .vcf (with GenotypeGVCFs), using either:

A) GenomicsDBImport

B) CombineGVCFs

But both methods are super slow.

I am wondering whether it is possible to produce one .vcf per g.vcf with GenotypeGVCFs (quite fast) and then combine the 250 .vcf files with another program?

Would that produce the same result? Thanks.


CatVariants should help with the combine part of your workflow, I think.

written 2.6 years ago by RamRS

Split by chromosome.

written 2.6 years ago by Pierre Lindenbaum

I am working on a non-model organism with a genome that has been assembled. Unfortunately, the assembly is quite fragmented. Is it still worth doing?

written 2.6 years ago by Picasa

If there are 1000 contigs and you can run 1000 CombineGVCFs jobs in parallel, then it will be up to 1000 times faster.
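A minimal sketch of the per-contig idea, assuming the reference has a samtools-style `.fai` index (contig name in the first tab-separated column). The file names, the `combined/` output directory, and both helper functions are hypothetical. The `-R`, `-V`, `-L` and `-O` options are the GATK4 CombineGVCFs options as I understand them, so check them against `gatk CombineGVCFs --help` for your version. The script only builds the command lines; how they are submitted (cluster scheduler, GNU parallel, a workflow manager) is left open.

```python
import shlex


def contigs_from_fai(fai_text):
    """Return contig names from the text of a .fai index (first tab-separated column)."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def combine_gvcfs_commands(reference, gvcfs, contigs, out_dir="combined"):
    """Build one CombineGVCFs command line per contig.

    Each command restricts the job to a single contig with -L, so the
    jobs are independent and can run at the same time.
    """
    commands = []
    for contig in contigs:
        cmd = ["gatk", "CombineGVCFs", "-R", reference]
        for gvcf in gvcfs:
            cmd += ["-V", gvcf]
        cmd += ["-L", contig, "-O", f"{out_dir}/{contig}.g.vcf.gz"]
        commands.append(shlex.join(cmd))
    return commands


if __name__ == "__main__":
    # Hypothetical two-contig index and three samples, for illustration only.
    fai = "contig_1\t52000\t10\t60\t61\ncontig_2\t8000\t52900\t60\t61\n"
    gvcfs = [f"sample_{i:03d}.g.vcf.gz" for i in range(1, 4)]
    for command in combine_gvcfs_commands("ref.fasta", gvcfs, contigs_from_fai(fai)):
        print(command)
```

With a very fragmented assembly, the real speed-up depends on how many jobs can actually run at once, and each job still has to open all 250 g.vcf files.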

modified 2.6 years ago • written 2.6 years ago by Pierre Lindenbaum

Hi Pierre. Do you split by chromosome for CombineGVCFs or GenomicsDBImport?

written 19 months ago by Nicolas Rosewick

Yes.

written 19 months ago by Pierre Lindenbaum

It should also work for GenomicsDBImport, though :)
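A rough sketch of the same per-contig split with GenomicsDBImport, followed by joint genotyping of each workspace. It assumes a tab-separated sample map file (sample name, then path to its g.vcf) passed with `--sample-name-map`. The paths, directory names and the `genomicsdb_commands` helper are hypothetical. The options shown are the GATK4 ones as I understand them, so verify them with `gatk GenomicsDBImport --help` and `gatk GenotypeGVCFs --help`.

```python
import shlex


def genomicsdb_commands(reference, sample_map, contig,
                        workspace_dir="genomicsdb", out_dir="genotyped"):
    """Return (import_command, genotype_command) for a single contig.

    The import step writes one GenomicsDB workspace per contig; the
    genotyping step reads that workspace through a gendb:// path.
    """
    workspace = f"{workspace_dir}/{contig}"
    import_cmd = [
        "gatk", "GenomicsDBImport",
        "--sample-name-map", sample_map,
        "--genomicsdb-workspace-path", workspace,
        "-L", contig,
    ]
    genotype_cmd = [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", f"gendb://{workspace}",
        "-O", f"{out_dir}/{contig}.vcf.gz",
    ]
    return shlex.join(import_cmd), shlex.join(genotype_cmd)


if __name__ == "__main__":
    # Hypothetical contig names, for illustration only.
    for contig in ["contig_1", "contig_2"]:
        import_command, genotype_command = genomicsdb_commands(
            "ref.fasta", "samples.map", contig
        )
        print(import_command)
        print(genotype_command)
```

Either way, the per-contig VCFs still have to be concatenated into one file afterward, which is the role of the CatVariants suggestion above (GatherVcfs is, as far as I know, the GATK4 counterpart).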

written 19 months ago by Nicolas Rosewick