Question: GATK combine gvcf superslow
2.3 years ago, Picasa wrote:

Hi,

I am working on a classic variant-calling project in a non-model organism.

I have 250 samples and followed the GATK Best Practices.

I have produced 250 g.vcf files with HaplotypeCaller. The next step is to combine those g.vcf files and produce a single .vcf (with GenotypeGVCFs), using either:

A) GenomicsDBImport

B) CombineGVCFs

But both methods are super slow.

I am wondering whether it is possible to run GenotypeGVCFs on each g.vcf separately to produce one vcf per sample (quite fast), and then combine the 250 vcf files with another program?

Would that produce the same result? Thanks.

written 2.3 years ago by Picasa

CatVariants should help with the combine part of your workflow, I think.

written 2.3 years ago by RamRS

Split by chromosome.

written 2.3 years ago by Pierre Lindenbaum

I am working on a non-model organism with an assembled genome. Unfortunately, the assembly is quite fragmented. Is it still worth doing?

written 2.3 years ago by Picasa

If there are 1000 contigs and you can run 1000 CombineGVCFs jobs in parallel, then it will be 1000 times faster.

modified 2.3 years ago • written 2.3 years ago by Pierre Lindenbaum
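
A minimal sketch of the per-contig split suggested above, in Python. It only builds one `gatk CombineGVCFs` command line per contig (restricted with `-L`) so that each can be submitted as a separate job; it does not run anything. The file names (`reference.fasta`, `sample_001.g.vcf.gz`, the `combined/` output directory) and the helper names `read_contigs` and `combine_commands` are assumptions made for illustration, not something from the thread.

```python
import shlex
from typing import Iterable, List


def read_contigs(fai_lines: Iterable[str]) -> List[str]:
    """Return contig names from the lines of a samtools .fai index.

    The contig name is the first tab-separated column; blank lines are skipped.
    """
    return [line.split("\t")[0] for line in fai_lines if line.strip()]


def combine_commands(contigs, gvcfs, reference, out_dir="combined"):
    """Build one CombineGVCFs command string per contig.

    Each command sees all input g.vcf files but is limited to one contig
    with -L, so the commands are independent and can run in parallel.
    """
    commands = []
    for contig in contigs:
        args = ["gatk", "CombineGVCFs", "-R", reference, "-L", contig]
        for gvcf in gvcfs:
            args += ["-V", gvcf]
        # Hypothetical output layout: one combined g.vcf per contig.
        args += ["-O", f"{out_dir}/{contig}.g.vcf.gz"]
        commands.append(shlex.join(args))  # shlex.join needs Python 3.8+
    return commands


if __name__ == "__main__":
    # Two made-up .fai lines and three made-up sample files.
    fai = ["contig_1\t15000\t10\t60\t61\n", "contig_2\t8200\t15270\t60\t61\n"]
    gvcfs = [f"sample_{i:03d}.g.vcf.gz" for i in range(1, 4)]
    for cmd in combine_commands(read_contigs(fai), gvcfs, "reference.fasta"):
        print(cmd)
```

Each printed line can be handed to whatever scheduler or job array is available. With a very fragmented assembly, the real speed-up depends on how many of these jobs can actually run at the same time.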

Hi Pierre. Do you split by chromosome for CombineGVCFs or GenomicsDBImport?

written 14 months ago by Nicolas Rosewick

Yes.

written 14 months ago by Pierre Lindenbaum

Should also work for GenomicsDBImport though :)

written 14 months ago by Nicolas Rosewick
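
For the GenomicsDBImport route, the same per-contig idea might look like the sketch below: one workspace per contig, followed by GenotypeGVCFs reading that workspace through a `gendb://` path. It only builds the command strings. The directory names, file names and the helper `genomicsdb_commands` are hypothetical, and the sketch assumes contig names are safe to use as directory names.

```python
import shlex


def genomicsdb_commands(contig, gvcfs, reference,
                        workspace_dir="genomicsdb", out_dir="genotyped"):
    """Return (import_cmd, genotype_cmd) for a single contig.

    Assumption: one GenomicsDB workspace per contig, named after the contig.
    """
    workspace = f"{workspace_dir}/{contig}"

    # Step 1: import all g.vcf files for this contig into its own workspace.
    import_args = ["gatk", "GenomicsDBImport",
                   "--genomicsdb-workspace-path", workspace,
                   "-L", contig]
    for gvcf in gvcfs:
        import_args += ["-V", gvcf]

    # Step 2: joint-genotype directly from the workspace.
    genotype_args = ["gatk", "GenotypeGVCFs",
                     "-R", reference,
                     "-V", f"gendb://{workspace}",
                     "-O", f"{out_dir}/{contig}.vcf.gz"]

    return shlex.join(import_args), shlex.join(genotype_args)


if __name__ == "__main__":
    gvcfs = ["sample_001.g.vcf.gz", "sample_002.g.vcf.gz"]  # made-up inputs
    import_cmd, genotype_cmd = genomicsdb_commands("contig_1", gvcfs, "reference.fasta")
    print(import_cmd)
    print(genotype_cmd)
```

The two commands for a contig have to run in order, but different contigs are independent of each other, so the pairs can be spread across jobs in the same way as the CombineGVCFs split.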