GATK: combining gVCFs is super slow
6.1 years ago
Picasa ▴ 640

Hi,

I am running a standard variant-calling workflow on a non-model organism.

I have 250 samples and followed the GATK Best Practices.

I have produced 250 g.vcf files with HaplotypeCaller. The next step is to combine those g.vcf files and produce a single .vcf (with GenotypeGVCFs), using either:

A) Solution A: using GenomicsDBImport

B) Solution B: using CombineGVCFs

But both methods are super slow.

I am wondering whether it is possible to produce one VCF per g.vcf with GenotypeGVCFs (which is quite fast) and then combine the 250 VCFs with another program?

Would that produce the same result? Thanks.

gatk CombineGVCFs GenomicsDBImport • 5.0k views

CatVariants should help with the combine part of your workflow, I think.


Split by chromosome.


I am working on a non-model organism whose genome has been assembled. Unfortunately, the assembly is quite fragmented. Is it still worth doing?


If there are 1000 contigs and you can run 1000 CombineGVCFs jobs in parallel, then it will be up to 1000 times faster.
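
For a fragmented assembly, one job per contig may be more jobs than a scheduler handles comfortably, so a possible variation is to group contigs into batches of similar total length and run one CombineGVCFs job per batch. The sketch below is only an illustration of that idea: the helper names (`batch_contigs`, `combine_gvcfs_command`), the file names, and the contig lengths are hypothetical, and the contig lengths are assumed to come from something like the reference `.fai` index. It only builds command strings, using the GATK4 `-R`, `--variant`, `-L` and `-O` arguments; it does not run GATK.

```python
import heapq
import shlex


def batch_contigs(contig_lengths, n_batches):
    """Greedily assign contigs (longest first) to the currently lightest batch.

    contig_lengths: dict mapping contig name -> length in bp
    (hypothetical input, e.g. parsed from the reference .fai index).
    Returns a list of non-empty batches, each a list of contig names.
    """
    heap = [(0, i) for i in range(n_batches)]  # (total length so far, batch index)
    heapq.heapify(heap)
    batches = [[] for _ in range(n_batches)]
    ordered = sorted(contig_lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, length in ordered:
        total, idx = heapq.heappop(heap)
        batches[idx].append(name)
        heapq.heappush(heap, (total + length, idx))
    return [b for b in batches if b]


def combine_gvcfs_command(reference, gvcfs, contigs, output):
    """Build one CombineGVCFs command line restricted to the given contigs."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    for contig in contigs:
        cmd += ["-L", contig]  # one -L per contig in this batch
    cmd += ["-O", output]
    return shlex.join(cmd)


if __name__ == "__main__":
    # Made-up contig lengths and file names, for illustration only.
    lengths = {"contig1": 500, "contig2": 300, "contig3": 200, "contig4": 100}
    gvcfs = ["s1.g.vcf.gz", "s2.g.vcf.gz"]
    for i, contigs in enumerate(batch_contigs(lengths, 2)):
        print(combine_gvcfs_command("ref.fasta", gvcfs, contigs, f"batch{i}.g.vcf.gz"))
```

Each printed command can then be submitted as an independent job. The actual speed-up depends on how many jobs can really run at once and on how evenly the batches are balanced.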


Hi Pierre. Do you split by chromosome for CombineGVCFs or for GenomicsDBImport?


Yes.


It should also work for GenomicsDBImport though :)
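
As a rough sketch of what the same split could look like with GenomicsDBImport: each group of contigs gets its own workspace, which GenotypeGVCFs then reads through a `gendb://` path. The helper names, file names, and workspace names below are hypothetical. The sample map is assumed to be the tab-separated "sample name, gVCF path" file that `--sample-name-map` expects. As above, the code only assembles command strings and does not run GATK.

```python
import shlex


def sample_map_text(samples):
    """Return sample-map contents: one 'name<TAB>path' line per sample."""
    return "".join(f"{name}\t{path}\n" for name, path in samples)


def genomicsdb_commands(reference, sample_map, contigs, workspace, output):
    """Build the import and genotyping commands for one group of contigs."""
    import_cmd = [
        "gatk", "GenomicsDBImport",
        "--sample-name-map", sample_map,
        "--genomicsdb-workspace-path", workspace,
    ]
    for contig in contigs:
        import_cmd += ["-L", contig]  # restrict this workspace to these contigs
    genotype_cmd = [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", f"gendb://{workspace}",  # read from the workspace created above
        "-O", output,
    ]
    return shlex.join(import_cmd), shlex.join(genotype_cmd)


if __name__ == "__main__":
    # Made-up sample and file names, for illustration only.
    samples = [("s1", "s1.g.vcf.gz"), ("s2", "s2.g.vcf.gz")]
    with open("samples.map", "w") as handle:
        handle.write(sample_map_text(samples))
    for cmd in genomicsdb_commands(
        "ref.fasta", "samples.map", ["contig1", "contig4"], "ws_batch0", "batch0.vcf.gz"
    ):
        print(cmd)
```

Running one import/genotype pair per group yields one VCF per group, and these per-group VCFs still have to be concatenated at the end.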
