Question: GATK GenomicsDBImport intervals for WGS
Written 12 months ago by Nicolas Rosewick (Belgium, Brussels):

We have a bunch of WGS samples and would like to import them with GenomicsDBImport before joint genotyping. For this project we are only interested in coding sequences. Is it better:

  1. To use -L with the GENCODE coding-sequence annotation and set --merge-input-intervals to true

  2. To split the analysis and run one instance of GenomicsDBImport per chromosome (e.g. -L chr1). My idea would be to use a job array on my local Slurm cluster (one job per chromosome). But what about the merging? Should I then use the same --genomicsdb-workspace-path for all jobs?

GATK version: 4.1


Tags: genomicsdbimport, gatk

I asked a similar question on the GATK help/discussion forum. From the answers I gathered, discontinuous intervals are not recommended. In fact, they suggested that the smallest interval should be one whole chromosome. This avoids problems at the edges of intervals, because GATK performs local assembly around each variant site. For merging, I would merge the results at the final joint-called VCF level.
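
As a rough illustration of merging at the joint-called VCF level, here is a minimal Python sketch that only builds the command lines. It assumes one GenomicsDB workspace per chromosome under a hypothetical `genomicsdb/` directory, hg38-style contig names, and placeholder file names (`ref.fasta`, `joint.<chrom>.vcf.gz`). It is not taken from the GATK documentation, so check the arguments against your GATK version before running anything.

```python
# Assumptions (all hypothetical): hg38-style contig names, one workspace per
# chromosome under genomicsdb/, and a reference called ref.fasta.
CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def genotype_command(chrom, reference="ref.fasta"):
    """Build the GenotypeGVCFs argv that joint-calls one chromosome
    from that chromosome's own GenomicsDB workspace."""
    return [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", f"gendb://genomicsdb/{chrom}",
        "-L", chrom,
        "-O", f"joint.{chrom}.vcf.gz",
    ]


def gather_command(chroms, output="joint.all.vcf.gz"):
    """Build the GatherVcfs argv that concatenates the per-chromosome VCFs.
    Inputs are listed in the order given, which should follow the contig
    order of the reference."""
    cmd = ["gatk", "GatherVcfs"]
    for chrom in chroms:
        cmd += ["-I", f"joint.{chrom}.vcf.gz"]
    return cmd + ["-O", output]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed
    # or pasted into a job script.
    for chrom in CHROMOSOMES:
        print(" ".join(genotype_command(chrom)))
    print(" ".join(gather_command(CHROMOSOMES)))
```

The result is a single VCF covering all chromosomes, so the samples can still be studied together even though the import and genotyping were split.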

written 12 months ago by Vitis

Hello, can you give more details about WGS intervals? Do I need to run the GenomicsDBImport command separately for each chromosome? If so, do I need to use a different workspace (--genomicsdb-workspace-path) for each run?

written 10 months ago by MatthewP

These steps are run per chromosome largely because there are not enough computational resources to run the entire genome in one shot. If you do run them separately, I think you need to run separate commands and use a different workspace path for each.
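
To make the "separate commands, separate workspaces" idea concrete, here is a small Python sketch that builds one GenomicsDBImport command per chromosome and picks the chromosome from a Slurm array index. The sample map name (`cohort.sample_map`), the `genomicsdb/` root and the hg38-style contig names are assumptions for illustration, and the script only prints the command rather than running it.

```python
import os

# Assumption: hg38-style contig names. Use "1".."22", "X", "Y" instead if
# that is how the contigs are named in your reference and GVCFs.
CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def import_command(chrom, sample_map="cohort.sample_map", workspace_root="genomicsdb"):
    """Build the GenomicsDBImport argv for a single chromosome.
    Each chromosome gets its own workspace directory, so parallel jobs
    never write to the same path."""
    return [
        "gatk", "GenomicsDBImport",
        "--sample-name-map", sample_map,
        "--genomicsdb-workspace-path", f"{workspace_root}/{chrom}",
        "-L", chrom,
    ]


def chromosome_for_task(task_id):
    """Map a 1-based Slurm array index (e.g. sbatch --array=1-24) to a chromosome."""
    return CHROMOSOMES[task_id - 1]


if __name__ == "__main__":
    # Slurm sets SLURM_ARRAY_TASK_ID inside each array task; default to 1
    # so the script can also be tried outside the cluster.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "1"))
    print(" ".join(import_command(chromosome_for_task(task_id))))
```

As far as I know, GenomicsDBImport expects the workspace directory itself to be new or empty, so only the parent directory (`genomicsdb/` here) should be created in advance.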

written 10 months ago by Vitis

Will this also be the case for exome data? Ideally I'd like to run all chr's at once too.

A second question, what would the syntax be for the X and Y chr's - Is it chrX, chrY or X, Y?

written 4 weeks ago by Maverick

If you do have to do them all separately, can they all be gathered up and easily studied together when joint-called using GenotypeGVCFs?

written 4 weeks ago by Maverick