GATK GenomicsDBImport intervals for WGS
5.1 years ago

We have a bunch of WGS samples and would like to import them with GenomicsDBImport before joint genotyping. For this project we are only interested in coding sequences. Which is better:

  1. Use -L with the GENCODE coding-sequence annotation and set --merge-input-intervals to true.

  2. Split the analysis and run one GenomicsDBImport instance per chromosome (e.g. -L chr1). My idea is to use a job array on our local Slurm cluster (one job per chromosome). But what about merging? Should all jobs then use the same --genomicsdb-workspace-path?

GATK version: 4.1

Thanks
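
To make option 1 concrete: as we read the GATK documentation, --merge-input-intervals tells GenomicsDBImport to import all data lying between the supplied intervals, so a long list of GENCODE CDS intervals effectively becomes one span per contig. The Python sketch below only illustrates that reading; the function name and the toy coordinates are hypothetical and are not taken from GATK or GENCODE.

```python
# Hypothetical illustration of our reading of --merge-input-intervals:
# each contig's intervals collapse into one span from the smallest start
# to the largest end, so the data in between is imported as well.
def spanning_intervals(intervals):
    """Collapse (contig, start, end) tuples into one span per contig,
    keeping contigs in the order they first appear."""
    spans = {}
    for contig, start, end in intervals:
        if contig in spans:
            lo, hi = spans[contig]
            spans[contig] = (min(lo, start), max(hi, end))
        else:
            spans[contig] = (start, end)
    return [(contig, lo, hi) for contig, (lo, hi) in spans.items()]


# Toy CDS-like intervals (made-up coordinates, 1-based inclusive).
toy_cds = [
    ("chr1", 100, 200),
    ("chr1", 500, 900),
    ("chr2", 50, 80),
    ("chr1", 300, 400),
]
print(spanning_intervals(toy_cds))
# [('chr1', 100, 900), ('chr2', 50, 80)]
```

Under this reading, the speed-up comes from importing a few large spans instead of thousands of small ones, at the cost of also importing the data between exons.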

I asked a similar question on the GATK help/discussion forum. From the answers I got, it looks like discontinuous intervals are not recommended. In fact, they suggested that the smallest interval should be one whole chromosome. This avoids problems at the edges of intervals, because GATK performs local assembly around each variant site. For merging, I would combine the results at the level of the final joint-called VCFs.
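
The per-chromosome route ends with one joint-called VCF per chromosome that then has to be combined. GATK's GatherVcfs expects its inputs in genomic order, i.e. the contig order of the reference, not alphabetical order. A minimal Python sketch that builds that command, assuming a hypothetical `joint.<contig>.vcf.gz` naming scheme and a hypothetical helper name:

```python
import shlex


# Hypothetical helper: build a GatherVcfs command from per-chromosome VCFs.
# The contig list must already be in reference (.dict/.fai) order; a plain
# sorted() would put "chr10" before "chr2", which GatherVcfs would reject.
def gather_command(contigs_in_reference_order, prefix="joint",
                   output="joint.all.vcf.gz"):
    cmd = ["gatk", "GatherVcfs"]
    for contig in contigs_in_reference_order:
        cmd += ["-I", f"{prefix}.{contig}.vcf.gz"]
    cmd += ["-O", output]
    return cmd


contigs = ["chr1", "chr2", "chr10", "chrX"]  # toy subset, reference order
print(shlex.join(gather_command(contigs)))
```

Taking the contig order from the reference itself, rather than typing it by hand, avoids the lexical-sort trap noted in the comment.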

Hello, can you give more details about intervals for WGS? Do I need to run the GenomicsDBImport command separately for each chromosome? If so, do I need to use a different workspace (--genomicsdb-workspace-path) for each?

Running these steps per chromosome is mostly done because there are not enough computational resources to process the entire genome in one go. If you do run them separately, I think you need to run them as separate commands, each with a different workspace path.
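
A minimal sketch of that setup as a Slurm job array, written in Python and assuming GRCh38-style `chr` contig names, a hypothetical sample map file `cohort.sample_map` and a hypothetical `genomicsdb_<contig>` workspace naming scheme. Each array task imports one chromosome into its own workspace, so no two jobs write to the same --genomicsdb-workspace-path:

```python
import os
import shlex

# Assumed contig names; they must match the reference (chr1 vs 1).
CONTIGS = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def import_command(task_id, sample_map="cohort.sample_map",
                   workspace_prefix="genomicsdb"):
    """Build the GenomicsDBImport command for one 0-based array task."""
    contig = CONTIGS[task_id]
    return [
        "gatk", "GenomicsDBImport",
        "--sample-name-map", sample_map,
        # One workspace per chromosome: jobs never share a workspace path.
        "--genomicsdb-workspace-path", f"{workspace_prefix}_{contig}",
        "-L", contig,
    ]


if __name__ == "__main__":
    # Slurm sets SLURM_ARRAY_TASK_ID in each task of e.g. `sbatch --array=0-23`.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    print(shlex.join(import_command(task_id)))
```

Each resulting workspace can then be joint-called on its own, e.g. GenotypeGVCFs with `-V gendb://genomicsdb_chr1`, giving one VCF per chromosome to gather afterwards.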

Will this also be the case for exome data? Ideally I'd like to run all chromosomes at once too.

A second question: what is the syntax for the X and Y chromosomes? Is it chrX, chrY or X, Y?

If you do have to run them all separately, can the results be gathered and easily analyzed together after joint calling with GenotypeGVCFs?

It depends on the reference genome version you used: the names you pass with -L must match the chromosome (contig) names in that reference.
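
One way to check which naming your reference uses is to read the contig names from its samtools FASTA index (`.fai`), whose first tab-separated column is the sequence name. A minimal Python sketch; the helper names are hypothetical, and the in-memory index uses toy values in place of a real `reference.fasta.fai`:

```python
import io


def contig_names_from_fai(handle):
    """Return sequence names (first tab-separated column) from a .fai index."""
    return [line.split("\t", 1)[0] for line in handle if line.strip()]


def sex_chromosome_names(names):
    """Return the X/Y contig names exactly as this reference spells them."""
    return [n for n in names if n in ("chrX", "chrY", "X", "Y")]


# Toy index content; in practice use open("reference.fasta.fai").
toy_fai = io.StringIO(
    "chr1\t1000\t6\t60\t61\n"
    "chrX\t800\t1030\t60\t61\n"
    "chrY\t500\t1850\t60\t61\n"
)
names = contig_names_from_fai(toy_fai)
print(sex_chromosome_names(names))
# ['chrX', 'chrY']
```

The same names, in the same order, are what -L and any later gathering step should use.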

Did you ever get a final answer to this?
