Badly formed genome location? GATK GenotypeGVCFs
4.2 years ago
jtwalker ▴ 20

I'm trying to genotype >1000 samples using the GATK pipeline. I've already created gVCFs for my samples, but when I run GATK's GenotypeGVCFs tool, I get the following error:

WARN  21:15:19,844 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
INFO  21:15:19,845 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.6-0-g89b7209):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Badly formed genome location: Contig 024218.1 given as location, but this contig isn't present in the Fasta sequence dictionary
##### ERROR ------------------------------------------------------------------------------------------


I've looked at the sequence dictionary files (.fai and .dict; I'm not sure which one GATK uses), and both contain the contig causing the error. Does anyone know what is going on here? Thanks.

GATK Joint genotyping • 3.0k views

What is the output of

grep -H -F '024218.1' /path/to/ref.dict /path/to/ref.fa.fai

GCF.dict:@SQ    SN:NC_024218.1  LN:77392008 M5:0ad5ac74565c0b48329eec6020994b16 UR:file:/Users/lukehoekstra/GenomeAnalysisTK-3.7/GCF.fna
GCF.fna.fai:NC_024218.1 77392008    125 80  81


and the output of

cut -f 1 your.g.vcf  | grep -F 024218.1 |  uniq


?

cut -f 1 S1097.g.vcf  | grep -F 024218.1 |  uniq
##contig=<ID=NC_024218.1,length=77392008>
NC_024218.1


I have over a thousand g.vcfs. Could it be that one of them contains a malformed contig name causing the problem?


"I have over a thousand g.vcfs. Could it be that one of them contains a malformed contig name causing the problem?"

If they have all been constructed the same way: no.


They were all created with the same method, namely GATK's HaplotypeCaller.

4.1 years ago
jtwalker ▴ 20

Just to update this question in case anyone is having a similar problem: for some reason, when I used BWA mem, the output for a few of my files was malformed, which led to problems down the line. I simply realigned those files and my joint call worked.
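
For anyone who wants to check a large batch of gVCFs before the joint call, here is a minimal sketch (not the commands used in this thread) that extends the single-file `cut | grep` check above to every file. It lists any contig name in the CHROM column that has no exact match in the first column of the reference .fai index. The function name `check_contigs` and the file names in the usage line are hypothetical, and it assumes uncompressed .g.vcf files.

```shell
# Report contig names used in gVCF records that are absent from the reference index.
# check_contigs is a hypothetical helper name, not part of GATK.
check_contigs() {
    fai=$1
    shift
    known=$(mktemp)
    # Column 1 of the .fai index holds the reference contig names.
    cut -f 1 "$fai" | sort -u > "$known"
    for vcf in "$@"; do
        # Drop header lines, keep the CHROM column, then print every name
        # that is not an exact whole-line match for a known contig.
        grep -v '^#' "$vcf" | cut -f 1 | uniq | sort -u |
            grep -v -x -F -f "$known" |
            while IFS= read -r name; do
                printf '%s: %s\n' "$vcf" "$name"
            done
    done
    rm -f "$known"
}
```

Usage would look something like `check_contigs GCF.fna.fai *.g.vcf`. No output means every record's contig name was found in the index; a line such as `S1097.g.vcf: 024218.1` would point at a file worth regenerating. This only inspects the records, so a damaged header would need a separate check.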