Badly formed genome location? GATK GenotypeGVCFs
6.6 years ago
jtwalker ▴ 20

I'm trying to genotype >1000 samples using the GATK pipeline. I've already created gVCFs for my samples, but when I run GATK's GenotypeGVCFs tool, I get the following error:

WARN  21:15:19,844 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
INFO  21:15:19,845 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.6-0-g89b7209):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Badly formed genome location: Contig 024218.1 given as location, but this contig isn't present in the Fasta sequence dictionary
##### ERROR ------------------------------------------------------------------------------------------

I've looked at the sequence dictionary files (.fai and .dict, not sure which is used) and they both contain the contig causing the error. Does anyone know what is going on here? Thanks
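Since it is unclear here which of the two files is consulted, one way to rule out a disagreement between them is to compare their contig names and lengths directly. The following is a minimal sketch, not part of GATK: the function name `compare_dict_fai` is hypothetical, and it assumes a tab-separated `.dict` with `@SQ` lines carrying `SN:` and `LN:` tags and a samtools-style `.fai` with the contig name in column 1 and its length in column 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report contigs that differ between a .dict and a .fai.
# Assumes tab-separated input in both files.
compare_dict_fai() {
    awk -F '\t' '
        FNR == 1 { file++ }                     # count which input file we are in
        file == 1 && /^@SQ/ {                   # sequence dictionary entry
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                else if ($i ~ /^LN:/) len = substr($i, 4)
            }
            dict[name] = len
            next
        }
        file == 2 {                             # .fai entry: name, length, ...
            seen[$1]
            if (!($1 in dict)) print "only in fai\t" $1
            else if (dict[$1] != $2) print "length mismatch\t" $1
        }
        END {
            for (name in dict) if (!(name in seen)) print "only in dict\t" name
        }
    ' "$1" "$2"
}
```

Calling `compare_dict_fai GCF.dict GCF.fna.fai` prints nothing when the two files agree on every contig.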

GATK Joint genotyping • 4.4k views

What is the output of:

grep -H -F '024218.1' /path/to/ref.dict /path/to/ref.fa.fai
GCF.dict:@SQ    SN:NC_024218.1  LN:77392008 M5:0ad5ac74565c0b48329eec6020994b16 UR:file:/Users/lukehoekstra/GenomeAnalysisTK-3.7/GCF.fna
GCF.fna.fai:NC_024218.1 77392008    125 80  81

And the output of:

cut -f 1 your.g.vcf | grep -F 024218.1 | uniq

cut -f 1 S1097.g.vcf  | grep -F 024218.1 |  uniq
##contig=<ID=NC_024218.1,length=77392008>
NC_024218.1

I have over a thousand g.vcfs. Could it be that one of them contains a malformed contig name causing the problem?
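With this many files, checking them one at a time by hand is impractical, so the per-file check can be looped. The sketch below is only an illustration: the function name `check_gvcf_contigs` is hypothetical, and it assumes uncompressed `.g.vcf` files and a samtools-style `.fai` index whose first column holds the contig name. It prints the file name and contig for every record whose CHROM value is not in the index.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list contig names used in gVCF records that are
# missing from the reference .fai index.
# Usage: check_gvcf_contigs ref.fa.fai sample1.g.vcf sample2.g.vcf ...
check_gvcf_contigs() {
    local fai="$1"
    shift
    local gvcf
    for gvcf in "$@"; do
        # Drop header lines, keep the CHROM column, de-duplicate, then
        # print any name that is not listed in the .fai.
        grep -v '^#' "$gvcf" | cut -f 1 | sort -u |
            awk -v fai="$fai" -v f="$gvcf" '
                BEGIN {
                    while ((getline line < fai) > 0) {
                        split(line, cols, "\t")
                        known[cols[1]] = 1
                    }
                }
                NF && !($1 in known) { print f "\t" $1 }
            '
    done
}
```

For example, `check_gvcf_contigs GCF.fna.fai *.g.vcf` prints nothing if every record in every file uses a contig name that the index knows about.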


"I have over a thousand g.vcfs. Could it be that one of them contains a malformed contig name causing the problem?"

If they have all been constructed the same way: no.


They were all created with the same method, namely GATK's HaplotypeCaller.

6.5 years ago
jtwalker ▴ 20

Just to update this question in case anyone has a similar problem: for some reason, when I used BWA mem, the output for a few of my files was malformed, which led to problems down the line. I simply realigned those files and my joint call worked.
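For anyone who wants to screen alignment files before genotyping, here is one cheap check, offered as a sketch under assumptions rather than as a description of what went wrong above. It assumes the alignments are stored as BAM and that the damage is truncation: a complete BAM ends with a fixed 28-byte BGZF end-of-file block, which a truncated file lacks. The function name `has_bgzf_eof` is hypothetical. It will not catch other kinds of malformed output, and `samtools quickcheck` performs a similar but more thorough test if samtools is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file ends with the 28-byte
# BGZF end-of-file marker that terminates a complete BAM file.
has_bgzf_eof() {
    local tail_hex
    # Hex dump of the last 28 bytes, with all whitespace removed.
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "1f8b08040000000000ff0600424302001b0003000000000000000000" ]
}
```

A loop such as `for f in *.bam; do has_bgzf_eof "$f" || echo "possibly truncated: $f"; done` would then flag candidates for realignment.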
