Question: Badly formed genome location? GATK GenotypeGVCFs
jtwalker20 wrote:

I'm trying to genotype more than 1,000 samples using the GATK pipeline. I've already created gVCFs for my samples, but when I run GATK's GenotypeGVCFs tool, I get the following error:

BiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
WARN  21:15:19,844 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
INFO  21:15:19,845 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.6-0-g89b7209):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Badly formed genome location: Contig 024218.1 given as location, but this contig isn't present in the Fasta sequence dictionary
##### ERROR ------------------------------------------------------------------------------------------

I've looked at the sequence dictionary files (.fai and .dict, not sure which is used) and they both contain the contig causing the error. Does anyone know what is going on here? Thanks

written 19 months ago by jtwalker20 (modified 18 months ago)

What is the output of

grep -H -F '024218.1' /path/to/ref.dict /path/to/ref.fa.fai
written 19 months ago by Pierre Lindenbaum
GCF.dict:@SQ    SN:NC_024218.1  LN:77392008 M5:0ad5ac74565c0b48329eec6020994b16 UR:file:/Users/lukehoekstra/GenomeAnalysisTK-3.7/GCF.fna
GCF.fna.fai:NC_024218.1 77392008    125 80  81
written 19 months ago by jtwalker20
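
On the ".fai and .dict, not sure which is used" point: the error text names the "Fasta sequence dictionary", and GATK 3 expects both files to sit next to the reference, so it may be worth confirming that the two list exactly the same contigs rather than checking one name at a time. A minimal sketch, assuming a tab-separated .dict with @SQ lines carrying SN: fields and a samtools-style .fai with the contig name in column 1; the function name compare_dict_fai is made up for illustration.

```shell
# Hypothetical helper: report contig names present in only one of the
# two reference index files. Prints nothing when they agree.
# Usage: compare_dict_fai ref.dict ref.fa.fai
compare_dict_fai() {
    awk -F'\t' '
        # First file (.dict): collect the SN: value of every @SQ line.
        FILENAME == ARGV[1] {
            if ($1 == "@SQ") {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^SN:/) { dict[substr($i, 4)] = 1 }
                }
            }
            next
        }
        # Second file (.fai): the contig name is column 1.
        { fai[$1] = 1 }
        END {
            for (n in dict) { if (!(n in fai)) { print "only in .dict: " n } }
            for (n in fai) { if (!(n in dict)) { print "only in .fai: " n } }
        }
    ' "$1" "$2" | sort
}
```

With the file names from this thread, that would be run as compare_dict_fai GCF.dict GCF.fna.fai.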

and the output of

cut -f 1 your.g.vcf  | grep -F 024218.1 |  uniq

?

written 19 months ago by Pierre Lindenbaum
cut -f 1 S1097.g.vcf  | grep -F 024218.1 |  uniq
##contig=<ID=NC_024218.1,length=77392008>
NC_024218.1

I have over a thousand g.vcfs. Could it be that one of them contains a malformed contig name causing the problem?

written 19 months ago by jtwalker20 (modified 19 months ago)

"I have over a thousand g.vcfs. Could it be that one of them contains a malformed contig name causing the problem?"

If they have all been constructed the same way: no.

written 19 months ago by Pierre Lindenbaum

They were all created with the same method, namely GATK's HaplotypeCaller.

written 19 months ago by jtwalker20
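
Even when every g.vcf comes from the same HaplotypeCaller workflow, the single-file cut/grep check used earlier in this thread can be repeated over the whole batch to rule out an odd file. A rough sketch, assuming uncompressed g.vcf files and a samtools-style .fai; check_gvcf_contigs is a hypothetical helper name, not a GATK tool.

```shell
# Hypothetical helper: for each g.vcf, print "<file><TAB><contig>" for
# every contig name (from ##contig header lines or column 1 of records)
# that is not listed in the reference .fai.
# Usage: check_gvcf_contigs ref.fa.fai sample1.g.vcf sample2.g.vcf ...
check_gvcf_contigs() {
    fai=$1
    shift
    for vcf in "$@"; do
        awk -F'\t' '
            function check(n) {
                if (!(n in known) && !(n in seen)) {
                    seen[n] = 1
                    print FILENAME "\t" n
                }
            }
            # First file: the .fai, contig name in column 1.
            FILENAME == ARGV[1] { known[$1] = 1; next }
            # Header contig declarations, e.g. ##contig=<ID=NAME,length=N>
            /^##contig=<ID=/ {
                name = $0
                sub(/^##contig=<ID=/, "", name)
                sub(/[,>].*$/, "", name)
                check(name)
                next
            }
            # Skip other header lines and blank lines.
            /^#/ || NF == 0 { next }
            # Data records: CHROM is column 1.
            { check($1) }
        ' "$fai" "$vcf"
    done
}
```

For this thread that would look like check_gvcf_contigs GCF.fna.fai *.g.vcf; no output means every contig name in every file is known to the reference index.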
jtwalker20 wrote:

Just to update this question in case anyone has a similar problem: for some reason, when I ran BWA-MEM, the output for a few of my files was malformed, which led to problems down the line. I simply realigned those files and my joint call worked.

written 18 months ago by jtwalker20
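
The update does not say how the BWA output was malformed. One possibility (an assumption here, not something the poster confirmed) is that a few alignment files stored as BAM were truncated. A complete BAM normally ends with a fixed 28-byte BGZF end-of-file block, so a quick screen for truncation could look like the sketch below. The function names are made up for illustration, and samtools quickcheck performs a similar check if samtools is installed.

```shell
# The 28-byte BGZF end-of-file block from the SAM/BAM specification,
# written as hex and split into 4-byte groups for readability.
BGZF_EOF="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"

# Hypothetical helper: succeed if the file ends with the BGZF EOF block.
has_bgzf_eof() {
    tail_hex=$(tail -c 28 "$1" | od -An -v -t x1 | tr -d ' \t\n')
    [ "$tail_hex" = "$BGZF_EOF" ]
}

# Hypothetical helper: print every given file that lacks the EOF block.
# Usage: list_missing_bgzf_eof *.bam
list_missing_bgzf_eof() {
    for f in "$@"; do
        has_bgzf_eof "$f" || printf '%s\n' "$f"
    done
}
```

This only detects files that were cut short; a file that is complete but contains bad records would pass, so it is a first screen rather than a full validation.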