Question: GATK: Input files reads and reference have incompatible contigs: No overlapping contigs found
kumbarov (Sweden) wrote, 4.8 years ago:

Hi,

I am trying to generate a VCF file from a FamilyTreeDNA BAM file. I run the following command:

java -Xmx2g -jar GenomeAnalysisTK.jar -l INFO -R resources/Homo_sapiens_assembly19.fasta -T UnifiedGenotyper -I /tmp/0209.sorted.bam -L Y -rf BadCigar -o 0209.vcf --output_mode EMIT_ALL_CONFIDEN

This is the error I get:

ERROR MESSAGE: Input files reads and reference have incompatible contigs: No overlapping contigs found.

I searched the forums and found that this is caused by mismatched contig names in the reference and BAM file headers. These are the contigs in my BAM file header:

SN:chrM
SN:chr1
SN:chr2
SN:chr3
SN:chr4
SN:chr5
SN:chr6
SN:chr7
SN:chr8
SN:chr9
SN:chr10
SN:chr11
SN:chr12
SN:chr13
SN:chr14
SN:chr15
SN:chr16
SN:chr17
SN:chr18
SN:chr19
SN:chr20
SN:chr21
SN:chr22
SN:chrX
SN:chrY
SN:chr1_gl000191_random
SN:chr1_gl000192_random
SN:chr4_ctg9_hap1
SN:chr4_gl000193_random
SN:chr4_gl000194_random
SN:chr6_apd_hap1
SN:chr6_cox_hap2
SN:chr6_dbb_hap3
SN:chr6_mann_hap4
SN:chr6_mcf_hap5
SN:chr6_qbl_hap6
SN:chr6_ssto_hap7
SN:chr7_gl000195_random
SN:chr8_gl000196_random
SN:chr8_gl000197_random
SN:chr9_gl000198_random
SN:chr9_gl000199_random
SN:chr9_gl000200_random
SN:chr9_gl000201_random
SN:chr11_gl000202_random
SN:chr17_ctg5_hap1
SN:chr17_gl000203_random
SN:chr17_gl000204_random
SN:chr17_gl000205_random
SN:chr17_gl000206_random
SN:chr18_gl000207_random
SN:chr19_gl000208_random
SN:chr19_gl000209_random
SN:chr21_gl000210_random
SN:chrUn_gl000211
SN:chrUn_gl000212
SN:chrUn_gl000213
SN:chrUn_gl000214
SN:chrUn_gl000215
SN:chrUn_gl000216
SN:chrUn_gl000217
SN:chrUn_gl000218
SN:chrUn_gl000219
SN:chrUn_gl000220
SN:chrUn_gl000221
SN:chrUn_gl000222
SN:chrUn_gl000223
SN:chrUn_gl000224
SN:chrUn_gl000225
SN:chrUn_gl000226
SN:chrUn_gl000227
SN:chrUn_gl000228
SN:chrUn_gl000229
SN:chrUn_gl000230
SN:chrUn_gl000231
SN:chrUn_gl000232
SN:chrUn_gl000233
SN:chrUn_gl000234
SN:chrUn_gl000235
SN:chrUn_gl000236
SN:chrUn_gl000237
SN:chrUn_gl000238
SN:chrUn_gl000239
SN:chrUn_gl000240
SN:chrUn_gl000241
SN:chrUn_gl000242
SN:chrUn_gl000243
SN:chrUn_gl000244
SN:chrUn_gl000245
SN:chrUn_gl000246
SN:chrUn_gl000247
SN:chrUn_gl000248
SN:chrUn_gl000249

And this is my dict file:

SN:1
SN:2
SN:3
SN:4
SN:5
SN:6
SN:7
SN:8
SN:9
SN:10
SN:11
SN:12
SN:13
SN:14
SN:15
SN:16
SN:17
SN:18
SN:19
SN:20
SN:21
SN:22
SN:X
SN:Y
SN:MT
SN:GL000207.1
SN:GL000226.1
SN:GL000229.1
SN:GL000231.1
SN:GL000210.1
SN:GL000239.1
SN:GL000235.1
SN:GL000201.1
SN:GL000247.1
SN:GL000245.1
SN:GL000197.1
SN:GL000203.1
SN:GL000246.1
SN:GL000249.1
SN:GL000196.1
SN:GL000248.1
SN:GL000244.1
SN:GL000238.1
SN:GL000202.1
SN:GL000234.1
SN:GL000232.1
SN:GL000206.1
SN:GL000240.1
SN:GL000236.1
SN:GL000241.1
SN:GL000243.1
SN:GL000242.1
SN:GL000230.1
SN:GL000237.1
SN:GL000233.1
SN:GL000204.1
SN:GL000198.1
SN:GL000208.1
SN:GL000191.1
SN:GL000227.1
SN:GL000228.1
SN:GL000214.1
SN:GL000221.1
SN:GL000209.1
SN:GL000218.1
SN:GL000220.1
SN:GL000213.1
SN:GL000211.1
SN:GL000199.1
SN:GL000217.1
SN:GL000216.1
SN:GL000215.1
SN:GL000205.1
SN:GL000219.1
SN:GL000224.1
SN:GL000223.1
SN:GL000195.1
SN:GL000212.1
SN:GL000222.1
SN:GL000200.1
SN:GL000193.1
SN:GL000194.1
SN:GL000225.1
SN:GL000192.1
SN:NC_007605
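
The two listings have no contig names in common (`chr1` vs `1`, `chrM` vs `MT`, `chr1_gl000191_random` vs `GL000191.1`), which is exactly what the error reports. A minimal sketch for checking this directly: both a BAM header and a Picard-style `.dict` file are SAM headers whose `@SQ` lines carry the contig name in an `SN:` field, so the names can be extracted the same way and intersected. It assumes `samtools` is available to dump the BAM header; the function names and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: list the contig names that two SAM-style headers have in common.
set -eu

# Print the SN: value of every @SQ line read from stdin, sorted bytewise.
contig_names() {
  awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SN:/) { print substr($i, 4); break }
  }' | LC_ALL=C sort
}

# shared_contigs HEADER_A HEADER_B: print contig names found in both files.
shared_contigs() {
  tmp_a=$(mktemp)
  tmp_b=$(mktemp)
  contig_names < "$1" > "$tmp_a"
  contig_names < "$2" > "$tmp_b"
  LC_ALL=C comm -12 "$tmp_a" "$tmp_b"
  rm -f "$tmp_a" "$tmp_b"
}

# Example usage (hypothetical file names):
#   samtools view -H sample.bam > bam_header.sam
#   shared_contigs bam_header.sam Homo_sapiens_assembly19.dict
# No output at all corresponds to "No overlapping contigs found".
```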

How can I get this to work, or where can I find a suitable hg19 reference? My current reference is from ftp://ftp.ncbi.nlm.nih.gov/sra/reports/Assembly/GRCh37-HG19_Broad_variant/

kumbarov (Sweden) wrote, 4.8 years ago:

Editing the dict and fai files is cumbersome, so I tried the following instead:

java -Xmx2g -jar GenomeAnalysisTK.jar -l INFO -R resources/chrY.fa \
     -T UnifiedGenotyper -I ${TMPDIR}/${NAME}.sorted.bam -L chrY \
     -rf BadCigar -o vcf_out/${NAME}.vcf --output_mode EMIT_ALL_CONFIDENT_SITES \
     -U ALLOW_SEQ_DICT_INCOMPATIBILITY

But I got an empty VCF file. How can I use only a chrY.fa file and still get a VCF file with the reference allele and the genotype allele?

Why edit the dict file? Just make a new one from the right FASTA file.

written 4.8 years ago by Devon Ryan

Where is the correct file? Here I only find single-chromosome files: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/chromosomes/

written 4.8 years ago by kumbarov

Just `cat` the files.

written 4.8 years ago by Pierre Lindenbaum

`cat` worked. The catch was to concatenate the files in the correct order.

written 4.8 years ago by kumbarov
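
Concatenating "in the correct order" matters because GATK also checks that the contigs shared by the reads and the reference appear in the same relative order. A minimal sketch that takes that order from the BAM header itself. It assumes the per-chromosome files from the UCSC `chromosomes/` directory were downloaded into one local directory and are named `<contig>.fa.gz` (check the actual names in the listing), and that `samtools` and Picard are available for the follow-up indexing steps; the function name and paths are hypothetical.

```shell
#!/bin/sh
# Sketch: build one FASTA whose contig order matches a BAM header.
set -eu

# concat_in_header_order HEADER_SAM CHROM_DIR OUT_FA
#   HEADER_SAM: SAM header text, e.g. from `samtools view -H sample.bam`
#   CHROM_DIR:  directory with one gzipped FASTA per contig, named
#               <contig>.fa.gz (assumed UCSC naming; adjust if different)
concat_in_header_order() {
  : > "$3"                                   # start from an empty output file
  awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SN:/) { print substr($i, 4); break }
  }' "$1" |
  while IFS= read -r contig; do
    gzip -dc "$2/$contig.fa.gz" >> "$3"      # append contigs in header order
  done
}

# Example usage (hypothetical paths):
#   samtools view -H sample.bam > bam_header.sam
#   concat_in_header_order bam_header.sam hg19_chromosomes hg19.fa
#   samtools faidx hg19.fa                    # writes hg19.fa.fai
#   java -jar picard.jar CreateSequenceDictionary R=hg19.fa O=hg19.dict
```

Every contig listed in the BAM header, including the `_random`, `_hap` and `chrUn_` ones, needs a matching file; `gzip` reports an error for the first contig that has none.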
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 4.8 years ago:

"How can I get this to work?"

Fool the tools using a symbolic link (usually a bad idea): http://plindenbaum.blogspot.fr/2011/10/reference-genome-with-or-without-chr.html

"Where can I find a suitable hg19 reference?"

http://hgdownload.cse.ucsc.edu/goldenpath/hg19/chromosomes/

I actually have conversion files for cases like this: https://github.com/dpryan79/ChromosomeMappings/blob/master/GRCh37_UCSC2ensembl.txt

written 4.8 years ago by Devon Ryan
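
A minimal sketch of how such a mapping file could be applied to the BAM header instead of editing the reference. It assumes the mapping file has two tab-separated columns (UCSC name, then Ensembl name; verify this against the actual file) and that `samtools` is available. Because reads in a BAM refer to contigs by their position in the header, rewriting only the `@SQ` names and re-attaching the header with `samtools reheader` renames the contigs without touching the reads. The function name and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: rename @SQ contig names in a SAM header using a two-column map.
set -eu

# rename_header_contigs MAP_TSV HEADER_SAM
#   MAP_TSV: "old_name<TAB>new_name" per line (assumed layout; check first)
#   Writes the renamed header to stdout. Names without a mapping, or with
#   an empty new name, are left unchanged; non-@SQ lines pass through.
rename_header_contigs() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { if ($2 != "") map[$1] = $2; next }   # first file: load map
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) {
          name = substr($i, 4)
          if (name in map) $i = "SN:" map[name]
          break
        }
    }
    { print }
  ' "$1" "$2"
}

# Example usage (hypothetical file names):
#   samtools view -H sample.bam > old_header.sam
#   rename_header_contigs GRCh37_UCSC2ensembl.txt old_header.sam > new_header.sam
#   samtools reheader new_header.sam sample.bam > sample.renamed.bam
```

Renaming alone may not be enough for GATK, which also compares contig lengths and order: UCSC's hg19 `chrM` is a different mitochondrial sequence (16,571 bp) from GRCh37's `MT` (16,569 bp), and the BAM lists `chrM` first while the b37 `.dict` lists `MT` after `Y`. Using a reference that matches the alignment avoids both problems.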
Devon Ryan (Freiburg, Germany) wrote, 4.8 years ago:

You appear to have mapped against the UCSC reference, so download the most recent GRCh37 patch set from there.
