Question: Correct reference for Platinum Genomes?
0
gravatar for Tails
11 months ago by
Tails60
New Zealand
Tails60 wrote:

I'm trying to familiarize myself with haplotype caller by calling variants using the data supplied by platinum genomes.

Here's the paper: A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree

The BAM files are located here

The paper clearly says that they used hg19 as the reference, so I downloaded the FASTA from this link: "FASTA from this link: 'ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz'

But there are all sorts of compatibility issues between that FASTA and the BAM files, like this one:

ERROR MESSAGE: Input files reads and reference have incompatible contigs. Please see https://software.broadinstitute.org/gatk/documentation/article?id=63for more information. Error details: The contig order in reads and reference is not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files..
ERROR reads contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY]
ERROR reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21, chr6_ssto_hap7, chr6_mcf_hap5, chr6_cox_hap2, chr6_mann_hap4, chr6_apd_hap1, chr6_qbl_hap6, chr6_dbb_hap3, chr17_ctg5_hap1, chr4_ctg9_hap1, chr1_gl000192_random, chrUn_gl000225, chr4_gl000194_random, chr4_gl000193_random, chr9_gl000200_random, chrUn_gl000222, chrUn_gl000212, chr7_gl000195_random, chrUn_gl000223, chrUn_gl000224, chrUn_gl000219, chr17_gl000205_random, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chr9_gl000199_random, chrUn_gl000211, chrUn_gl000213, chrUn_gl000220, chrUn_gl000218, chr19_gl000209_random, chrUn_gl000221, chrUn_gl000214, chrUn_gl000228, chrUn_gl000227, chr1_gl000191_random, chr19_gl000208_random, chr9_gl000198_random, chr17_gl000204_random, chrUn_gl000233, chrUn_gl000237, chrUn_gl000230, chrUn_gl000242, chrUn_gl000243, chrUn_gl000241, chrUn_gl000236, chrUn_gl000240, chr17_gl000206_random, chrUn_gl000232, chrUn_gl000234, chr11_gl000202_random, chrUn_gl000238, chrUn_gl000244, chrUn_gl000248, chr8_gl000196_random, chrUn_gl000249, chrUn_gl000246, chr17_gl000203_random, chr8_gl000197_random, chrUn_gl000245, chrUn_gl000247, chr9_gl000201_random, chrUn_gl000235, chrUn_gl000239, chr21_gl000210_random, chrUn_gl000231, chrUn_gl000229, chrM, chrUn_gl000226, chr18_gl000207_random]

My conclusion is that I must've found the incorrect reference file. But have no idea what the correct one might be. Any help appreciated, this is only my second time running haplotype caller and I'm new to these file types and resources.

ADD COMMENTlink modified 11 months ago by JC12k • written 11 months ago by Tails60
1
gravatar for JC
11 months ago by
JC12k
Mexico
JC12k wrote:

The reference data from NCBI contains haplotype chromosomes and alternative regions. You can use the sequences from GATK bundle

ADD COMMENTlink written 11 months ago by JC12k

Thank you, do you mean from this link? ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta.gz Sorry for bothering you again, but I'm still getting this error message. Not sure if I should just start up another question thread?

ERROR MESSAGE: Input files reads and reference have incompatible contigs. Please see https://software.broadinstitute.org/gatk/documentation/article?id=63for more information. Error details: The contig order in reads and reference is not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files..
ERROR reads contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21, chr6_ssto_hap7, chr6_mcf_hap5, chr6_cox_hap2, chr6_mann_hap4, chr6_apd_hap1, chr6_qbl_hap6, chr6_dbb_hap3, chr17_ctg5_hap1, chr4_ctg9_hap1, chr1_gl000192_random, chrUn_gl000225, chr4_gl000194_random, chr4_gl000193_random, chr9_gl000200_random, chrUn_gl000222, chrUn_gl000212, chr7_gl000195_random, chrUn_gl000223, chrUn_gl000224, chrUn_gl000219, chr17_gl000205_random, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chr9_gl000199_random, chrUn_gl000211, chrUn_gl000213, chrUn_gl000220, chrUn_gl000218, chr19_gl000209_random, chrUn_gl000221, chrUn_gl000214, chrUn_gl000228, chrUn_gl000227, chr1_gl000191_random, chr19_gl000208_random, chr9_gl000198_random, chr17_gl000204_random, chrUn_gl000233, chrUn_gl000237, chrUn_gl000230, chrUn_gl000242, chrUn_gl000243, chrUn_gl000241, chrUn_gl000236, chrUn_gl000240, chr17_gl000206_random, chrUn_gl000232, chrUn_gl000234, chr11_gl000202_random, chrUn_gl000238, chrUn_gl000244, chrUn_gl000248, chr8_gl000196_random, chrUn_gl000249, chrUn_gl000246, chr17_gl000203_random, chr8_gl000197_random, chrUn_gl000245, chrUn_gl000247, chr9_gl000201_random, chrUn_gl000235, chrUn_gl000239, chr21_gl000210_random, chrUn_gl000231, chrUn_gl000229, chrM, chrUn_gl000226, chr18_gl000207_random]
ERROR reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]
ADD REPLYlink written 11 months ago by Tails60

There is a link that tells you how to fix this error in the error message you posted above.

Error details: The contig order in reads and reference is not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files..

ADD REPLYlink modified 11 months ago • written 11 months ago by GenoMax95k

Yes I checked that out and used picard's reordersam command. But unfortunately, I get another error, which made me suspect that I'm not using the correct fasta file. One comment I found said that someone else experiencing this problem should reconsider their fasta file. Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 3295191, Read name HSQ1004:134:C0D8DACXX:1:2107:17951:117669, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned

ADD REPLYlink written 11 months ago by Tails60

When we (RTG) processed that data several years ago to produce our own segregation phased variant call set, I recall there were some difficulties processing the BAM files. (We were converting the BAMs back to reads so we could process through our whole mapping/variant calling pipeline, and had to put some workarounds in place). I don't remember the specifics though, but it's not just you :-)

ADD REPLYlink written 11 months ago by Len Trigg1.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1494 users visited in the last hour
_