PHG v0.0.23 (latest) LoadHaplotypesFromGVCFPlugin fails, but v0.0.22 with the same setup does not
8 months ago
dovi ▴ 60

Hi everyone,

I am running the latest PHG version (0.0.23) so that I can use the fixed "diploidPath". So that my database includes all the updates from this version, I reran every step leading up to imputation with it. However, the LoadHaplotypesFromGVCFPlugin step now fails with a unique constraint error. When I rerun the same code (same config file, input files and database) with phg:0.0.22 instead of phg:latest, it runs without any errors. I wonder whether this is caused by a change in the haplotype-loading code, by something I am doing wrong, or by a new parameter that I am not aware of.

The code that I run is the following:

docker run --name upload_haplotypes --rm -v ${WORKING_DIR}:/phg/ -t maizegenetics/phg:latest /tassel-5-standalone/run_pipeline.pl -Xmx16G -debug -configParameters ${DOCKER_CONFIG_FILE} -LoadHaplotypesFromGVCFPlugin -bedFile /phg/genome_sorted.windows.bed -endPlugin

This code fails with the following error:

[DefaultDispatcher-worker-1] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - 
Staging S1D1 chrom 11 for DB uploading.
[DefaultDispatcher-worker-1] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Time spent creating Sequences for  Chr:14 for Line: S1D1 : 0.015435339sec
[DefaultDispatcher-worker-1] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Time spent creating GVCFSequence for  Chr:14 for Line: S1D1 : 1.08508E-4sec
[DefaultDispatcher-worker-1] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Time spent creating Chr:14 for Line: S1D1 : 11.61738611sec
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to getVariantData : 15.218336869 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putVariantMappingData: total loaded to variant_mapping table: 3147
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putVariantMappingData: time to load variants : 0.490372271 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadVariantMappingHash: before loading hash, size of all variants in variants table=3147
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadVariantsHash query: select variant_id,chrom,position,ref_allele_id, alt_allele_id,anc_id from variants where chrom='10';
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadVariantsHash: size after loading 3147
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putVariantMappingData: time to loadVariantsHash at end: 0.036291337 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to process/load variants data: 0.52678073 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaplotypeData calling putHaploytpesForGamete
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - Begin putHaplotypesForGamete, number anchorSequences to load: 144
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaplotypes: starting to commit haplotypes
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaployptes - total count loaded to haplotypes table: 144
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to load haplotypes : 8.416172673 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaplotypesData: Finished batch, total processed = 144
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putMethod: added method GATK_PIPELINE_PATH to methods table
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all methods in method table=4
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Paths added to db for S1D1, pathid=1
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Done DBProcessing Line: S1D1 Chr: 10
-------------------------------
Current Heap Size: 1,232 MB
Max Available Heap: 14564 MB
-------------------------------
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess - db is setup, init prepared statements, load hash table

 beginning - isSqlite is true
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all geneotypes in genotype table=2
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - refRangeRefRangeIDMap is null, creating new one with size : 2411
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadAnchorHash: at end, size of refRangeRefRangeIDMap: 2411, number of rs.next processed: 2411
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all methods in method table=4
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all groups in gamete_groups table=2
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all gametes in gametes table=2
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading readMappingHash, size of all read_mappings in read_mapping table=0
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - DBProcessing Line: S1D1 Chr: 11
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Figuring out gamete Group Id for S1D1 chr:11
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Writing S1D1 chr:11 to the DB.
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaplotypesData: time to load allel and variants hash: 9.04E-7 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to addToMissingAlleleList: 2.1975E-5 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: calling putAlleleData with size 0
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putAlleleData: total loaded to alleles table: 0
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:loadAlleleHash: added string NONE to alleles table
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to process/load allele data: 0.004023821 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: second pass, getVariantData
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to getVariantData : 5.1769E-5 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putVariantMappingData: total loaded to variant_mapping table: 0
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putVariantMappingData: time to load variants : 1.397E-4 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadVariantMappingHash: before loading hash, size of all variants in variants table=3147
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadVariantsHash query: select variant_id,chrom,position,ref_allele_id, alt_allele_id,anc_id from variants where chrom='11';
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadVariantsHash: size after loading 0
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putVariantMappingData: time to loadVariantsHash at end: 2.9818E-4 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to process/load variants data: 6.21072E-4 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaplotypeData calling putHaploytpesForGamete
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - Begin putHaplotypesForGamete, number anchorSequences to load: 1
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaplotypes: starting to commit haplotypes
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaployptes - total count loaded to haplotypes table: 1
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putHaplotypesData: time to load haplotypes : 0.262941075 seconds
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaplotypesData: Finished batch, total processed = 1
SQLException 1
Code: 19
SqlState: null
Error Message: [SQLITE_CONSTRAINT]  Abort due to constraint violation (UNIQUE constraint failed: paths.genoid, paths.method_id)
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - Found Exception
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - S1D1 0 Chr: 11
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin - java.lang.IllegalStateException: PHGdbAccess:putPathsData: SQLException: failed when adding paths for method: GATK_PIPELINE_PATH, taxon: S1D1
[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - Closing DB
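
For a longer log, a small script can list which taxon/chromosome pairs finished and which one hit the exception. This is only a sketch: the two regular expressions are written against the line formats visible in the log above, the function name `summarize` is my own, and reading the line that follows "Found Exception" as "taxon, index, chromosome" is an assumption on my part.

```python
import re

# Pattern for lines such as "... - Done DBProcessing Line: S1D1 Chr: 10"
DONE = re.compile(r"Done DBProcessing Line: (\S+) Chr: (\S+)")
# Pattern for the line printed after "Found Exception", e.g. "... - S1D1 0 Chr: 11"
# (assumed layout: taxon, a numeric index, then the chromosome)
FAILED = re.compile(r"LoadHaplotypesFromGVCFPlugin - (\S+) \d+ Chr: (\S+)")


def summarize(log_text):
    """Return (done, failed) lists of (taxon, chromosome) tuples."""
    done, failed = [], []
    for line in log_text.splitlines():
        match = DONE.search(line)
        if match:
            done.append(match.groups())
            continue
        match = FAILED.search(line)
        if match:
            failed.append(match.groups())
    return done, failed


# Two lines copied from the log above
sample = (
    "[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading."
    "LoadHaplotypesFromGVCFPlugin - Done DBProcessing Line: S1D1 Chr: 10\n"
    "[DefaultDispatcher-worker-3] INFO net.maizegenetics.pangenome.db_loading."
    "LoadHaplotypesFromGVCFPlugin - S1D1 0 Chr: 11\n"
)
done, failed = summarize(sample)
print("done:", done)
print("failed:", failed)
```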

It seems that the first chromosome '10' is loaded properly, but the next chromosome '11' fails.
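
To illustrate what the error message means, here is a minimal sketch using Python's built-in sqlite3 module. The table is a hypothetical, simplified stand-in for the PHG paths table: only genoid and method_id come from the error message, while the path_id column, the example IDs and the helper add_path are assumptions of mine. It shows that a second insert for the same (genoid, method_id) pair is rejected, which matches the pattern in the log where a path for S1D1 was added after chromosome 10 and the failure came when adding paths again after chromosome 11.

```python
import sqlite3

# Hypothetical, simplified stand-in for the PHG "paths" table. Only the two
# columns named in the error message are taken from the source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE paths ("
    " path_id INTEGER PRIMARY KEY,"
    " genoid INTEGER NOT NULL,"
    " method_id INTEGER NOT NULL,"
    " UNIQUE (genoid, method_id))"
)


def add_path(genoid, method_id):
    """Insert one path row; return True on success, False if SQLite rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO paths (genoid, method_id) VALUES (?, ?)",
                (genoid, method_id),
            )
        return True
    except sqlite3.IntegrityError as err:
        print(f"rejected: {err}")
        return False


# Illustrative IDs only: the same taxon and method inserted twice.
first = add_path(genoid=1, method_id=4)
second = add_path(genoid=1, method_id=4)
print(first, second)
```

If this is what happens in 0.0.23, the path row would need to be written once per taxon and method rather than once per chromosome, but that is my guess and not something I can confirm from the log alone.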

If I run the following (v0.0.22), I get no errors:

docker run --name upload_haplotypes --rm -v ${WORKING_DIR}:/phg/ -t maizegenetics/phg:0.0.22 /tassel-5-standalone/run_pipeline.pl -Xmx16G -debug -configParameters ${DOCKER_CONFIG_FILE} -LoadHaplotypesFromGVCFPlugin -bedFile /phg/genome_sorted.windows.bed -endPlugin

My second question, in case this is a bug in the new version: does the fixed code in ImputePipelinePlugin depend on the updated code in LoadHaplotypesFromGVCFPlugin? A temporary workaround could be to run everything with the latest version except LoadHaplotypesFromGVCFPlugin, for which I would use 0.0.22. However, I do not know whether there is any substantial change in LoadHaplotypesFromGVCFPlugin that might affect the ImputePipelinePlugin result.

Thank you.

8 months ago
lcj34 ▴ 80

Thanks for your post. We've identified an error in newly introduced code that creates paths for newly created haplotypes. I am working on a fix. We'll post when a new version is available.

8 months ago
lcj34 ▴ 80

There is a new PHG build out today that fixes the issue with loading the haplotypes. See Docker Hub for maizegenetics/phg:0.0.24.

8 months ago
pjb39 ▴ 60

Looks like the first answer missed the second question at the bottom. The new code and bug affect loading haplotypes to the PHG db but do not affect the imputation pipeline. 0.0.23 does contain important improvements to the DiploidPathPlugin. So, you can use 0.0.23 for imputation but not for loading haplotypes (LoadHaplotypesFromGVCFPlugin, for example).
