Question

Practical Haplotype Graph

8 weeks ago

yifangt86 ▴ 60

Hello,

I got an error when creating the consensus for a Practical Haplotype Graph (PHG) with Docker. Judging by its size (~694 MB), the database seems to have been created successfully in my local directory by the following script:

$ docker run --name cbsu_phg_container --rm \
 -v ${project_dir}/outputDir/:/tempFileDir/outputDir/  \
 -v ${project_dir}/reference/:/tempFileDir/data/reference/  \
 -v ${project_dir}/inputDir/:/tempFileDir/data/   \
 -v ${project_dir}/inputDir/:/tempFileDir/answer/   \
 -t maizegenetics/phg:0.0.18   \
/LoadGenomeIntervals.sh ${config_file} ${reference} ${reference_ranges} ${genome_data} ${create_db}

To generate the consensus, I ran the following command:

$ docker run --name cbsu_phg_container_consensus --rm    \
-v ${project_dir}/inputDir/reference/:/tempFileDir/data/reference/  \
-v ${project_dir}/outputDir/${DB}:/tempFileDir/outputDir/${DB}  \
-v ${project_dir}/inputDir/${config_file}:/tempFileDir/data/config.txt  \
-t maizegenetics/phg:0.0.18    \
/CreateConsensi.sh /tempFileDir/data/${config_file} ${reference} ${add_haplotypes_method} ${consensus_method}

I then got the following error message: ERROR .... Problem getting id for method: GATK_PIPELINE

Here is the full log from stdout:

/CreateConsensi.sh: line 44: [: !=: unary operator expected
/CreateConsensi.sh: line 53: [: !=: unary operator expected
/tassel-5-standalone/lib/biojava-phylo-4.0.0.jar:/tassel-5-standalone/lib/javax.json-1.0.4.jar:/tassel-5-standalone/lib/scala-library-2.10.1.jar:/tassel-5-standalone/lib/log4j-1.2.13.jar:/tassel-5-standalone/lib/biojava-core-4.0.0.jar:/tassel-5-standalone/lib/jhdf5-14.12.5.jar:/tassel-5-standalone/lib/slf4j-simple-1.7.10.jar:/tassel-5-standalone/lib/sqlite-jdbc-3.8.5-pre1.jar:/tassel-5-standalone/lib/guava-22.0.jar:/tassel-5-standalone/lib/kotlinx-coroutines-core-1.3.0.jar:/tassel-5-standalone/lib/ejml-0.23.jar:/tassel-5-standalone/lib/trove-3.0.3.jar:/tassel-5-standalone/lib/ahocorasick-0.2.4.jar:/tassel-5-standalone/lib/forester-1.038.jar:/tassel-5-standalone/lib/jfreesvg-3.2.jar:/tassel-5-standalone/lib/commons-codec-1.10.jar:/tassel-5-standalone/lib/htsjdk-2.19.0.jar:/tassel-5-standalone/lib/phg.jar:/tassel-5-standalone/lib/postgresql-9.4-1201.jdbc41.jar:/tassel-5-standalone/lib/gs-core-1.3.jar:/tassel-5-standalone/lib/colt-1.2.0.jar:/tassel-5-standalone/lib/jcommon-1.0.23.jar:/tassel-5-standalone/lib/slf4j-api-1.7.10.jar:/tassel-5-standalone/lib/jfreechart-1.0.19.jar:/tassel-5-standalone/lib/commons-math3-3.4.1.jar:/tassel-5-standalone/lib/itextpdf-5.1.0.jar:/tassel-5-standalone/lib/gs-ui-1.3.jar:/tassel-5-standalone/lib/mail-1.4.jar:/tassel-5-standalone/lib/fastutil-8.2.2.jar:/tassel-5-standalone/lib/snappy-java-1.1.1.6.jar:/tassel-5-standalone/lib/junit-4.10.jar:/tassel-5-standalone/lib/biojava-alignment-4.0.0.jar:/tassel-5-standalone/lib/kotlin-stdlib-1.3.50.jar:/tassel-5-standalone/lib/json-simple-1.1.1.jar:/tassel-5-standalone/sTASSEL.jar
Memory Settings: -Xms512m -Xmx16G
Tassel Pipeline Arguments: -debug -HaplotypeGraphBuilderPlugin -configFile /tempFileDir/data/config.txt -methods GATK_PIPELINE -includeVariantContexts true -endPlugin -RunHapConsensusPipelinePlugin -ref /tempFileDir/data/reference/Pisum_sativum_v1a.fa -dbConfigFile /tempFileDir/data/config.txt -collapseMethod CONSENSUS_mxDiv00025 -collapseMethodDetails "CONSENSUS_mxDiv00025 for creating Consensus" -minFreq 0.5 -endPlugin
[main] INFO net.maizegenetics.tassel.TasselLogging - Tassel Version: 5.2.61  Date: May 7, 2020
[main] INFO net.maizegenetics.tassel.TasselLogging - Max Available Memory Reported by JVM: 14564 MB
[main] INFO net.maizegenetics.tassel.TasselLogging - Java Version: 1.8.0_212
[main] INFO net.maizegenetics.tassel.TasselLogging - OS: Linux
[main] INFO net.maizegenetics.tassel.TasselLogging - Number of Processors: 2
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -HaplotypeGraphBuilderPlugin, -configFile, /tempFileDir/data/config.txt, -methods, GATK_PIPELINE, -includeVariantContexts, true, -endPlugin, -RunHapConsensusPipelinePlugin, -ref, /tempFileDir/data/reference/Pisum_sativum_v1a.fa, -dbConfigFile, /tempFileDir/data/config.txt, -collapseMethod, CONSENSUS_mxDiv00025, -collapseMethodDetails, CONSENSUS_mxDiv00025 for creating Consensus, -minFreq, 0.5, -endPlugin, -runfork1]
net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin
   net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Feb 27, 2024 18:04:31
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - 
HaplotypeGraphBuilderPlugin Parameters
configFile: /tempFileDir/data/config.txt
methods: GATK_PIPELINE
includeSequences: true
includeVariantContexts: true
haplotypeIds: null
chromosomes: null

[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /tempFileDir/outputDir/PeaTrial_PHG_23-02-2024_509taxa.db host: localHost user: sqlite type: sqlite
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/tempFileDir/outputDir/PeaTrial_PHG_23-02-2024_509taxa.db
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:  

[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: query statement: select reference_ranges.ref_range_id, chrom, range_start, range_end, methods.name from reference_ranges  INNER JOIN ref_range_ref_range_method on ref_range_ref_range_method.ref_range_id=reference_ranges.ref_range_id  INNER JOIN methods on ref_range_ref_range_method.method_id = methods.method_id  AND methods.method_type = 7 ORDER BY reference_ranges.ref_range_id
methods size: 1
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: number of reference ranges: 2007
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: time: 0.09040627 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: query statement: SELECT gamete_haplotypes.gamete_grp_id, genotypes.line_name FROM gamete_haplotypes INNER JOIN gametes ON gamete_haplotypes.gameteid = gametes.gameteid INNER JOIN genotypes on gametes.genoid = genotypes.genoid ORDER BY gamete_haplotypes.gamete_grp_id;
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: number of taxa lists: 1
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: time: 0.00672731 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.VariantUtils - variantIdsToVariantMap: query statement: SELECT variant_id, chrom, position, ref_allele_id, alt_allele_id FROM variants;
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: haplotype method: GATK_PIPELINE range group method: null
[pool-1-thread-1] DEBUG net.maizegenetics.pangenome.api.CreateGraphUtils - CreateGraphUtils: methodId: no method name GATK_PIPELINE
java.lang.IllegalArgumentException: CreateGraphUtils: methodId: no method name GATK_PIPELINE
    at net.maizegenetics.pangenome.api.CreateGraphUtils.methodId(CreateGraphUtils.java:1103)
    at net.maizegenetics.pangenome.api.CreateGraphUtils.createHaplotypeNodes(CreateGraphUtils.java:397)
    at net.maizegenetics.pangenome.api.CreateGraphUtils.createHaplotypeNodes(CreateGraphUtils.java:871)
    at net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin.processData(HaplotypeGraphBuilderPlugin.java:62)
    at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:118)
    at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1970)
    at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[pool-1-thread-1] DEBUG net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin - CreateGraphUtils: methodId: Problem getting id for method: GATK_PIPELINE
CreateGraphUtils: methodId: no method name GATK_PIPELINE
java.lang.IllegalArgumentException: CreateGraphUtils: methodId: Problem getting id for method: GATK_PIPELINE
CreateGraphUtils: methodId: no method name GATK_PIPELINE
    at net.maizegenetics.pangenome.api.CreateGraphUtils.methodId(CreateGraphUtils.java:1112)
    at net.maizegenetics.pangenome.api.CreateGraphUtils.createHaplotypeNodes(CreateGraphUtils.java:397)
    at net.maizegenetics.pangenome.api.CreateGraphUtils.createHaplotypeNodes(CreateGraphUtils.java:871)
    at net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin.processData(HaplotypeGraphBuilderPlugin.java:62)
    at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:118)
    at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1970)
    at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[pool-1-thread-1] DEBUG net.maizegenetics.plugindef.AbstractPlugin - processData: Problem creating graph: CreateGraphUtils: methodId: Problem getting id for method: GATK_PIPELINE
CreateGraphUtils: methodId: no method name GATK_PIPELINE
java.lang.IllegalStateException: processData: Problem creating graph: CreateGraphUtils: methodId: Problem getting id for method: GATK_PIPELINE
CreateGraphUtils: methodId: no method name GATK_PIPELINE
    at net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin.processData(HaplotypeGraphBuilderPlugin.java:67)
    at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:118)
    at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1970)
    at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - 
Usage:
HaplotypeGraphBuilderPlugin <options>
-configFile <Config File> : Database configuration file (required)
-methods <Methods> : Pairs of methods (haplotype method name and range group method name). Method pair separated by a comma, and pairs separated by colon. The range group is optional 
Usage: <haplotype method name1>,<range group name1>:<haplotype method name2>,<range group name2>:<haplotype method name3> (required)
-includeSequences <true | false> : Whether to include sequences in haplotype nodes. (Default: true)
-includeVariantContexts <true | false> : Whether to include variant contexts in haplotype nodes. (Default: false)
-haplotypeIds <Haplotype Ids> : List of haplotype ids to include in the graph. If not specified, all ids are included.
-chromosomes <Chromosomes> : List of chromosomes to include in graph.  Default is to include all chromosomes.  (i.e. -chromosomes 1,3)

[pool-1-thread-1] ERROR net.maizegenetics.plugindef.AbstractPlugin - processData: Problem creating graph: CreateGraphUtils: methodId: Problem getting id for method: GATK_PIPELINE
CreateGraphUtils: methodId: no method name GATK_PIPELINE
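
The two `[: !=: unary operator expected` lines at the top of this log are what bash prints when a test such as `[ $var != "" ]` runs with an unquoted variable that is empty, so one plausible (unconfirmed) cause is that an argument passed to CreateConsensi.sh was empty. Below is a minimal pre-flight check, assuming the variable names from the command above; `require_vars` is a hypothetical helper, not part of the PHG scripts.

```shell
# Hypothetical pre-flight check: fail early if any named shell variable is
# empty or unset. An empty, unquoted variable inside [ ... != ... ] is what
# makes bash print "unary operator expected".
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named $name.
    if [ -z "${!name:-}" ]; then
      echo "error: \$$name is empty or unset" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running `require_vars project_dir DB config_file reference add_haplotypes_method consensus_method || exit 1` just before the `docker run` line would name any variable that was never set.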

I searched the wiki and other forums, and it seems nobody has reported this error before. Some setting or option may be wrong somewhere, but I am not sure where. The method section of my config file (partly guessed) is:

....... (SQLite part omitted)
#Consensus method
consensusMethodName=CONSENSUS
inputConsensusMethods=GATK_PIPELINE

numThreads=2
...... (other part skipped)
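
To compare the method names the config file asks for with the names stored in the database, a small helper that reads a single `key=value` entry can be handy. This is a sketch that assumes the plain `key=value` layout shown above with `#` comment lines; `config_value` is a hypothetical name. Note that the log shows `-methods GATK_PIPELINE` and `-collapseMethod CONSENSUS_mxDiv00025`, so the names actually used may also come from the arguments passed to CreateConsensi.sh, not only from these keys.

```shell
# Hypothetical helper: print the value of KEY from a key=value config file.
# Lines starting with '#' are skipped; if KEY appears more than once,
# the last occurrence wins. Prints nothing if KEY is absent.
config_value() {
  local key=$1 file=$2
  awk -v k="$key" '
    /^[ \t]*#/ { next }                 # skip comment lines
    index($0, k "=") == 1 {             # line starts with "KEY="
      val = substr($0, length(k) + 2)   # everything after the "="
    }
    END { if (val != "") print val }
  ' "$file"
}
```

For example, `config_value inputConsensusMethods "${project_dir}/inputDir/${config_file}"` would print `GATK_PIPELINE` for the snippet above.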

Appreciate any input. Thanks in advance.

Answer 1 · 2024-02-28

8 weeks ago

lcj34 ▴ 420

The error you are seeing indicates that you have not loaded haplotypes with a method named GATK_PIPELINE. Did you run any steps other than what you show above? I cannot see your config files, so I do not know which methods you have used.
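
One way to confirm this is to list the method names actually stored in the SQLite database. A minimal sketch, assuming the `sqlite3` command-line tool is installed on the host; the `methods` table and its `method_id`, `name`, and `method_type` columns are taken from the queries in the log above, and both helper names are hypothetical.

```shell
# Hypothetical helpers for inspecting a PHG SQLite database with the sqlite3 CLI.

# Print every method as: method_id <TAB> method_type <TAB> name
list_phg_methods() {
  local db=$1
  sqlite3 -separator $'\t' "$db" \
    'SELECT method_id, method_type, name FROM methods ORDER BY method_id;'
}

# Succeed only if a method with exactly this name exists.
# (Assumes the name contains no single quotes.)
has_phg_method() {
  local db=$1 name=$2 count
  count=$(sqlite3 "$db" "SELECT COUNT(*) FROM methods WHERE name = '$name';")
  [ "${count:-0}" -gt 0 ]
}
```

With the mounts from the first `docker run` command, the database sits on the host at `${project_dir}/outputDir/PeaTrial_PHG_23-02-2024_509taxa.db`, so `has_phg_method` on that file with `GATK_PIPELINE` should fail until haplotypes have been loaded under that method name.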

The LoadGenomeIntervals.sh script loads the reference ranges, but does not load assembly data. Loading assemblies requires running the genome alignment step (aligning genomes to the reference) and storing the resulting haplotypes. You may do this via the AssemblyMAFFromAnchorWavePlugin. Note that the Docker script LoadAssemblyAnchors.sh is outdated and no longer works.

Alternatively, you can load haplotypes directly from gVCFs created by other means.

If you align assemblies with AnchorWave, you need to convert the AnchorWave-created MAF files into gVCF files via the MAFToGVCFPlugin, then load the resulting gVCF files by calling LoadHaplotypesFromGVCFPlugin.

Please see this link for additional information on loading haplotypes into your database: https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/CreatePHG_step2AssemblyAndWGSHaplotypes.md

Thanks! I am following the protocol by Pradeep Ruperao. The error came from the second step I posted (i.e., CreateConsensi.sh with Docker). Also, the example config files in the protocol are scattered across different links to the Bitbucket wiki.

As you pointed out, I must have missed the step that loads the haplotypes into the database. I have not run any steps other than the two I posted, which is the part I was trying to sort out.

Some background on my analysis that I should have posted first: I have ~500 different taxa (12 samples used for testing so far). Their FASTQ files have been mapped to the reference FASTA, and the corresponding gVCFs have been created. The reference ranges that anchor the intervals come from a BED file of genes of interest, which I derived from the GFF with custom scripts.
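
For reference, a GFF-to-BED conversion mainly has to account for the different coordinate conventions: GFF3 is 1-based with inclusive ends, while BED is 0-based with exclusive ends, so the start shifts down by one and the end stays the same. A minimal sketch, assuming a tab-separated GFF3 file whose gene features carry an `ID=` attribute; it is not the custom script mentioned above, `gff_genes_to_bed` is a hypothetical name, and the exact BED columns PHG expects should be checked against its documentation.

```shell
# Hypothetical sketch: extract gene features from GFF3 as a 4-column BED
# (chrom, start, end, gene ID). GFF3 is 1-based inclusive; BED is 0-based
# half-open, so start becomes start - 1 while end is unchanged.
gff_genes_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { next }                          # skip header/comment lines
    $3 == "gene" {
      id = "."                             # fallback if no ID= attribute
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++)
        if (attrs[i] ~ /^ID=/) id = substr(attrs[i], 4)
      print $1, $4 - 1, $5, id
    }' "$1"
}
```

A typical call would pipe the result through `sort -k1,1 -k2,2n` so the intervals are in chromosome and position order.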

Since I already have gVCF files for the different taxa, how should I proceed without assemblies? In other words, what is the best practice for creating a PHG in this scenario (reference + interval.bed + gVCFs)? Sorry for the novice question. Thanks a lot!

8 weeks ago by yifangt86 ▴ 60

If you have gVCF files, you can load them via the LoadHaplotypesFromGVCFPlugin. I would recommend using the latest version, which stores variants in external gVCF files. It processes data much faster than the original code and fixes some other issues. To use the gVCF files, you will need to create tabix indices for them. Please refer to the documentation here:

https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/Home_variantsInGVCFFiles

In particular, the instructions for loading the gVCFs are here:

https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/CreatePHG_step2_LoadHaplotypesFromGVCFPluginDetails.md
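
The indexing step can be scripted with htslib's `bgzip` and `tabix`, which must be on the PATH. A minimal sketch, assuming uncompressed files named `*.g.vcf` in a single directory; that naming and both helper names are hypothetical, and the linked documentation remains the authority on what LoadHaplotypesFromGVCFPlugin expects.

```shell
# Print each *.g.vcf.gz in directory $1 that has no .tbi index next to it.
list_unindexed_gvcfs() {
  local dir=$1 f
  for f in "$dir"/*.g.vcf.gz; do
    [ -e "$f" ] || continue               # glob matched nothing
    [ -e "$f.tbi" ] || printf '%s\n' "$f"
  done
}

# Compress any plain *.g.vcf with bgzip (replacing the original file),
# then build a tabix index for every compressed gVCF still lacking one.
index_gvcfs() {
  local dir=$1 f
  for f in "$dir"/*.g.vcf; do
    [ -e "$f" ] || continue
    bgzip "$f"                            # writes $f.gz and removes $f
  done
  list_unindexed_gvcfs "$dir" | while IFS= read -r f; do
    tabix -p vcf "$f"                     # writes $f.tbi
  done
}
```

After running `index_gvcfs` on the gVCF directory, a second call to `list_unindexed_gvcfs` on the same directory should print nothing.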

8 weeks ago by lcj34 ▴ 420