CreatePHG_step2_consensu
7 weeks ago
bp • 0

I'm trying to generate consensus haplotypes for all reference ranges from the raw haplotypes produced by CreatePHG_step2_addHapsFromGVCFPipelinePlugin. I'm building the PHG from WGS g.vcf files, so, as the wiki page suggests, I did not provide a ranking file (I don't have one). I then get the following error:

/tassel-5-standalone/lib/ahocorasick-0.2.4.jar:/tassel-5-standalone/lib/biojava-alignment-4.0.0.jar:/tassel-5-standalone/lib/biojava-core-4.0.0.jar:/tassel-5-standalone/lib/biojava-phylo-4.0.0.jar:/tassel-5-stan$
Memory Settings: -Xms512m -Xmx10G
Tassel Pipeline Arguments: -configParameters /phg/config_CreateConsensi.txt -debug -HaplotypeGraphBuilderPlugin -configFile /phg/config_CreateConsensi.txt -methods GATK_PIPELINE -includeVariantContexts true -end$
...
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -HaplotypeGraphBuilderPlugin, -configFile, /phg/config_CreateConsensi.txt, -methods, GATK_PIPELINE, -includeVariantCont$
net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin
net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jul 29, 2021 15:45:36
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - HaplotypeGraphBuilderPlugin Parameters
configFile: /phg/config_CreateConsensi.txt
methods: GATK_PIPELINE
includeSequences: true
includeVariantContexts: true
haplotypeIds: null
chromosomes: null
taxa: [null]
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/95HRSW_attempt4.db host: localHost user: sqlite type: sqlite
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/95HRSW_attempt4.db
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: query statement: select reference_ranges.ref_range_id, chrom, range_start, range_end, methods.name from reference_r$
methods size: 1
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: number of reference ranges: 94229
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: time: 0.800579138 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: query statement: SELECT gamete_haplotypes.gamete_grp_id, genotypes.line_name FROM gamete_haplotypes INNER JOIN gametes ON ga$
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: number of taxa lists: 95
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: time: 0.024714719 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.VariantUtils - variantIdsToVariantMap: query statement: SELECT variant_id, chrom, position, ref_allele_id, alt_allele_id FROM variants;
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: haplotype method: GATK_PIPELINE range group method: null
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: query statement: SELECT haplotypes_id, gamete_grp_id, haplotypes.ref_range_id, asm_contig, asm_start_coordinate, as$
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: time: 562.852142849 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.HaplotypeGraph - Created graph edges: created when requested  number of nodes: 0  number of reference ranges: 0
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jul 29, 2021 15:55:1
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin: time: Jul 29, 2021 15:55:1
RunHapConsensusPipelinePlugin Parameters
referenceFasta: /phg/inputDir/reference/Ref.fa
dbConfigFile: /phg/config_CreateConsensi.txt
collapseMethod: CONSENSUStrue
collapseMethodDetails: CONSENSUStrue for creating Consensus
minFreq: 0.5
rankingFile: null
maxClusters: 30
minSites: 30
minCoverage: 0.1
minTaxa: 1
mxDiv: 0.01
clusteringMode: upgma
kmerSize: 7
distanceCalculation: Euclidean
[pool-1-thread-1] ERROR net.maizegenetics.plugindef.AbstractPlugin - -rankingFile: null doesn't exist


Shouldn't the ranking file be optional?

PHGwiki PHG • 244 views

I made up an arbitrary ranking file with all taxa ranked 1 and continued with creating a consensus haplotype.

     ...
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: query statement: select reference_ranges.ref_range_id, chrom, range_start, range_end, methods.name from reference_r$
methods size: 1
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: number of reference ranges: 94229
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: time: 0.564907808 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: query statement: SELECT gamete_haplotypes.gamete_grp_id, genotypes.line_name FROM gamete_haplotypes INNER JOIN gametes ON ga$
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: number of taxa lists: 95
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: time: 0.003135835 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.VariantUtils - variantIdsToVariantMap: query statement: SELECT variant_id, chrom, position, ref_allele_id, alt_allele_id FROM variants;
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: haplotype method: GATK_PIPELINE range group method: null
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: query statement: SELECT haplotypes_id, gamete_grp_id, haplotypes.ref_range_id, asm_contig, asm_start_coordinate, as$
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of nodes: 0
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of reference ranges: 0
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: time: 144.607289838 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.HaplotypeGraph - Created graph edges: created when requested number of nodes: 0 number of reference ranges: 0
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jul 29, 2021 13:37:20
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin: time: Jul 29, 2021 13:37:20
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - RunHapConsensusPipelinePlugin Parameters
referenceFasta: /phg/inputDir/reference/Ref.fa
dbConfigFile: /phg/config_CreateConsensi.txt
collapseMethod: CONSENSUStrue
collapseMethodDetails: CONSENSUStrue for creating Consensus
minFreq: 0.5
rankingFile: /phg/RankingFile
maxClusters: 30
minSites: 30
minCoverage: 0.1
maxThreads: 1000
minTaxa: 1
mxDiv: 0.01
clusteringMode: upgma
kmerSize: 7
distanceCalculation: Euclidean
Genome FASTA character conversion: ACGTNacgtn to ACGTNacgtn
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - RunHapConsensusPipelinePlugin: checking masterVariantMap for empty
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/95HRSW_attempt4.db host: localHost user: sqlite type: sqlite
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/95HRSW_attempt4.db
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - RunHapConsensusPipelinePlugin: masterVariantMap is empty - call addGraphVariantsToVariantMap
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.VariantUtils - getAllVariantIds: size of varidSet to return: 0
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - time to create masterVariantMap in seconds: 0.021050659
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - runHapConsensusPipelinePlugin: after masterVariantMap check, size of masterVariantMap: 0
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - Starting up the threadpool
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - Thread Pool started
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - Loading up the ranking file
[ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/95HRSW_attempt4.db host: localHost user: sqlite type: sqlite
[ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/95HRSW_attempt4.db
[ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:
[pool-1-thread-1] WARN net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - WARNING found multiple taxon with the same ranking. This has the potential to select incorrect representative haplo$
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - Walking through reference ranges and starting up future threads
beginning - isSqlite is true
[ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - refRangeRefRangeIDMap is null, creating new one with size : 94229
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin: time: Jul 29, 2021 13:38:59
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin: time: Jul 29, 2021 13:38:59: progress: 100%
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin  Citation: Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buck\$
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jul 29, 2021 13:38:59: progress: 100%


It hasn't run the mergeGVCFplugin, FindHaplotypeClustersPlugin, or LoadConsensusAnchorSequencesPlugin, yet it says the job is complete. How can I fix this?

7 weeks ago
pjb39 ▴ 60

The ranking file is not required. However, you apparently set rankingFile=null in the config file, so the program looks for a file named "null" and reports that it does not exist. When you are not assigning a rankingFile, delete that line from the config file instead. Setting any other parameter to null in the config file could cause similar problems; again, just delete optional parameters that you do not need to set.

As for the second part, the rankingFile you created worked, so that issue is something different. To diagnose it I would need to see the entire log file, so I can see which commands are being run and more of the results. Since Biostars limits how much you can attach to a message, feel free to email me the entire log at pjb39@cornell.edu.
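The config cleanup described above (delete any optional parameter set to null rather than leaving it in place) can be sketched as a small Python helper. This is only an illustration, not part of PHG: the function name `drop_null_parameters` is hypothetical, and it assumes the config uses one `key=value` pair per line with `#` for comments.

```python
def drop_null_parameters(config_text: str) -> str:
    """Return the config text without key=value lines whose value is the literal 'null'.

    Assumption: one key=value pair per line; lines starting with '#' are comments.
    """
    kept = []
    for line in config_text.splitlines():
        stripped = line.strip()
        # Leave blank lines, comments, and anything that is not key=value untouched.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            kept.append(line)
            continue
        _key, value = stripped.split("=", 1)
        # A value of 'null' would be read as a file or setting literally named "null".
        if value.strip().lower() == "null":
            continue
        kept.append(line)
    return "\n".join(kept)


# Illustrative input: rankingFile=null is the case from this thread;
# the other keys are placeholders taken from the logged parameter names.
example = "referenceFasta=/phg/inputDir/reference/Ref.fa\nrankingFile=null\nminTaxa=1"
print(drop_null_parameters(example))
```

Review the result before overwriting a real config file, since a required parameter set to null points to a different problem than an optional one.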

7 weeks ago
pjb39 ▴ 60

After looking at the log file, I can see that you are running the RunHapConsensusPipelinePlugin. I could see that from your post here, but I was not sure whether you were running other plugins before it. That plugin does not populate the database with haplotypes; it only creates consensus haplotypes from existing ones. To populate the DB, follow the instructions on the add haplotypes page of the PHG Wiki (https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/CreatePHG_step1-2_main.md). At the bottom of that page is a description of the PopulatePHGDBPipelinePlugin, which handles the entire process: creating haplotypes, loading them into the database, and finding consensus haplotypes. If you need more control, or only need to run specific steps, the same page describes the individual steps. For instance, if you already have GVCF files, just run step D before running the consensus plugin.
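One quick way to check whether the database holds any haplotypes before running the consensus plugin is to count rows directly in the SQLite file. The sketch below is a hedged illustration using Python's standard `sqlite3` module. The table name `haplotypes` is an assumption inferred from the truncated query in the posted log (`haplotypes.ref_range_id`), so the actual PHG schema may differ, and `count_rows` is a hypothetical helper name.

```python
import sqlite3


def count_rows(conn: sqlite3.Connection, table: str = "haplotypes") -> int:
    """Return the number of rows in `table`.

    Assumption: the table name is inferred from the log output, not from PHG docs.
    """
    # Table names cannot be bound as SQL parameters, so validate before interpolating.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    (n,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return n


# Usage sketch (path taken from the posted log):
#   conn = sqlite3.connect("/phg/95HRSW_attempt4.db")
#   print(count_rows(conn))
#   conn.close()
```

A count of 0 would be consistent with the "number of nodes: 0" lines in the log, and would suggest that the haplotype loading step still needs to be run.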