Question

CreatePHG_step2_consensu

2.7 years ago

bp • 0

I'm trying to generate consensus haplotypes for all reference ranges from the raw haplotypes generated by CreatePHG_step2_addHapsFromGVCFPipelinePlugin. I'm building the PHG from WGS g.vcf files, so, as the wiki page suggested, I did not provide a ranking file (I don't have one). I then get the following error:

 /tassel-5-standalone/lib/ahocorasick-0.2.4.jar:/tassel-5-standalone/lib/biojava-alignment-4.0.0.jar:/tassel-5-standalone/lib/biojava-core-4.0.0.jar:/tassel-5-standalone/lib/biojava-phylo-4.0.0.jar:/tassel-5-stan$
    Memory Settings: -Xms512m -Xmx10G
    Tassel Pipeline Arguments: -configParameters /phg/config_CreateConsensi.txt -debug -HaplotypeGraphBuilderPlugin -configFile /phg/config_CreateConsensi.txt -methods GATK_PIPELINE -includeVariantContexts true -end$
    [main] INFO net.maizegenetics.plugindef.ParameterCache - load: loading parameter cache with: /phg/config_CreateConsensi.txt
...    
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -HaplotypeGraphBuilderPlugin, -configFile, /phg/config_CreateConsensi.txt, -methods, GATK_PIPELINE, -includeVariantCont$
net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin
   net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jul 29, 2021 15:45:36
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
HaplotypeGraphBuilderPlugin Parameters
configFile: /phg/config_CreateConsensi.txt
methods: GATK_PIPELINE
includeSequences: true
includeVariantContexts: true
haplotypeIds: null
chromosomes: null
taxa: [null]

[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/95HRSW_attempt4.db host: localHost user: sqlite type: sqlite
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/95HRSW_attempt4.db
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:

[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: query statement: select reference_ranges.ref_range_id, chrom, range_start, range_end, methods.name from reference_r$
methods size: 1
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: number of reference ranges: 94229
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: time: 0.800579138 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: query statement: SELECT gamete_haplotypes.gamete_grp_id, genotypes.line_name FROM gamete_haplotypes INNER JOIN gametes ON ga$
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: number of taxa lists: 95
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: time: 0.024714719 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.VariantUtils - variantIdsToVariantMap: query statement: SELECT variant_id, chrom, position, ref_allele_id, alt_allele_id FROM variants;
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: haplotype method: GATK_PIPELINE range group method: null
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: query statement: SELECT haplotypes_id, gamete_grp_id, haplotypes.ref_range_id, asm_contig, asm_start_coordinate, as$
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of nodes: 0
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of reference ranges: 0
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: time: 562.852142849 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.HaplotypeGraph - Created graph edges: created when requested  number of nodes: 0  number of reference ranges: 0
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jul 29, 2021 15:55:1
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin: time: Jul 29, 2021 15:55:1
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
RunHapConsensusPipelinePlugin Parameters
referenceFasta: /phg/inputDir/reference/Ref.fa
dbConfigFile: /phg/config_CreateConsensi.txt
collapseMethod: CONSENSUStrue
collapseMethodDetails: CONSENSUStrue for creating Consensus
minFreq: 0.5
rankingFile: null
maxClusters: 30
minSites: 30
minCoverage: 0.1
maxThreads: 1000
minTaxa: 1
mxDiv: 0.01
clusteringMode: upgma
kmerSize: 7
distanceCalculation: Euclidean
[pool-1-thread-1] ERROR net.maizegenetics.plugindef.AbstractPlugin - -rankingFile: null doesn't exist

Shouldn't the ranking file be optional?

PHGwiki PHG • 947 views

Updated 2.7 years ago by pjb39 • written 2.7 years ago by bp

I made up an arbitrary ranking file with all taxa ranked 1 and continued with creating a consensus haplotype.

     ...
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: query statement: select reference_ranges.ref_range_id, chrom, range_start, range_end, methods.name from reference_r$
    methods size: 1
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: number of reference ranges: 94229
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: time: 0.564907808 secs.
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: query statement: SELECT gamete_haplotypes.gamete_grp_id, genotypes.line_name FROM gamete_haplotypes INNER JOIN gametes ON ga$
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: number of taxa lists: 95
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: time: 0.003135835 secs.
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.VariantUtils - variantIdsToVariantMap: query statement: SELECT variant_id, chrom, position, ref_allele_id, alt_allele_id FROM variants;
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: haplotype method: GATK_PIPELINE range group method: null
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: query statement: SELECT haplotypes_id, gamete_grp_id, haplotypes.ref_range_id, asm_contig, asm_start_coordinate, as$
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of nodes: 0
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of reference ranges: 0
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: time: 144.607289838 secs.
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.HaplotypeGraph - Created graph edges: created when requested  number of nodes: 0  number of reference ranges: 0
    [pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jul 29, 2021 13:37:20
    [pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin: time: Jul 29, 2021 13:37:20
    [pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin -
    RunHapConsensusPipelinePlugin Parameters
    referenceFasta: /phg/inputDir/reference/Ref.fa
    dbConfigFile: /phg/config_CreateConsensi.txt
    collapseMethod: CONSENSUStrue
    collapseMethodDetails: CONSENSUStrue for creating Consensus
    minFreq: 0.5
    rankingFile: /phg/RankingFile
    maxClusters: 30
    minSites: 30
    minCoverage: 0.1
    maxThreads: 1000
    minTaxa: 1
    mxDiv: 0.01
    clusteringMode: upgma
    kmerSize: 7
    distanceCalculation: Euclidean
    Genome FASTA character conversion: ACGTNacgtn to ACGTNacgtn
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - RunHapConsensusPipelinePlugin: checking masterVariantMap for empty
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/95HRSW_attempt4.db host: localHost user: sqlite type: sqlite
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/95HRSW_attempt4.db
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:

    [pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - RunHapConsensusPipelinePlugin: masterVariantMap is empty - call addGraphVariantsToVariantMap
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.api.VariantUtils - getAllVariantIds: size of varidSet to return: 0
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - time to create masterVariantMap in seconds: 0.021050659
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin -
    runHapConsensusPipelinePlugin: after masterVariantMap check, size of masterVariantMap: 0
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - Starting up the threadpool
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - Thread Pool started
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - Loading up the ranking file
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/95HRSW_attempt4.db host: localHost user: sqlite type: sqlite
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/95HRSW_attempt4.db
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:

    [pool-1-thread-1] WARN net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - WARNING found multiple taxon with the same ranking.  This has the potential to select incorrect representative haplo$
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - Walking through reference ranges and starting up future threads
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess - db is setup, init prepared statements, load hash table
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess -
     beginning - isSqlite is true
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all geneotypes in genotype table=95
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - refRangeRefRangeIDMap is null, creating new one with size : 94229
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - loadAnchorHash: at end, size of refRangeRefRangeIDMap: 94229, number of rs.next processed: 94229
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all methods in method table=5
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all groups in taxa_groups table=0
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all groups in gamete_groups table=95
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all gametes in gametes table=95
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading readMappingHash, size of all read_mappings in read_mapping table=0
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - PHGdbAccess:putMethod: added method CONSENSUStrue to methods table
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all methods in method table=6
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putGameteGroupList: total loaded to gamete_groups: 0
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - before loading hash, size of all groups in gamete_groups table=95
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putGameteHaplotypesFromList: committed 0 to gamete_haplotypes table
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - Begin putHaplotypesForMultipleGroups, number interval sequences to load: 0
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - putHaployptes - total count loaded to haplotypes table: 0
    [ForkJoinPool.commonPool-worker-57] INFO net.maizegenetics.pangenome.db_loading.PHGdbAccess - Finished processing putConsensusSeuqnces in 0.016059484 seconds
    [pool-1-thread-1] INFO net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin - Shutting down the threadpool
    [pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin: time: Jul 29, 2021 13:38:59
    [pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin: time: Jul 29, 2021 13:38:59: progress: 100%
    [pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - net.maizegenetics.pangenome.hapcollapse.RunHapConsensusPipelinePlugin  Citation: Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buck$
    [pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Jul 29, 2021 13:38:59: progress: 100%

It hasn't run the mergeGVCFplugin, FindHaplotypeClustersPlugin, or LoadConsensusAnchorSequencesPlugin, and yet it says the job is complete. How can I fix this?

Reply written 2.7 years ago by bp

score 0 · Answer 1 · 2021-07-29

To answer the question about the ranking file: it is not required. However, you apparently set rankingFile=null in the config file. As a result, the program looks for a file named "null" and tells you that it does not exist. When you are not assigning a rankingFile, delete that line from the config file instead. Any other parameter set to null in the config file could cause problems as well, so delete every optional parameter that you do not need to set.

For the second part: creating a rankingFile the way you did worked, so the issue is something different. I would need to see the entire log file, to see which commands are being run and more of the results, before I can diagnose it. Since Biostars limits how much you can attach to a message, feel free to email me the entire log at pjb39@cornell.edu.
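As an illustration of the config fix (removing parameters that are set to null), here is a minimal Python sketch. It assumes the config file uses simple `key=value` lines with `#` comments. The helper name `drop_null_params` is hypothetical and is not part of PHG or TASSEL.

```python
def drop_null_params(config_text: str) -> str:
    """Return the config text without lines whose value is the literal 'null'.

    Assumption: one key=value pair per line; '#' starts a comment line.
    """
    kept = []
    for line in config_text.splitlines():
        stripped = line.strip()
        # Blank lines, comments, and lines without '=' are kept unchanged.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            kept.append(line)
            continue
        _key, value = stripped.split("=", 1)
        if value.strip() == "null":
            # Drop lines such as "rankingFile=null" entirely.
            continue
        kept.append(line)
    return "\n".join(kept)


# Example input modelled on the parameters shown in the log above.
example = "\n".join([
    "referenceFasta=/phg/inputDir/reference/Ref.fa",
    "rankingFile=null",
    "minFreq=0.5",
])
print(drop_null_params(example))
```

Review the result before overwriting the real config file, since a parameter may legitimately need a value rather than removal.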

score 0 · Answer 2 · 2021-07-29

After looking at the log file, I can see that you are running the RunHapConsensusPipelinePlugin, as your posting here already suggested; I was not sure whether you had run other plugins before it. That plugin does not populate the database with haplotypes. It only creates consensus haplotypes from existing ones, which is why the graph reports 0 nodes. To populate the DB, follow the instructions on the add haplotypes page of the PHG Wiki (https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/CreatePHG_step1-2_main.md). The bottom of that page describes the PopulatePHGDBPipelinePlugin, which handles the entire process of creating haplotypes, loading them into the database, and finding consensus haplotypes. If you need more control or only need to run specific steps, the same page describes the individual steps. For instance, if you already have GVCF files, just run step D before running the consensus plugin.
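One way to check whether the database holds any haplotypes before running the consensus plugin is to count rows directly in the SQLite file. The sketch below is a hedged example, not a PHG tool: the table names are taken from the query statements and messages in the log above, and the helper `count_rows` is hypothetical. It runs against an in-memory database here so that it is self-contained.

```python
import sqlite3


def count_rows(conn, tables=("haplotypes", "gamete_haplotypes", "variants")):
    """Return {table_name: row_count} for each table in `tables`.

    Table names come from a fixed tuple (seen in the log output),
    never from user input, so formatting them into the SQL is safe here.
    """
    counts = {}
    for table in tables:
        row = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        counts[table] = row[0]
    return counts


# Self-contained demo: an in-memory database with empty stand-in tables.
demo = sqlite3.connect(":memory:")
for name in ("haplotypes", "gamete_haplotypes", "variants"):
    demo.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY)")
print(count_rows(demo))
```

Against the real database, the connection would be opened with `sqlite3.connect("/phg/95HRSW_attempt4.db")` (the path reported in the log). A count of 0 for `haplotypes` would agree with the "number of nodes: 0" message and would mean the haplotypes still need to be loaded.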