Hi everyone, I am attempting to run the HGAP4 pipeline from PacBio's SMRT Tools (v10.2) for genome assembly, which uses the Cromwell workflow management system (via the pbcromwell wrapper). However, I am very unfamiliar with workflow managers as a whole and would like some help understanding the errors I am getting, so that I can discuss them with my HPC administrator and try to fix this.
The hgap4 command takes an XML file describing PacBio continuous long-read data: pbcromwell run pb_hgap4 -e "${XML_INPUT}"
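For reference, this is roughly the Slurm batch script I am using to launch it (paths and resource numbers are placeholders for my actual values, and I believe --nproc caps the threads per task, if I am reading the pbcromwell help correctly):

    #!/bin/bash
    #SBATCH --job-name=hgap4
    #SBATCH --cpus-per-task=16
    #SBATCH --mem=64G

    # Placeholder path to my subreads XML
    XML_INPUT=/path/to/movie.subreadset.xml

    # Launch the workflow; as far as I can tell, every task currently
    # runs inside this single allocation via Cromwell's Local backend.
    pbcromwell run pb_hgap4 -e "${XML_INPUT}" --nproc 16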
Based on the logs, the run appears to start out fine, calling the workflow as intended and printing the info message below, but it is immediately followed by these warnings:
[2024-09-02 03:34:11,54] [info] MaterializeWorkflowDescriptorActor [7643a414]: Call-to-Backend assignments: falcon.task__0_rawreads__tan_combine -> Local, falcon.task__2_asm_falcon -> Local, consensus.gather_gff -> Local, falcon.task__1_preads_ovl__build -> Local, falcon.task__0_rawreads__report -> Local, falcon.task__1_preads_ovl__daligner_scatter -> Local, coverage_reports.target_coverage -> Local, falcon.task__0_rawreads__tan_split -> Local, coverage_reports.gc_coverage_plot -> Local, falcon.task__0_rawreads__tan_apply -> Local, mapping.cleanup_chunked_dataset_files -> Local, consensus.split_alignments -> Local, falcon.task__1_preads_ovl__daligner_split -> Local, falcon.task__0_rawreads__daligner_las_merge -> Local, coverage_reports.plot_target_coverage -> Local, falcon.task__0_rawreads__tan_scatter -> Local, pb_hgap4.task_gen_config -> Local, pb_hgap4.fasta_to_reference -> Local, falcon.task__0_rawreads__build -> Local, mapping.mapping_stats -> Local, consensus.guess_optimal_max_nchunks -> Local, mapping.pbmm2_align -> Local, falcon.task__0_rawreads__daligner_apply -> Local, coverage_reports.plot_target_coverage -> Local, coverage_reports.pbreports_coverage -> Local, consensus.genomic_consensus -> Local, coverage_reports.target_coverage -> Local, pb_hgap4.update_subreads -> Local, coverage_reports.pbreports_coverage -> Local, pb_hgap4.task_get_dextas -> Local, coverage_reports.summarize_coverage -> Local, falcon.task__0_rawreads__daligner_split -> Local, falcon.task__1_preads_ovl__daligner_las_merge -> Local, mapping.gather_alignments -> Local, consensus.gather_vcf -> Local, falcon.task__1_preads_ovl__daligner_apply -> Local, coverage_reports.summarize_coverage -> Local, pb_hgap4.dataset_filter -> Local, get_input_sizes.get_ref_size -> Local, falcon.task__1_preads_ovl__db2falcon -> Local, mapping.split_reads -> Local, mapping.auto_consolidate_alignments -> Local, pb_hgap4.polished_assembly -> Local, falcon.task__0_rawreads__daligner_scatter -> Local, consensus.gather_fasta -> Local, get_input_sizes.get_bam_size -> Local, coverage_reports.gc_coverage_plot -> Local, falcon.task__0_rawreads__cns_apply -> Local, consensus.gather_fastq -> Local
[2024-09-02 03:34:11,67] [warn] Local [7643a414]: Key/s [cpu] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2024-09-02 03:34:11,67] [warn] Local [7643a414]: Key/s [cpu, memory] is/are not supported by backend. Unsupported attributes will not be part of job executions.
The workflow then transitions into a failed state before Cromwell exits:
[2024-09-02 03:34:26,22] [info] WorkflowManagerActor WorkflowActor-7643a414-da35-4a34-ad27-4ee0cea26b85 is in a terminal state: WorkflowFailedState
(The log has a lot more messages, but I've picked out the pertinent ones that I believe point to the error. If anyone would like to see the full log, I'd be happy to share it.)
In any case, I presume that the problem lies with the cpu/memory warnings above. What does it mean for the keys cpu and memory to be "not supported by backend"? I have searched on Google and found a couple of posts, but I don't quite understand what is being explained in them.
Examples:
https://github.com/broadinstitute/cromwell/issues/4413
https://hpc-discourse.usc.edu/t/how-to-configure-cromwell-backends-to-run-on-hpc/555/3
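If I am reading those threads correctly, the warning means that the backend Cromwell picked (Local) does not declare cpu or memory in the runtime-attributes section of its configuration, so Cromwell drops those requests rather than applying them. My rough paraphrase of what the posts add to cromwell.conf, untested on my side:

    backend {
      providers {
        Local {
          actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
          config {
            run-in-background = true
            # Declaring the attributes here is (apparently) what makes the
            # backend accept cpu/memory instead of warning and ignoring them.
            runtime-attributes = """
              Int cpu = 1
              Float memory_mb = 2048.0
            """
          }
        }
      }
    }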
Am I supposed to run Docker with this? And do I need an internet connection? The SMRT Tools installation on our HPC is offline.
I also found this in the Cromwell docs:
https://cromwell.readthedocs.io/en/stable/RuntimeAttributes/
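From that page (and the backend documentation it links to), it looks like a backend has to both declare the runtime attributes and interpolate them into its submit command. The Cromwell docs give a SLURM backend example roughly like this (condensed from the documentation; I have not tried it on our cluster yet):

    backend {
      default = SLURM
      providers {
        SLURM {
          actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
          config {
            # Attributes declared here become available to the submit command.
            runtime-attributes = """
              Int runtime_minutes = 600
              Int cpus = 2
              Int requested_memory_mb_per_core = 8000
              String queue = "short"
            """
            # Each task is submitted to Slurm with the requested resources.
            submit = """
              sbatch -J ${job_name} -D ${cwd} -o ${out} -e ${err} \
                -t ${runtime_minutes} -p ${queue} \
                ${"-c " + cpus} \
                --mem-per-cpu=${requested_memory_mb_per_core} \
                --wrap "/bin/bash ${script}"
            """
            kill = "scancel ${job_id}"
            check-alive = "squeue -j ${job_id}"
            job-id-regex = "Submitted batch job (\\d+).*"
          }
        }
      }
    }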
Do I need to specify these attributes separately in the script I submit? Our HPC uses Slurm to manage jobs, and I am calling this workflow through a bash script (the sbatch wrapper shown near the top of this post).
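Relatedly, the SMRT Tools documentation seems to say that pbcromwell can generate and consume such a config file itself, which might be the cleaner route. My (possibly wrong) reading of the reference guide is something like:

    # Generate a cromwell.conf with a Slurm backend (flag names are from my
    # reading of the SMRT Tools reference guide and may not be exact):
    pbcromwell configure --default-backend SLURM --output-file cromwell.conf

    # Then point the run at that config:
    pbcromwell run pb_hgap4 -e "${XML_INPUT}" --config cromwell.conf

Is this the intended way to get the cpu/memory attributes honored on a Slurm cluster?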
I understand that it is difficult to troubleshoot this based on the information available, but I would really appreciate it if anyone could suggest what might have gone wrong and possibly point me in the right direction.
Thank you very much.