Hi,
I am trying to set up PHG. I am using Singularity because I am working on a cluster.
The installation is pretty simple, as described at: https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/CreatePHG_step0_singularity.md
So I ran:
$ cd phg_singularity
$ module load singularity
$ singularity pull docker://maizegenetics/phg
$ singularity pull docker://maizegenetics/phg_liquibase
$ singularity build phg_22.simg docker://maizegenetics/phg:0.0.22
And followed up with building the default data structure:
$ cd ../phg_run
$ singularity exec -B /absolute/path/phg_singularity/:/phg/ /absolute/path/phg_singularity/phg_22.simg /tassel-5-standalone/run_pipeline.pl -debug -Xmx1G -MakeDefaultDirectoryPlugin -workingDir /phg/ -endPlugin
This ran without trouble. I then copied the files into the proper directories and filled in phg_run/inputDir/load_genome_data.txt and the other key files.
The next step would be the setup of the initial PHG database: https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/MakeInitialPHGDBPipeline.md
Sadly, all steps from here on are documented only for the Docker installation of PHG.
I changed:
$ WORKING_DIR=local/directory/where/MakeDefaultDirectory/was/run/
$ DOCKER_CONFIG_FILE=/phg/config.txt
$ docker run --name create_initial_db --rm \
-v ${WORKING_DIR}/:/phg/ \
-t maizegenetics/phg:latest \
/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -configParameters ${DOCKER_CONFIG_FILE} \
-MakeInitialPHGDBPipelinePlugin -endPlugin
Into the Singularity version (probably not quite right, but it seems to run somehow):
$ cd ../phg_run
$ DOCKER_CONFIG_FILE=config.txt
$ singularity exec \
-B /absolute/path/phg_singularity/:/phg/ /absolute/path/phg_singularity/phg_22.simg \
/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -configParameters ${DOCKER_CONFIG_FILE} \
-MakeInitialPHGDBPipelinePlugin -endPlugin
While this appears to run, it crashes on the UNASSIGNED values in the config.txt file.
An example config.txt is given at: https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/MakeInitialPHGDBPipeline.md
Yet the example file:
host=localHost
user=sqlite
password=sqlite
DB=/phg/phg_db_name.db
DBtype=sqlite
# Load genome intervals parameters
referenceFasta=/phg/inputDir/reference/Ref.fa
anchors=/phg/anchors.bed
genomeData=/phg/inputDir/reference/load_genome_data.txt
refServerPath=irods:/ibl/home/assemblies/
#liquibase results output directory, general output directory
outputDir=/phg/outputDir
liquibaseOutdir=/phg/outputDir
looks very different from the config file which was created in my file system:
########################################
#Required Parameters:
########################################
HaplotypeGraphBuilderPlugin.methods=**UNASSIGNED**
HaplotypeGraphBuilderPlugin.configFile=**UNASSIGNED**
CreateIntervalBedFilesPlugin.dbConfigFile=**UNASSIGNED**
CreateIntervalBedFilesPlugin.refRangeMethods=**UNASSIGNED**
GetDBConnectionPlugin.create=**UNASSIGNED**
GetDBConnectionPlugin.config=**UNASSIGNED**
LoadAllIntervalsToPHGdbPlugin.genomeData=**UNASSIGNED**
LoadAllIntervalsToPHGdbPlugin.outputDir=**UNASSIGNED**
LoadAllIntervalsToPHGdbPlugin.ref=**UNASSIGNED**
LoadAllIntervalsToPHGdbPlugin.anchors=**UNASSIGNED**
LoadHaplotypesFromGVCFPlugin.wgsKeyFile=**UNASSIGNED**
LoadHaplotypesFromGVCFPlugin.bedFile=**UNASSIGNED**
LoadHaplotypesFromGVCFPlugin.haplotypeMethodName=**UNASSIGNED**
LoadHaplotypesFromGVCFPlugin.gvcfDir=**UNASSIGNED**
LoadHaplotypesFromGVCFPlugin.referenceFasta=**UNASSIGNED**
FilterGVCFSingleFilePlugin.inputGVCFFile=**UNASSIGNED**
FilterGVCFSingleFilePlugin.outputGVCFFile=**UNASSIGNED**
FilterGVCFSingleFilePlugin.configFile=**UNASSIGNED**
RunHapConsensusPipelinePlugin.collapseMethod=**UNASSIGNED**
RunHapConsensusPipelinePlugin.dbConfigFile=**UNASSIGNED**
AssemblyHaplotypesMultiThreadPlugin.outputDir=**UNASSIGNED**
AssemblyHaplotypesMultiThreadPlugin.keyFile=**UNASSIGNED**
referenceFasta=**UNASSIGNED**
########################################
#Defaulted parameters:
########################################
HaplotypeGraphBuilderPlugin.includeSequences=true
HaplotypeGraphBuilderPlugin.includeVariantContexts=false
CreateIntervalBedFilesPlugin.windowSize=1000
CreateIntervalBedFilesPlugin.bedFile=intervals.bed
LoadHaplotypesFromGVCFPlugin.queueSize=30
LoadHaplotypesFromGVCFPlugin.mergeRefBlocks=false
LoadHaplotypesFromGVCFPlugin.numThreads=3
LoadHaplotypesFromGVCFPlugin.maxNumHapsStaged=10000
RunHapConsensusPipelinePlugin.minTaxa=1
RunHapConsensusPipelinePlugin.distanceCalculation=Euclidean
RunHapConsensusPipelinePlugin.minFreq=0.5
RunHapConsensusPipelinePlugin.minCoverage=0.1
RunHapConsensusPipelinePlugin.mxDiv=0.01
RunHapConsensusPipelinePlugin.clusteringMode=upgma
RunHapConsensusPipelinePlugin.maxClusters=30
RunHapConsensusPipelinePlugin.minSites=30
RunHapConsensusPipelinePlugin.maxThreads=1000
RunHapConsensusPipelinePlugin.kmerSize=7
AssemblyHaplotypesMultiThreadPlugin.mummer4Path=/mummer/bin/
AssemblyHaplotypesMultiThreadPlugin.loadDB=true
AssemblyHaplotypesMultiThreadPlugin.minInversionLen=7500
AssemblyHaplotypesMultiThreadPlugin.assemblyMethod=mummer4
AssemblyHaplotypesMultiThreadPlugin.entryPoint=all
AssemblyHaplotypesMultiThreadPlugin.numThreads=3
AssemblyHaplotypesMultiThreadPlugin.clusterSize=250
numThreads=10
Xmx=10G
picardPath=/picard.jar
gatkPath=/gatk/gatk
tasselLocation=/tassel-5-standalone/run_pipeline.pl
fastqFileDir=/tempFileDir/data/fastq/
tempFileDir=/tempFileDir/data/bam/temp/
dedupedBamDir=/tempFileDir/data/bam/DedupBAMs/
filteredBamDir=/tempFileDir/data/bam/filteredBAMs/
gvcfFileDir=/tempFileDir/data/gvcfs/
extendedWindowSize=1000
mapQ=48
#Sentieon Parameters. Uncomment and set to use sentieon:
#sentieon_license=**UNASSIGNED**
#sentieonPath=/sentieon/bin/sentieon
########################################
#Optional Parameters With No Default Values:
########################################
HaplotypeGraphBuilderPlugin.chromosomes=null
HaplotypeGraphBuilderPlugin.haplotypeIds=null
CreateIntervalBedFilesPlugin.extendedBedFile=null
LoadHaplotypesFromGVCFPlugin.haplotypeMethodDescription=null
RunHapConsensusPipelinePlugin.referenceFasta=null
RunHapConsensusPipelinePlugin.rankingFile=null
RunHapConsensusPipelinePlugin.collapseMethodDetails=null
AssemblyHaplotypesMultiThreadPlugin.gvcfOutputDir=null
#FilterGVCF Parameters. Adding any of these will add more filters.
#exclusionString=**UNASSIGNED**
#DP_poisson_min=0.0
#DP_poisson_max=1.0
#DP_min=**UNASSIGNED**
#DP_max=**UNASSIGNED**
#GQ_min=**UNASSIGNED**
#GQ_max=**UNASSIGNED**
#QUAL_min=**UNASSIGNED**
#QUAL_max=**UNASSIGNED**
#filterHets=**UNASSIGNED**
This is where I am stuck: I cannot figure out what to fill in for the UNASSIGNED entries. I assume the file looks different because of the Singularity setup I ran? Can you give me an example of this file?
It would also be very helpful if you could add a Singularity version of most commands shown in the Wiki, as they seem very different from the Docker commands.
Cheers Jakob
Hi Zack,
Thanks a lot.
I started to fill in the gaps, but there are still plenty of empty / UNASSIGNED entries to fill.
I figured that for the initial run I mainly need to set up the required parameters (this is how my config.txt looks now):
Again, it is still not even close to the file you have, but some lines do match. This is probably because of Singularity.
I am mainly struggling to find these plugin configs.
I assume the plugins are sitting somewhere in the Singularity containers? Or are they somewhere within the working directory?
Cheers Jakob
Hello,
I meant that you should replace all of the UNASSIGNED entries with the values I had posted. I removed a lot of the plugin-specific parameters where they shared a parameter with other plugins.
TASSEL (which the PHG is built on) allows you to specify just the shared parameter name. An example is referenceFasta, which is shared by a number of plugins. You could specify Plugin1.referenceFasta, Plugin2.referenceFasta and so on, but for your use they are all the same. Some of the parameters also have good defaults, so they can be removed.
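As a rough illustration of the shared-name idea, the sketch below writes a small config that uses plain parameter names instead of Plugin.parameter entries and then counts how many active lines still hold the UNASSIGNED placeholder. The values are copied from the wiki example quoted in the question; whether they cover every parameter your run needs is not confirmed here, and the temporary file location is only an assumption for the demonstration.

```shell
# Sketch only: values come from the wiki example quoted in the question.
# The temp file stands in for your real config.txt (assumption).
CONFIG=$(mktemp)
cat > "$CONFIG" <<'EOF'
host=localHost
user=sqlite
password=sqlite
DB=/phg/phg_db_name.db
DBtype=sqlite
# Shared names: one line each instead of Plugin1.referenceFasta, Plugin2.referenceFasta, ...
referenceFasta=/phg/inputDir/reference/Ref.fa
anchors=/phg/anchors.bed
genomeData=/phg/inputDir/reference/load_genome_data.txt
refServerPath=irods:/ibl/home/assemblies/
outputDir=/phg/outputDir
liquibaseOutdir=/phg/outputDir
EOF

# Count active (non-comment) lines that still carry the placeholder.
# "|| true" keeps the script going when grep finds nothing and exits non-zero.
remaining=$(grep -v '^#' "$CONFIG" | grep -c -F 'UNASSIGNED' || true)
echo "UNASSIGNED entries left: ${remaining}"
```

Running the same two grep commands against your own config.txt shows which required entries still need a value.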
The config file has nothing to do with singularity.
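What does differ between Docker and Singularity is how the container is launched. Docker's "-v host:container" mount corresponds to Singularity's "-B host:container" bind, and the image file takes the place of the image name. A minimal sketch of that mapping for the MakeInitialPHGDBPipelinePlugin step follows. The working directory path is a placeholder, the image path is the one used earlier in the thread, and passing the config as /phg/config.txt simply mirrors the Docker example; none of this has been tested against your cluster. The script prints the command by default (DRY_RUN is a hypothetical switch introduced for this sketch).

```shell
# Placeholders (assumptions): adjust to your own file system.
WORKING_DIR=/absolute/path/to/working_dir           # where MakeDefaultDirectoryPlugin was run
IMAGE=/absolute/path/phg_singularity/phg_22.simg    # image built earlier in the thread
CONTAINER_CONFIG_FILE=/phg/config.txt               # path as seen inside the container

# Print the command by default; set DRY_RUN=0 to execute it for real.
if [ "${DRY_RUN:-1}" = "1" ]; then
    run() { printf '%s\n' "$*"; }
else
    run() { "$@"; }
fi

# Docker:      -v ${WORKING_DIR}/:/phg/ -t maizegenetics/phg:latest
# Singularity: -B ${WORKING_DIR}:/phg   <image file>
run singularity exec \
    -B "${WORKING_DIR}:/phg" \
    "${IMAGE}" \
    /tassel-5-standalone/run_pipeline.pl -Xmx100G -debug \
    -configParameters "${CONTAINER_CONFIG_FILE}" \
    -MakeInitialPHGDBPipelinePlugin -endPlugin
```

Everything after the image path is the same command line as in the Docker version, which is why the config file itself does not change.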
The plugins sit in the Docker container (and by extension the Singularity container) as part of TASSEL's lib folder (/tassel-5-standalone/lib/phg.jar). When you run
/tassel-5-standalone/run_pipeline.pl -MakeDefaultDirectoryPlugin...
you are actually running a plugin which is in the container.
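If you want to confirm this on your side, something like the following sketch lists the jar from inside the image. The image path is the one from your first post and is an assumption about your system; the check is skipped when Singularity or the image is not available.

```shell
# Assumption: image path as used earlier in the thread; adjust as needed.
IMAGE=/absolute/path/phg_singularity/phg_22.simg
PHG_JAR=/tassel-5-standalone/lib/phg.jar

if command -v singularity >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    # ls runs inside the container, so the path is resolved in the container's file system.
    singularity exec "$IMAGE" ls -l "$PHG_JAR"
else
    echo "singularity or ${IMAGE} not available; skipping check of ${PHG_JAR}"
fi
```

No bind mount is needed for this check, because the jar is part of the image rather than the working directory.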