No localGVCFFolder parameter in config file - problem passing parameter to pipeline
5 weeks ago

Hi there, I'm a bit mystified about how to pass this parameter. It's in my config file and seems to be read, but eventually I get a warning that there is no localGVCFFolder parameter in the config file:

WARN net.maizegenetics.pangenome.pipeline.MakeInitialPHGDBPipelinePlugin - No localGVCFFolder parameter in config file - will not copy created reference gvcfs to folder for consensus processing.

STEP0

sudo singularity build phg_20230209.simg docker://maizegenetics/phg

WORKING_DIR="/phg/rc_small_db"

singularity exec -B /home/mshenton/analysis/PHG/:/phg/ /home/mshenton/analysis/PHG/phg_20230209.simg /tassel-5-standalone/run_pipeline.pl -debug -Xmx1G -MakeDefaultDirectoryPlugin -workingDir ${WORKING_DIR} -endPlugin

STEP1

WORKING_DIR="/home/mshenton/analysis/PHG/rc_small_db"
SINGULARITY_CONFIG_FILE=/phg/DBconfig.txt

singularity exec -B$WORKING_DIR:/phg/ /home/mshenton/analysis/PHG/phg_20230209.simg /tassel-5-standalone/run_pipeline.pl \
-Xmx20G -debug -configParameters ${SINGULARITY_CONFIG_FILE} \
-MakeInitialPHGDBPipelinePlugin -endPlugin

DBconfig.txt:

# host option
host=localHost
user=sqlite
password=sqlite
DB=/phg/rc_small_db.db
DBtype=sqlite
outputDir=/phg/outputDir
liquibaseOutdir=/phg/outputDir
refServerPath=localhost:/
referenceFasta=/phg/inputDir/reference/IRGSP-1.0_genome_M_C_unanchored.fa
genomeData=/phg/inputDir/reference/load_genome_data.txt
anchors=/phg/inputDir/reference/valid1000RAP-DB_MSU_intervals.bed
localGVCFFolder=/phg/GVCFFolder

[main] INFO net.maizegenetics.plugindef.ParameterCache - load: loading parameter cache with: /phg/DBconfig.txt
[main] INFO net.maizegenetics.plugindef.ParameterCache - ParameterCache: key: anchors value: /phg/inputDir/reference/valid1000RAP-DB_MSU_intervals.bed
[main] INFO net.maizegenetics.plugindef.ParameterCache - ParameterCache: key: configFile value: /phg/DBconfig.txt
[main] INFO net.maizegenetics.plugindef.ParameterCache - ParameterCache: key: outputDir value: /phg/outputDir
[main] INFO net.maizegenetics.plugindef.ParameterCache - ParameterCache: key: referenceFasta value: /phg/inputDir/reference/IRGSP-1.0_genome_M_C_unanchored.fa
[main] INFO net.maizegenetics.plugindef.ParameterCache - ParameterCache: key: user value: sqlite
[main] INFO net.maizegenetics.plugindef.ParameterCache - ParameterCache: key: DB value: /phg/rc_small_db.db
[main] INFO net.maizegenetics.plugindef.ParameterCache - ParameterCache: key: DBtype value: sqlite
[main] INFO net.maizegenetics.plugindef.ParameterCache - ParameterCache: key: localGVCFFolder value: /phg/GVCFFolder
[main] INFO net.maizegenetics.plugindef.ParameterCache - ParameterCache: key: liquibaseOutdir value: /phg/outputDir
[main] INFO net.maizegenetics.plugindef.ParameterCache - ParameterCache: key: password value: sqlite
[main] INFO net.maizegenetics.plugindef.ParameterCache - ParameterCache: key: host value: localHost
[main] INFO net.maizegenetics.plugindef.ParameterCache - ParameterCache: key: refServerPath value: localhost:/
[main] INFO net.maizegenetics.plugindef.ParameterCache - ParameterCache: key: genomeData value: /phg/inputDir/reference/load_genome_data.txt
[.......]

[pool-1-thread-1] INFO net.maizegenetics.pangenome.liquibase.LiquibaseUpdatePlugin - Please wait, begin Command:liquibase --driver=org.sqlite.JDBC --url=jdbc:sqlite:/phg/rc_small_db.db --username=sqlite --password=sqlite --changeLogFile=changelogs/db.changelog-master.xml --loglevel=FINE changeLogSync
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.liquibase.LiquibaseUpdatePlugin: time: Feb 9, 2023 5:10:12
[pool-1-thread-1] INFO net.maizegenetics.pangenome.pipeline.MakeInitialPHGDBPipelinePlugin - Done setting up Liquibase.
[pool-1-thread-1] WARN net.maizegenetics.pangenome.pipeline.MakeInitialPHGDBPipelinePlugin - No localGVCFFolder parameter in config file - will not copy created reference gvcfs to folder for consensus processing.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.pipeline.MakeInitialPHGDBPipelinePlugin - MakeInitialPHGDBPipelinePlugin complete!
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.pipeline.MakeInitialPHGDBPipelinePlugin: time: Feb 9, 2023 5:10:12
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.pipeline.MakeInitialPHGDBPipelinePlugin: time: Feb 9, 2023 5:10:12: progress: 100%

What is the context of that question? "Pass this parameter" to what?

Dear Pierre,

Thanks for responding. Judging from the warning, I thought that MakeInitialPHGDBPipelinePlugin required the localGVCFFolder parameter, and I was wondering how to set it correctly. Sorry for my poor explanation. I will make further checks and respond to the thread again next week.

Best regards
Matt

5 weeks ago lcj34

Hi Matt, can you tell me which version of the PHG you are running? In the latest versions, that warning no longer exists. We decided against programmatically moving files: this was related to transferring stored GVCF files from a host system to a local system, and file transfers were an issue for some servers.

Currently, when the reference intervals are processed, the LoadAllIntervalsToPHGdbPlugin creates a gvcf file from the reference haplotypes. The ref gvcf and its index are stored in the same folder as the ref fasta. You should copy those gvcf files to your defined localGVCFFolder so that they are picked up when creating graphs that need to include variant data.

I'm sorry this isn't clear. I will check the documentation and update it as needed.

Lynn

Dear Lynn,

Many thanks for your reply.
I built the image with "singularity build phg_20230208.simg docker://maizegenetics/phg" on 8 Feb 2023, and I was under the impression that this would get me the latest version. My apologies, I can't make further checks until next week. I will update the thread again then.

Best regards
Matt

Hi Matt, thanks for the info. When pulling PHG from Docker, we recommend requesting a specific tag. That way you will always know which version was run, because "latest" is updated each time a new image is posted to the hub. That said, starting with the next version, the PHG version will be included in the logs whenever PHG is run.

Lynn

Dear Lynn,

Thanks again for looking at this. I should explain what I want to do. I am starting with a single rice reference genome and adding haplotypes from gvcf files I generated by short-read mapping. So I am going through the pipelines again, and this time I specified a tag:

sudo singularity build phg1.3.simg docker://maizegenetics/phg:1.3

I think 1.3 is the latest version? I still get the warning

[pool-1-thread-1] WARN net.maizegenetics.pangenome.pipeline.MakeInitialPHGDBPipelinePlugin - No localGVCFFolder parameter in config file - will not copy created reference gvcfs to folder for consensus processing.

although I have created a folder called "GVCFFolder" and included its path in the config file (localGVCFFolder=/phg/GVCFFolder).

In any case, I have copied the ref gvcf and indexed gvcf to this folder, as you suggested, and proceeded to the add-haplotypes step using CreateHaplotypesFromGVCF.groovy. This script seems to run OK, and I appear to have some haplotypes in my DB (an SQLite database).

A couple of questions:

1) I have the gvcf files at inputDir/loadDB/gvcf and in my "local gvcf folder". Can I set localGVCFFolder to inputDir/loadDB/gvcf? As it stands, I will eventually have the files in three separate locations, including the gvcfServerPath. Or have I misunderstood?
2) In the load_genome_data.txt file there is a "Method" column. Does this method affect the consensus steps later?

3) I am trying to make consensus haplotypes using CreateConsensi.sh, but clearly I haven't got something right: I get "0 taxa used to build distance matrix in createDistanceMatrix". I guess I have failed to specify something (a method?). The command:

sudo singularity exec -B${WORKDIR}:/phg/ /home/mshenton/analysis/PHGsingularity/phg1.3.simg /CreateConsensi.sh \
${SINGULARITY_CONFIG_FILE} IRGSP-1.0_genome_M_C_unanchored.fa GATK_PIPELINE CONSENSUS001_20

My config:

numThreads=2
Xmx=24G
liquibaseOutdir=/phg/outputDir
referenceFasta=/phg/inputDir/reference/IRGSP-1.0_genome_M_C_unanchored.fa
anchors=/phg/inputDir/reference/valid1000RAP-DB_MSU_intervals.bed
haplotypeMethod=genic
consensusMethod=CONSENSUS001_20

outputDir=/phg/outputDir/align/
gvcfOutputDir=/phg/outputDir/align/gvcfs/

refRangeMethods=genic,intergenic
extendedWindowSize=1000

includeVariants=true
minSite=3
minCoverage=0.1
maxThreads=2
minTaxa=1
mxDiv=0.001

localGVCFFolder=/phg/GVCFFolder
rankingFile=/phg/rankingFile.txt


To answer your questions: (1) Yes, you can set localGVCFFolder to inputDir/loadDB/gvcf. Any place is fine as long as the software can see the files. People frequently come back to a db after it has sat for a while and only want to run consensus, or impute, or create a VCF from paths. In those cases, the software needs to know where a local copy of the gvcfs lives. The assumption is that you have stored the gvcfs on a server somewhere that multiple people can access and copy from to their local machines. The software is merely asking "where can I find a copy of these files on your local machine?". If they are still in inputDir/loadDB/gvcf, then that is fine to give as the localGVCFFolder.

(2) The Method column in load_genome_data.txt is used to associate the haplotypes for the listed genome with a method. It doesn't affect the consensus haplotypes: when you run consensus, you specify a consensus method at that time. We often use long names for ours, which indicate the haplotypes used to create the consensus and often the parameters used when running the consensus pipeline, e.g. CONSENSUS_84plusRef_mxDiv_10toNeg4_maxClusers30

(3) The message "0 taxa used to build distance matrix in createDistanceMatrix" implies there were no variants for the taxa it tried to load. This could be because your gvcf files were not found, but it could also mean there were no nodes in your graph. It would help to see the full log file. If it is too big to post, you can send it to me privately at lcj34@cornell.edu
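Answers (1) and (3) suggest two quick checks: point localGVCFFolder at the folder that actually holds the gvcfs, and confirm that folder really contains them. A minimal shell sketch follows, assuming a plain key=value config like the ones above, gvcfs named *.g.vcf.gz or *.gvcf.gz, and tabix indexes named FILE.tbi. The function names set_config_value and check_gvcf_folder are hypothetical helpers, not PHG tools. The value written into the config has to be the path the container sees (/phg/..., given the -B binds used in this thread), so run the folder check inside the container or against the matching host path.

```shell
# Hypothetical helpers (not part of PHG). Paths are whatever the shell that
# runs them can see: host paths on the host, /phg/... inside the container.

# Set KEY=VALUE in a key=value config file: replace an existing "KEY=" line
# in place, or append the setting if the key is not present yet.
set_config_value() {
  local config="$1" key="$2" value="$3" tmp status=0
  tmp="$(mktemp)" || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$config" > "$tmp" && cat "$tmp" > "$config" || status=1
  rm -f "$tmp"
  return "$status"
}

# List gvcf files in DIR and flag any without a tabix index (FILE.tbi) next
# to them. Returns non-zero if the folder is missing, empty, or has
# unindexed files.
check_gvcf_folder() {
  local dir="$1" n=0 missing=0 f
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir"
    return 1
  fi
  for f in "$dir"/*.g.vcf.gz "$dir"/*.gvcf.gz; do
    [ -e "$f" ] || continue   # the pattern matched nothing
    n=$((n + 1))
    if [ ! -e "$f.tbi" ]; then
      echo "no index: $f"
      missing=$((missing + 1))
    fi
  done
  echo "gvcf files: $n, missing index: $missing"
  [ "$n" -gt 0 ] && [ "$missing" -eq 0 ]
}

# Example (hypothetical paths):
#   set_config_value DBconfig.txt localGVCFFolder /phg/inputDir/loadDB/gvcf
#   check_gvcf_folder /phg/inputDir/loadDB/gvcf
```

The config edit goes through a temporary copy and then overwrites the original file's contents, so the other lines keep their order and the file keeps its permissions. If your indexes are .csi rather than .tbi, adjust the suffix in check_gvcf_folder.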


The "0 taxa used to build distance matrix in createDistanceMatrix" message seems to have been caused by the chromosome names in the reference gvcf differing from those in my gvcf files.

The "chr" prefix was removed by CreateValidIntervalsFilePlugin, but it remained in my original gvcf files.

I now seem to have things working as far as creating consensus.

Matt
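The mismatch Matt describes can be caught before running consensus by comparing the CHROM values in the reference gvcf (the one PHG writes next to the reference fasta) with those in each sample gvcf. A minimal shell sketch follows; the function names are hypothetical (not part of PHG), and it assumes plain or gzip/bgzip-compressed VCF text and that dropping a leading "chr" is the right renaming, which held in this thread but should be checked for other data. For bgzipped gvcfs, bcftools annotate --rename-chrs with a two-column old/new mapping file is an established alternative to the sed filter.

```shell
# Hypothetical helpers (not part of PHG).

# Print a file, decompressing it first if its name ends in .gz (gzip or bgzip).
read_maybe_gz() {
  case "$1" in
    *.gz) gzip -cd "$1" ;;
    *) cat "$1" ;;
  esac
}

# Distinct CHROM values (column 1) from the non-header lines of a VCF/gVCF.
vcf_chroms() {
  read_maybe_gz "$1" | grep -v '^#' | cut -f1 | sort -u
}

# Print CHROM names used in the sample gvcf ($2) that never occur in the
# reference gvcf ($1). Empty output means every sample name is known.
chroms_not_in_ref() {
  local ref_names sample_names
  ref_names="$(mktemp)" || return 1
  sample_names="$(mktemp)" || { rm -f "$ref_names"; return 1; }
  vcf_chroms "$1" > "$ref_names"
  vcf_chroms "$2" > "$sample_names"
  comm -13 "$ref_names" "$sample_names"
  rm -f "$ref_names" "$sample_names"
}

# Filter: drop a leading "chr" from data-line CHROM values and from
# ##contig=<ID=chr...> header lines; all other lines pass through unchanged.
strip_chr_prefix() {
  sed -e 's/^chr//' -e 's/^##contig=<ID=chr/##contig=<ID=/'
}

# Example (hypothetical file names):
#   chroms_not_in_ref reference.g.vcf.gz sample.g.vcf.gz
#   gzip -cd sample.g.vcf.gz | strip_chr_prefix | bgzip > sample.nochr.g.vcf.gz
#   tabix -p vcf sample.nochr.g.vcf.gz
```

Re-running chroms_not_in_ref on the renamed file should print nothing; the renamed gvcf then needs a fresh index before PHG reads it.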


PS: Does CreateSmallGenomesPlugin still work with the latest version of PHG?


I think CreateSmallGenomesPlugin still works, but try it and let me know if you have problems.