Question

Running PHG with singularity: Error at CreatePHG_step2_addHapsFromGVCF


3.1 years ago

bp • 0

Hi,

I am trying to set up PHG using Singularity. Step 1 went fine, but I am stuck at step 2D, "Filter GVCF and add variants to database" (https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/UserInstructions/CreatePHG_step2_addHapsFromGVCF.md).

I am working with 95 taxa. The parameters that the wiki lists as required for step 2D are set in the config file as follows:

LoadHaplotypesFromGVCFPlugin Parameters
wgsKeyFile: /tempFileDir/data/load_wgs_genome_key_file.txt
gvcfDir: /tempFileDir/data/outputs/gvcfs/
referenceFasta: /tempFileDir/data/reference/reference.fa
bedFile: /tempFileDir/data/bam/temp/intervals.bed
haplotypeMethodName: GATK_PIPELINE
haplotypeMethodDescription: GVCF_DESCRIPTION
numThreads: 3
maxNumHapsStaged: 10000
mergeRefBlocks: false
queueSize: 30

The rest of the parameters are unchanged from the previous step.

To run the script with Singularity, I ran the following command:

singularity exec -B /tempFileDir/:/tempFileDir/ ~/phg_latest.sif  /CreateHaplotypesFromGVCF.groovy -config /tempFileDir/data/config.txt

The script runs briefly and then fails with the following error:

[pool-1-thread-1] DEBUG net.maizegenetics.plugindef.AbstractPlugin - Error writing to the DB:
java.lang.IllegalStateException: Error writing to the DB:
        at net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin.processData(LoadHaplotypesFromGVCFPlugin.kt:226)
        at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:111)
        at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:2017)
        at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.NoSuchElementException: Collection is empty.
        at kotlin.collections.CollectionsKt___CollectionsKt.first(_Collections.kt:184)
        at net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin$processKeyFileEntry$2.invokeSuspend(LoadHaplotypesFromGVCFPlugin.kt:312)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:241)
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594)
        at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:740)

I can't seem to figure out the problem here. Please help.

PHG singularity • 1.3k views

ADD COMMENT • link 3.1 years ago by bp • 0


Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

ADD REPLY • link 3.1 years ago by GenoMax 141k


Many thanks, GenoMax! I'm new to this and not yet familiar with all the formatting tools. I'll follow the posting guidelines from now on.

ADD REPLY • link 3.1 years ago by bp • 0

score 0 · Answer 1 · 2021-03-16


3.1 years ago

zrm22 ▴ 40

Hello,

The error is thrown because your bedFile is empty. The CreateHaplotypesFromGVCF.groovy script attempts to extract a BED file from the DB using the following parameter, which should have been set:

refRangeMethods=FocusRegion,FocusComplement

If you look in the log file, this parameter should be set and its value should be correct. Are there any other errors higher up in the log file? Could you post the full file? I would also check whether the automatically generated BED file has any entries in it. It should be here (unless you set the tempFileDir parameter in the config file):

/phg/inputDir/loadDB/bam/temp/intervals.bed
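
One quick way to check whether that BED file has any entries is a small Python sketch like the one below. It is not part of PHG; the function name `count_bed_entries` is made up for illustration, and it only assumes the usual BED layout (whitespace-separated chrom, start, end, with optional `#`, `track` or `browser` header lines).

```python
from pathlib import Path


def count_bed_entries(bed_path):
    """Count data lines in a BED file, skipping blank and header lines."""
    count = 0
    with open(bed_path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines, comments and optional track/browser headers.
            if not fields or fields[0].startswith("#"):
                continue
            if fields[0] in ("track", "browser"):
                continue
            # A BED record needs at least chrom, start and end columns.
            if len(fields) >= 3:
                count += 1
    return count


if __name__ == "__main__":
    # Path quoted above; it differs if tempFileDir is set in the config file.
    bed_file = Path("/phg/inputDir/loadDB/bam/temp/intervals.bed")
    if bed_file.exists():
        print(f"{bed_file}: {count_bed_entries(bed_file)} interval(s)")
    else:
        print(f"{bed_file} does not exist")
```

A count of zero would match the "Collection is empty" cause in the stack trace. When running under Singularity, the path has to be the one visible inside the container (or the bound host directory).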

In the meantime, I will add in a more informative error message which should be included in the next release.

ADD COMMENT • link 3.1 years ago by zrm22 ▴ 40


Hi Zack,

Thanks for addressing my issue. Setting the parameter refRangeMethods to 'FocusRegion,FocusComplement' did solve the problem that I was having.
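
For anyone who wants to confirm what the config file actually contains before launching the step, here is a minimal Python sketch. It is not a PHG tool: the helper names `read_config` and `ref_range_methods` are hypothetical, and it assumes the config file uses plain `key=value` lines with `#` comments, as in the `refRangeMethods=...` line quoted in the answer.

```python
import os


def read_config(path):
    """Parse key=value lines, ignoring blank lines and # comments."""
    settings = {}
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped or stripped.startswith("#") or "=" not in stripped:
                continue
            key, value = stripped.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def ref_range_methods(settings):
    """Return the refRangeMethods entries as a list, or None if the key is absent."""
    raw = settings.get("refRangeMethods")
    if raw is None:
        return None
    return [name.strip() for name in raw.split(",") if name.strip()]


if __name__ == "__main__":
    # Config path taken from the command in the question.
    config_path = "/tempFileDir/data/config.txt"
    if os.path.exists(config_path):
        methods = ref_range_methods(read_config(config_path))
        print("refRangeMethods:", methods if methods is not None else "not set")
    else:
        print(f"{config_path} not found")
```

In this thread the value that worked was `FocusRegion,FocusComplement`; whether other method names are valid depends on what was loaded into the DB in the earlier step.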

I guess I got confused by what I read on the wiki and misinterpreted this statement:

refRangeMethods=refRegionGroup This is used to extract a BED file out of the DB before the GVCF file is processed. The BED file is then used to extract out regions of the GVCF used to become the haplotypes. Typically, refRegionGroup refers to the anchor Reference ranges. If "refRegionGroup,refInterRegionGroup" is used it will create a BED file representing both anchors and inter anchors. We strongly suggest not setting this parameter in the Config File

ADD REPLY • link 3.1 years ago by bp • 0


Update:

It did work after I set the parameter 'refRangeMethods=FocusRegion,FocusComplement'. However, it did not run for long before failing with an out-of-memory error:

[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin: time: Mar 17, 2021 19:25:40: progress: 100%
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin  Citation: Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. (2007) TASSEL: Soft$
[pool-1-thread-1] ERROR net.maizegenetics.plugindef.ThreadedPluginListener - Out of Memory: LoadHaplotypesFromGVCFPlugin could not complete task:
Current Max Heap Size: 10225 Mb
Use -Xmx option in start_tassel.pl or start_tassel.bat
to increase heap size. Included with tassel standalone zip.

Then I raised -Xmx to 100G and ran it again. That gave me another error, 'Error writing to the DB' caused by 'PHGdbAccess:putHaplotypesForGamete: failed', for chromosome 7D of my first taxon, which I believe had already been processed in my first attempt.

 [pool-1-thread-1] DEBUG net.maizegenetics.plugindef.AbstractPlugin - Error writing to the DB:
            java.lang.IllegalStateException: Error writing to the DB:
                    at net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin.processData(LoadHaplotypesFromGVCFPlugin.kt:226)
                    at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:111)
                    at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:2017)
                    at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
                    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                    at java.lang.Thread.run(Thread.java:748)
            Caused by: java.lang.IllegalStateException: PHGdbAccess:putHaplotypesForGamete: failed
                    at net.maizegenetics.pangenome.db_loading.PHGdbAccess.putHaplotypesForGamete(PHGdbAccess.java:1617)
                    at net.maizegenetics.pangenome.db_loading.PHGdbAccess.processHaplotypesData(PHGdbAccess.java:1722)
                    at net.maizegenetics.pangenome.db_loading.PHGdbAccess.putHaplotypesData(PHGdbAccess.java:1635)
                    at net.maizegenetics.pangenome.db_loading.LoadHaplotypesFromGVCFPlugin$processDBUploading$2.invokeSuspend(LoadHaplotypesFromGVCFPlugin.kt:619)
                    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
                    at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:241)
                    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594)
                    at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60)
                    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:740)
                    Suppressed: java.lang.OutOfMemoryError: GC overhead limit exceeded
                            at java.util.HashMap.newNode(HashMap.java:1750)

Would you be able to help me on this?

ADD REPLY • link 3.1 years ago by bp • 0


I'm updating the status here because I resolved the issue. For future reference, or for anyone who hits the same problem:

If you get an out-of-memory error (and possibly other errors) while running the CreateHaplotypesFromGVCF.groovy script, fixing that one issue and simply re-running may not be enough. In my case, the failed run also left the database corrupt. I'm not sure whether this happens every time, but a corrupt database can lead to a whole series of further "Error writing to the DB" errors. The simple solution was to re-create the DB and run the step again from the start. That worked for me.
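
As a rough aid for deciding whether a previous run failed partway through (and the DB may therefore need re-creating), a log can be scanned for the error strings seen in this thread. This is only a sketch: the function name `find_error_lines` is invented here, the log file path is whatever you captured the run's output to, and the marker list covers just the messages quoted above, not every failure PHG can report.

```python
import sys

# Error strings copied from the log excerpts in this thread.
ERROR_MARKERS = (
    "Out of Memory",
    "OutOfMemoryError",
    "Error writing to the DB",
)


def find_error_lines(log_path):
    """Return (line_number, marker) pairs for lines containing a known marker."""
    hits = []
    with open(log_path, errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            for marker in ERROR_MARKERS:
                if marker in line:
                    hits.append((number, marker))
                    break  # report each line once
    return hits


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python scan_log.py <log file>")
    else:
        found = find_error_lines(sys.argv[1])
        for number, marker in found:
            print(f"line {number}: {marker}")
        if not found:
            print("no known error markers found")
```

A clean scan does not prove the DB is intact; it only shows that none of these particular messages were logged.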

ADD REPLY • link 3.1 years ago by bp • 0