Question: falcon-unzip installation and testing using pacbio data
Asked 21 months ago by rob234king (UK/Harpenden/Rothamsted Research):

I have installed falcon-unzip from the pre-compiled binaries as shown below, and the installation seems fine:

cd /home/data/bioinf_resources/programming_tools/
# use virtualenv-2.7
unset PYTHONPATH
source falcontest/bin/activate
tar xvzf falcon-2017.11.02-16.04-py2.7-ucs4.tar.gz -C falcontest
export LD_LIBRARY_PATH=falcontest/lib:${LD_LIBRARY_PATH}
export PYTHONPATH=/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7
pip install pandas
easy_install --upgrade numpy

# set PATH for MUMmer's nucmer and show-coords
export PATH=/home/data/bioinf_resources/programming_tools/mummer-3.9.4alpha:$PATH
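Before running anything, it can help to confirm that the tools named in this post really resolve on PATH. The following is only a sketch, not part of FALCON: the tool list is taken from the commands mentioned in this thread, and `missing_tools` is a hypothetical helper name. It uses `shutil.which`, so it needs Python 3 (run it with a system `python3` in the same shell, after the `export PATH` lines above).

```python
import shutil

# Tool names taken from this post; adjust to whatever your unzip.sh actually calls.
REQUIRED_TOOLS = ["fc_run.py", "nucmer", "show-coords"]


def missing_tools(tools, path=None):
    """Return the tools that cannot be found on PATH (or on the given search path)."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH")
```

An empty result only shows that the executables are reachable, not that they are the versions FALCON-unzip expects.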

How do I run the example data? Which commands and config files should I use? I can download the raw data from ENA using the project codes below:

Arabidopsis data: PRJNA314706
V. vinifera cv. Cabernet Sauvignon: PRJNA316730

I've downloaded the config files for a test run from:

https://github.com/PacificBiosciences/FALCON_unzip/tree/master/examples

and changed the smrt_bin line in fc_unzip.cfg to:

smrt_bin=/home/data/bioinf_resources/programming_tools/falcontest/bin/
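A quick way to double-check an edited setting such as `smrt_bin` is to read the config back and print what the parser actually sees. This is a sketch under assumptions: it treats the `.cfg` files as plain INI text readable by Python 3's `configparser`, `find_option` is a hypothetical helper, and the section names in the test string below are placeholders rather than the real layout of `fc_unzip.cfg`.

```python
import configparser
import os


def find_option(cfg_text, option):
    """Return (section, value) pairs for every section that defines the option."""
    # interpolation=None keeps '%' characters literal; strict=False tolerates duplicates.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(cfg_text)
    return [
        (section, parser.get(section, option))
        for section in parser.sections()
        if parser.has_option(section, option)
    ]


if __name__ == "__main__":
    cfg_path = "fc_unzip.cfg"  # file name from this post
    if os.path.isfile(cfg_path):
        with open(cfg_path) as handle:
            for section, value in find_option(handle.read(), "smrt_bin"):
                status = "exists" if os.path.isdir(value) else "MISSING"
                print("[{}] smrt_bin={} ({})".format(section, value, status))
```

The same helper can be pointed at `fc_run.cfg` to look up settings such as `job_type`.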

I've downloaded the example assembly files that come with the config files used: fc_run.cfg, input.fofn, fc_unzip.cfg and unzip.sh. I assume I should download the raw data and put the paths in "input.fofn", but then how do I start the run?

I have downloaded the raw data for the Arabidopsis assembly as a first test. I have updated input.fofn with the file locations and set the smrtanalysis/bin location in fc_unzip.cfg.
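Since a wrong path in input.fofn is an easy mistake to make, a small check that every listed file exists can rule it out early. This is a minimal sketch, not a FALCON utility: `missing_inputs` is a hypothetical name, and resolving relative entries against the directory that holds the fofn is an assumption made here for illustration.

```python
import os


def missing_inputs(fofn_path):
    """Return the entries of a file-of-file-names that do not point at a file."""
    # Assumption: relative entries are resolved against the fofn's own directory.
    base = os.path.dirname(os.path.abspath(fofn_path))
    missing = []
    with open(fofn_path) as handle:
        for line in handle:
            entry = line.strip()
            if not entry:
                continue  # skip blank lines
            full = entry if os.path.isabs(entry) else os.path.join(base, entry)
            if not os.path.isfile(full):
                missing.append(entry)
    return missing


if __name__ == "__main__":
    if os.path.isfile("input.fofn"):
        for entry in missing_inputs("input.fofn"):
            print("Missing input file: " + entry)
```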

The first command in unzip.sh has changed in the pre-built binaries. This is the file used in the paper to run everything from start to finish, but I get an error on the very first command.

"fc_track_reads.py" has become "fc_track_reads_htigs0.py"

I made that change, but when I run the first command I get the error below:

fc_track_reads_htigs0.py
No handlers could be found for logger "pypeflow.simple_pwatcher_bridge"
Traceback (most recent call last):
  File "/home/data/bioinf_resources/programming_tools/falcontest/bin/fc_track_reads_htigs0.py", line 11, in <module>
    load_entry_point('falcon-unzip==0.4.0', 'console_scripts', 'fc_track_reads_htigs0.py')()
  File "/scratch/cdunn/fork/.git/LOCAL4/lib/python2.7/site-packages/falcon_unzip/mains/track_reads_htigs0.py", line 338, in main
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow/simple_pwatcher_bridge.py", line 273, in refreshTargets
    self._refreshTargets(updateFreq, exitOnFailure)
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow/simple_pwatcher_bridge.py", line 339, in _refreshTargets
    raise Exception(msg)
Exception: Some tasks are recently_done but not satisfied: set([Node(0-rawreads), Node(1-preads_ovl)])

UPDATE:

I did not have pypeflow installed, which I now do (although I didn't see anywhere that said I needed it). I am now trying just a small dataset, E. coli, and running only FALCON with the command below:

fc_run.py fc_run.cfg

but I get this error in local mode:

> 2018-02-26 12:45:34,727 - fc_run - INFO - Setup logging from file
> "None". 2018-02-26 12:45:34,822 - fc_run - INFO - fc_run started with
> configuration fc_run.cfg 2018-02-26 12:45:34,832 - fc_run - INFO -  No
> target specified, assuming "assembly" as target  2018-02-26
> 12:45:34,833 - pypeflow.simple_pwatcher_bridge - WARNING - In
> simple_pwatcher_bridge, pwatcher_impl=<module 'pwatcher.fs_based' from
> '/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pwatcher/fs_based.pyc'>
> 2018-02-26 12:45:34,834 - pypeflow.simple_pwatcher_bridge - INFO - In
> simple_pwatcher_bridge, pwatcher_impl=<module 'pwatcher.fs_based' from
> '/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pwatcher/fs_based.pyc'>
> 2018-02-26 12:45:34,855 - pypeflow.simple_pwatcher_bridge - INFO -
> job_type='local', job_queue='', sge_option='-pe smp 8 -q your_queue',
> use_tmpdir=False, squash=False, job_name_style=0 2018-02-26
> 12:45:34,867 - pypeflow.simple_pwatcher_bridge - DEBUG - Created
> PypeTask('0-rawreads/raw-fofn-abs',
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads/raw-fofn-abs',
> "{'o_fofn': PLF('0-rawreads/raw-fofn-abs/input.fofn', None)}",
> "{'i_fofn': PLF('input.fofn', None)}") 2018-02-26 12:45:34,868 -
> pypeflow.simple_pwatcher_bridge - DEBUG - Added
> PRODUCERS['0-rawreads/raw-fofn-abs'] =
> PypeTask('0-rawreads/raw-fofn-abs',
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads/raw-fofn-abs',
> "{'o_fofn': PLF('0-rawreads/raw-fofn-abs/input.fofn', None)}",
> "{'i_fofn': PLF('input.fofn', None)}") 2018-02-26 12:45:34,869 -
> pypeflow.simple_pwatcher_bridge - DEBUG - Built
> PypeTask('0-rawreads/raw-fofn-abs',
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads/raw-fofn-abs',
> "{'o_fofn': PLF('input.fofn', '0-rawreads/raw-fofn-abs')}",
> "{'i_fofn': PLF('input.fofn', None)}") 2018-02-26 12:45:34,869 -
> pypeflow.simple_pwatcher_bridge - DEBUG - New
> Node(0-rawreads/raw-fofn-abs) needs set([]) 2018-02-26 12:45:34,891 -
> pypeflow.simple_pwatcher_bridge - INFO - Num unsatisfied: 0, graph: 1
> 2018-02-26 12:45:34,893 - pypeflow.simple_pwatcher_bridge - DEBUG -
> Created PypeTask('0-rawreads',
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads',
> "{'length_cutoff': PLF('0-rawreads/length_cutoff', None),\n
> 'raw_reads_db': PLF('0-rawreads/raw_reads.db', None),\n
> 'rdb_build_done': PLF('0-rawreads/rdb_build_done', None),\n
> 'run_jobs': PLF('0-rawreads/run_jobs.sh', None)}", "{'input_fofn':
> PLF('input.fofn', '0-rawreads/raw-fofn-abs')}") 2018-02-26
> 12:45:34,895 - pypeflow.simple_pwatcher_bridge - DEBUG - Added
> PRODUCERS['0-rawreads'] = PypeTask('0-rawreads',
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads',
> "{'length_cutoff': PLF('0-rawreads/length_cutoff', None),\n
> 'raw_reads_db': PLF('0-rawreads/raw_reads.db', None),\n
> 'rdb_build_done': PLF('0-rawreads/rdb_build_done', None),\n
> 'run_jobs': PLF('0-rawreads/run_jobs.sh', None)}", "{'input_fofn':
> PLF('input.fofn', '0-rawreads/raw-fofn-abs')}") 2018-02-26
> 12:45:34,898 - pypeflow.simple_pwatcher_bridge - DEBUG - Built
> PypeTask('0-rawreads',
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads',
> "{'length_cutoff': PLF('length_cutoff', '0-rawreads'),\n
> 'raw_reads_db': PLF('raw_reads.db', '0-rawreads'),\n 'rdb_build_done':
> PLF('rdb_build_done', '0-rawreads'),\n 'run_jobs': PLF('run_jobs.sh',
> '0-rawreads')}", "{'input_fofn': PLF('input.fofn',
> '0-rawreads/raw-fofn-abs')}") 2018-02-26 12:45:34,898 -
> pypeflow.simple_pwatcher_bridge - DEBUG - New Node(0-rawreads) needs
> set([Node(0-rawreads/raw-fofn-abs)]) 2018-02-26 12:45:34,901 -
> pypeflow.simple_pwatcher_bridge - INFO - Num unsatisfied: 1, graph: 2
> 2018-02-26 12:45:34,901 - pypeflow.simple_pwatcher_bridge - INFO -
> About to submit: Node(0-rawreads) 2018-02-26 12:45:34,901 -
> pypeflow.simple_pwatcher_bridge - DEBUG - enque nodes:
> set([Node(0-rawreads)]) 2018-02-26 12:45:34,967 -
> pypeflow.simple_pwatcher_bridge - DEBUG - In
> rundir='/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads',
> sge_option='-pe smp 8 -q your_queue', __sge_option='-pe smp 8 -q
> your_queue' 2018-02-26 12:45:34,967 - pwatcher.fs_based - DEBUG -
> run(jobids=<1>, job_type=local, job_queue=) 2018-02-26 12:45:34,968 -
> pwatcher.fs_based - DEBUG - jobs: {'P76645cb57cfd20':
> Job(jobid='P76645cb57cfd20', cmd='/bin/bash run.sh',
> rundir='/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads',
> options={'job_queue': '', 'sge_option': '-pe smp 8 -q your_queue',
> 'job_type': 'local'})} 2018-02-26 12:45:34,968 - pwatcher.fs_based -
> INFO - starting job Job(jobid='P76645cb57cfd20', cmd='/bin/bash
> run.sh',
> rundir='/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads',
> options={'job_queue': '', 'sge_option': '-pe smp 8 -q your_queue',
> 'job_type': 'local'}) 2018-02-26 12:45:34,969 - pwatcher.fs_based -
> DEBUG - Wrapped "python2.7 -m pwatcher.mains.fs_heartbeat
> --directory=/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads
> --heartbeat-file=/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/heartbeats/heartbeat-P76645cb57cfd20
> --exit-file=/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/exits/exit-P76645cb57cfd20
> --rate=10.0 /bin/bash run.sh || echo 99 >| /home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/exits/exit-P76645cb57cfd20"
> 2018-02-26 12:45:34,969 - pwatcher.fs_based - DEBUG - Writing wrapper
> "/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/wrappers/run-P76645cb57cfd20.bash"
> 2018-02-26 12:45:35,002 - pwatcher.fs_based - DEBUG - CD:
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/jobs/P76645cb57cfd20'
> <- '/home/data/bioinf_resources/programming_tools/falcontest/raw'
> 2018-02-26 12:45:35,012 - pwatcher.fs_based - DEBUG - dir:
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/jobs/P76645cb57cfd20'
> call: '/bin/bash
> /home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/wrappers/run-P76645cb57cfd20.bash
> 1>|stdout 2>|stderr & ' 2018-02-26 12:45:35,019 - pwatcher.fs_based -
> DEBUG - pid=40352 pgid=40352 sub-pid=40573 2018-02-26 12:45:35,020 -
> pwatcher.fs_based - DEBUG - CD:
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/jobs/P76645cb57cfd20'
> -> '/home/data/bioinf_resources/programming_tools/falcontest/raw' 2018-02-26 12:45:35,022 - pwatcher.fs_based - INFO - Submitted
> backgroundjob=MetaJobLocal(MetaJob(job=Job(jobid='P76645cb57cfd20',
> cmd='/bin/bash run.sh',
> rundir='/home/data/bioinf_resources/programming_tools/falcontest/raw/0-rawreads',
> options={'job_queue': '', 'sge_option': '-pe smp 8 -q your_queue',
> 'job_type': 'local'}), lang_exe='/bin/bash')) 2018-02-26 12:45:35,023
> - pypeflow.simple_pwatcher_bridge - DEBUG - Result of watcher.run()={'submitted': ['P76645cb57cfd20']} 2018-02-26
> 12:45:35,023 - pypeflow.simple_pwatcher_bridge - DEBUG - N in queue: 1
> (max_jobs=8) 2018-02-26 12:45:35,024 - pwatcher.fs_based - DEBUG -
> query(which='list', jobids=<1>) 2018-02-26 12:45:35,041 -
> pwatcher.fs_based - DEBUG - Unable to remove heartbeat
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/heartbeats/heartbeat-P76645cb57cfd20' when sentinal was found in exit-sentinels listdir. Traceback (most
> recent call last):   File
> "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pwatcher/fs_based.py",
> line 565, in get_status
>     os.remove(heartbeat_path) OSError: [Errno 2] No such file or directory:
> '/home/data/bioinf_resources/programming_tools/falcontest/raw/mypwatcher/heartbeats/heartbeat-P76645cb57cfd20'
> 
> 2018-02-26 12:45:35,045 - pwatcher.fs_based - DEBUG - Status EXIT 256
> for heartbeat:heartbeat-P76645cb57cfd20 2018-02-26 12:45:35,045 -
> pypeflow.simple_pwatcher_bridge - ERROR - Task Node(0-rawreads) failed
> with exit-code=256 2018-02-26 12:45:35,046 -
> pypeflow.simple_pwatcher_bridge - DEBUG - recently_done:
> [(Node(0-rawreads), False)] 2018-02-26 12:45:35,046 -
> pypeflow.simple_pwatcher_bridge - DEBUG - Num done in this iteration:
> 1 2018-02-26 12:45:35,047 - pypeflow.simple_pwatcher_bridge - INFO -
> recently_satisfied: set([]) 2018-02-26 12:45:35,047 -
> pypeflow.simple_pwatcher_bridge - INFO - Num satisfied in this
> iteration: 0 2018-02-26 12:45:35,047 - pypeflow.simple_pwatcher_bridge
> - INFO - Num still unsatisfied: 1 2018-02-26 12:45:35,048 - pypeflow.simple_pwatcher_bridge - ERROR - Some tasks are recently_done
> but not satisfied: set([Node(0-rawreads)]) 2018-02-26 12:45:35,048 -
> pypeflow.simple_pwatcher_bridge - ERROR - ready: set([])  submitted:
> set([]) 2018-02-26 12:45:35,049 - pwatcher.fs_based - DEBUG -
> delete(which='known', jobids=<0>) 2018-02-26 12:45:35,049 -
> pwatcher.fs_based - DEBUG - Deleting jobs for jobids from known ([])
> 2018-02-26 12:45:35,052 - pwatcher.fs_based - DEBUG - Failed to kill
> job for heartbeat 'heartbeat-P76645cb57cfd20': IOError(2, 'No such
> file or directory') 2018-02-26 12:45:35,083 - pwatcher.fs_based -
> DEBUG - Cannot remove heartbeat: OSError(2, 'No such file or
> directory') 2018-02-26 12:45:35,084 - pypeflow.simple_pwatcher_bridge
> - DEBUG - In notifyTerminate(), result of delete:None
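The log above reports that a task failed but not why. According to the `call:` record, each job's output is redirected to `stdout` and `stderr` files inside `mypwatcher/jobs/<jobid>/`, so that is the place to look. The sketch below, with hypothetical helper names, pulls the failed task names and job ids out of a pasted log (undoing the forum's `> ` quoting and line wrapping first) and builds the matching stderr paths.

```python
import re


def normalize(log_text):
    """Drop leading '> ' quote markers and collapse wrapped lines into one string."""
    lines = [re.sub(r"^>\s?", "", line) for line in log_text.splitlines()]
    return " ".join(" ".join(lines).split())


def failed_tasks(log_text):
    """Return (task_name, exit_code) pairs for tasks the log reports as failed."""
    pattern = r"Task Node\(([^)]+)\) failed with exit-code=(\d+)"
    return [(name, int(code)) for name, code in re.findall(pattern, normalize(log_text))]


def job_stderr_paths(log_text, run_dir):
    """Return the stderr file path for each job id mentioned in the log."""
    jobids = sorted(set(re.findall(r"jobid='([^']+)'", normalize(log_text))))
    return ["{}/mypwatcher/jobs/{}/stderr".format(run_dir, jobid) for jobid in jobids]
```

For the run shown here, `run_dir` would be the directory `fc_run.py` was started from (`.../falcontest/raw` in the log), and the resulting stderr file should contain the actual error from `run.sh`.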

Here is a tutorial for Falcon if you have not seen it.

written 21 months ago by genomax

I will try this and see if I can get it to complete. Thanks.

written 20 months ago by rob234king

I got the test data to run using the config file provided in the tutorial, and it looks like it was successful. The raw data was FASTA. I have just tested with FASTQ data and the run ends with an error. ENA quite often provides the data as FASTQ, so it looks like I can use this tool, https://github.com/zyndagj/FALCON-formatter, to convert FASTQ data to the format FALCON requires. Do you have any experience of using FASTQ data instead, or the h5 raw files?

modified 20 months ago • written 20 months ago by rob234king
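To illustrate the kind of conversion being asked about, here is a rough sketch that turns 4-line FASTQ records into FASTA and renames each read. It is not the FALCON-formatter tool linked above and has not been checked against it. The `movie/zmw/start_end` header pattern is an assumption about what FALCON's read database step accepts, the movie name `m000_000` is a made-up placeholder, and `fastq_to_fasta` is a hypothetical function name.

```python
def fastq_to_fasta(fastq_lines, movie="m000_000"):
    """Yield FASTA lines from 4-line FASTQ records, renaming reads PacBio-style."""
    # Drop blank lines so a trailing newline does not break the 4-line grouping.
    lines = [line.rstrip("\n") for line in fastq_lines if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    for i in range(0, len(lines), 4):
        header, seq, plus = lines[i], lines[i + 1], lines[i + 2]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record starting at line {}".format(i + 1))
        zmw = i // 4  # running index used as a stand-in for the ZMW number
        yield ">{}/{}/0_{}".format(movie, zmw, len(seq))
        yield seq
```

A typical use would be to open the FASTQ file, pass the file handle to `fastq_to_fasta`, and write each yielded line to a `.fasta` file that is then listed in input.fofn. Multi-line FASTQ records are not handled.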