I am preparing to start my new research project, "Bioinformatic analysis of high-throughput DNA sequencing data", which involves the development of some software:
The student will develop either a user-friendly workflow (with snakemake or nextflow) or an interactive web user interface. The student will also improve documentation and suggest extended applications of currently available software.
I currently don't know much about either workflow engine, and I have to decide which path to take: snakemake or nextflow. I am looking for advice on which would be best for a beginner starting from scratch, as the existing discussions that weigh both options seem quite confusing.
Thank you and Kind regards
If you know Python well then snakemake may be the way to go. If you are familiar with groovy (programming for Java VM) then nextflow may be easier. In any case you will need to know programming basics.
Please don't try to develop an interactive web user interface from scratch. galaxy already exists for that application.
Thanks for your response! My Python programming is fluent, and I am always up for picking up a new language. Is the groovy syntax somewhat similar to that of R and Python, or completely different?
Unfortunately, I think my supervisor wants me to create it from scratch, but I will look into galaxy and propose it in my next meeting. Thanks for that. If you have any recommended videos or pages on galaxy, that would be great.
Look under the help dropdown to find all you need.
Snakemake vs. Nextflow: strengths and weaknesses
Thanks for the link!
Having used both extensively, I'd say nextflow is slightly more difficult to get into than snakemake, but makes up for it in a big way in its functionality.
Brilliant, thanks for the advice. After looking into what the community has said, I have invested in learning the groovy language so that, when I am ready to build my application, I can fully utilize the nextflow functionality. Plus, adding another language under the belt never hurts.
Don't worry about groovy too much. I have been writing productive nextflow pipelines for a couple of years and still haven't learned much if any groovy.
You can get by with the usual tools plus the unix toolbox of bash, sed, head, etc. for the most part.
One important part is to understand Nextflow channels and processes and follow some examples before you really start. I've seen plenty of beginners writing for loops for dealing with input from a path variable within nextflow processes, rather than using a channel for input so NF can actually do its job properly.
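To make the difference concrete, here is a rough Python analogy (not Nextflow code, and not how Nextflow is implemented). The function names and the toy samples are made up for illustration. The first function loops over every input inside one task, so a scheduler would only ever see a single unit of work. The second hands each input over as its own task, which is roughly what feeding a process from a channel lets NF do.

```python
from concurrent.futures import ThreadPoolExecutor


def count_bases(sample):
    """Hypothetical per-sample step: return the sample name and sequence length."""
    name, seq = sample
    return name, len(seq)


def loop_inside_one_task(samples):
    # Anti-pattern analogue: one big task iterates over all inputs itself,
    # so nothing outside it can parallelise or track the individual samples.
    return [count_bases(s) for s in samples]


def one_task_per_input(samples, workers=4):
    # Channel analogue: every input is submitted as a separate task,
    # so the executor is free to run them concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(count_bases, samples))


samples = [("s1", "ACGT"), ("s2", "ACGTAC"), ("s3", "AC")]
print(loop_inside_one_task(samples))
print(one_task_per_input(samples))
```

Both calls give the same results. The point is only who owns the iteration: your own loop, or the engine.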
Snakemake seems to have a lot more boilerplate code in the examples I've seen, and to require more effort to submit to different cluster types. The way people write .done files if a rule completes successfully is ... not optimal. Nextflow's work dir and resume functionality is a lot more elegant.
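For anyone wondering what a work dir plus resume buys you, here is a very simplified Python sketch of the general idea: key each task's directory on a hash of the task and its input, and skip the work if a result is already there. This is only an illustration under my own assumptions. The `run_cached` helper, the `result.txt` file name and the directory layout are all hypothetical, and the real Nextflow hashing takes more into account than this.

```python
import hashlib
import tempfile
from pathlib import Path


def run_cached(work_dir, task_name, input_text, func):
    """Run func(input_text) unless a result for this task and input already exists.

    Returns (output, reused), where reused is True if the cached result was used.
    """
    key = hashlib.sha256(f"{task_name}\n{input_text}".encode()).hexdigest()
    # Hypothetical layout: <work_dir>/<first 2 hex chars>/<rest of the hash>
    task_dir = Path(work_dir) / key[:2] / key[2:]
    result_file = task_dir / "result.txt"
    if result_file.exists():
        return result_file.read_text(), True
    task_dir.mkdir(parents=True, exist_ok=True)
    output = func(input_text)
    result_file.write_text(output)
    return output, False


calls = []  # records which inputs were actually computed


def reverse(seq):
    """Toy task standing in for a real pipeline step."""
    calls.append(seq)
    return seq[::-1]


work_dir = tempfile.mkdtemp()
first = run_cached(work_dir, "reverse", "ACGT", reverse)
second = run_cached(work_dir, "reverse", "ACGT", reverse)  # same input: reused
third = run_cached(work_dir, "reverse", "GGCA", reverse)   # new input: computed
print(first, second, third, calls)
```

Because the cache key comes from the task and its input, there is no separate marker file to remember to write or clean up.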