Small demo bioinformatics workflows which don't require reference data?
4.1 years ago
steve ★ 3.3k

I am looking for small, lightweight bioinformatics pipelines to use for demonstration purposes during program development, which can be run without downloading large reference data sets. Ideally, it would be something I can bundle with a larger program as a basic placeholder to show that a "generic bioinformatics pipeline is working". It should meet criteria such as:

  • required input data is small, less than one megabyte (MB) in size
  • required software can be easily bundled with Docker or conda/pip (preferably the latter)
  • total execution time does not exceed a minute or so on a lightweight machine (e.g. a cheap laptop)
  • the pipeline output data would ideally be in a flat text format of some sort, so that it can be easily parsed by unit testing tools to verify results
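To make these criteria concrete, here is a rough sketch of the scale of placeholder I have in mind. It is only an illustration, not a recommendation of a specific pipeline: the embedded FASTA records and the names `DEMO_FASTA`, `parse_fasta` and `gc_table` are all made up for this example, and it uses nothing beyond the Python standard library. It reads a tiny FASTA input and writes a tab-separated table that a unit test can compare against directly.

```python
import io

# Hypothetical demo input, embedded so that no download is needed.
DEMO_FASTA = """>seq1
ACGTACGT
>seq2
GGGCCC
AT
"""


def parse_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the record name.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def gc_table(fasta_text):
    """Return a TSV table with the length and GC fraction of each record."""
    rows = ["name\tlength\tgc_fraction"]
    for name, seq in parse_fasta(io.StringIO(fasta_text)):
        gc = seq.count("G") + seq.count("C")
        frac = gc / len(seq) if seq else 0.0
        rows.append(f"{name}\t{len(seq)}\t{frac:.3f}")
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    print(gc_table(DEMO_FASTA), end="")
```

Because the output is plain TSV, a test only needs to compare the returned string (or the written file) against a small expected table.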

For a time I thought I had found a good candidate in .vcf file annotation with VEP, since I can easily bundle some tiny demo .vcf files in a git repo along with a VEP install script and Docker container. Unfortunately, the MySQL ports that VEP needs to query its online reference databases are blocked by my employer.

Any other suggestions for this?

workflow pipeline • 745 views

We have a small demonstration for polygenic risk score analysis. We simulated the data using the 1000 Genomes data, and the resulting file is around 100 MB after compression. Maybe you can use that?

Website is here

