Question: Unit-testing for NGS data analysis pipeline
Nandini wrote (2.5 years ago):

Hello,

I was wondering if anyone does unit testing for an NGS pipeline that follows the GATK Best Practices and is written in bash?

From what I understand, unit testing is done by developers who write their own code that includes functions.

Can it be done for a fairly basic piece of code that stitches various software together (bwa, picard, GATK)? An example or links would be useful.

Many thanks

Tags: unit-testing, pipeline, ngs
apeltzer (Tuebingen, Germany) wrote (2.5 years ago):

Hi!

In general, there is not just one GATK Best Practice. There are at least 3-4, and they differ between GATK 3.X and GATK 4.X.

You are correct that unit testing typically refers to testing single units of code (e.g. a function or method of a particular tool). Imagine writing a simple tool that calculates the mean of a given list of numbers; you'd then write some unit tests to verify that your function behaves properly.
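To make the mean example concrete, here is a minimal sketch in Python (chosen only for brevity). The function name `mean` and the specific test cases are illustrative assumptions, not taken from any particular tool:

```python
def mean(numbers):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not numbers:
        raise ValueError("mean() requires at least one number")
    return sum(numbers) / len(numbers)


# Each test exercises exactly one behaviour of the unit under test.
def test_typical_values():
    assert mean([1, 2, 3, 4]) == 2.5


def test_single_value():
    assert mean([7]) == 7


def test_empty_list_is_rejected():
    try:
        mean([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty list")
```

The test functions follow the naming convention that pytest discovers automatically, but they can also be called directly.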

I wouldn't call testing several tools that were developed by others and stitched together by you "unit testing"; it is more like "verification testing". There are people doing this by providing small test datasets and then automatically testing the outcome of the pipeline with a continuous integration service such as Travis CI, Jenkins or CircleCI.

An example would be here:

https://github.com/SciLifeLab/NGI-RNAseq/tree/master/tests

Using such test cases and test datasets, they automatically check that new code added to the pipeline does not break the pipeline as a whole.
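As a rough idea of what such an automated check could look like, here is a small sketch that runs a command and reports whether it exited cleanly and produced non-empty output files. This is not how the linked repository implements its tests; the helper name `run_smoke_test` and the script and file names in the usage comment are hypothetical placeholders:

```python
import subprocess
from pathlib import Path


def run_smoke_test(command, expected_outputs, workdir="."):
    """Run `command` in `workdir`; return a list of problems (empty means pass)."""
    problems = []
    result = subprocess.run(command, cwd=workdir, capture_output=True, text=True)
    if result.returncode != 0:
        problems.append(f"command exited with status {result.returncode}")
    # Every expected output must exist and must not be empty.
    for name in expected_outputs:
        path = Path(workdir) / name
        if not path.is_file():
            problems.append(f"missing output: {name}")
        elif path.stat().st_size == 0:
            problems.append(f"empty output: {name}")
    return problems


# Hypothetical usage (script and file names are placeholders):
# problems = run_smoke_test(
#     ["bash", "run_pipeline.sh", "tests/small_sample.fastq"],
#     ["aligned.bam", "calls.vcf"],
# )
```

A CI service would then simply run a script like this on every commit and fail the build if the returned list is not empty.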

Hope that helps!


Thank you. So it's for GATK 3.7. If I understand your statement correctly, the verification would just be testing whether the paths to the input/output folders are correct, whether the software generates the expected output for the next step, etc. Would this mainly (though not exclusively) involve if-else and echo statements in bash? Also, I assume Travis CI/Jenkins/Circle CI are for developers; or can they be used for something as basic as alignment (bwa), then marking duplicates (picard), then variant calling (gatk 3.7)?

written 2.5 years ago by Nandini

No, the pipeline I linked on GitHub as an example checks more than just paths: the tests that run there with Travis CI run a typical pipeline on some example data and check whether it runs and works well.

You can run basically any kind of testing script with these services, but you would need Docker or something similar to ship your tools to the test servers efficiently.
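One way to go beyond "it did not crash" is to compare the pipeline's result on the test data with a stored expected result. The sketch below is only an illustration under stated assumptions, not what the linked tests do: it assumes uncompressed, plain-text VCF files, and it ignores header lines (those starting with `#`), since headers often contain dates or command lines that change between runs. The function names are made up for this example:

```python
def vcf_records(path):
    """Return the non-header lines of a plain-text VCF file."""
    with open(path) as handle:
        # VCF header lines start with '#'; everything else is a record.
        return [line.rstrip("\n") for line in handle if not line.startswith("#")]


def same_variant_calls(produced_vcf, expected_vcf):
    """True if both files contain identical records, ignoring header lines."""
    return vcf_records(produced_vcf) == vcf_records(expected_vcf)
```

An exact comparison like this is strict; whether it is appropriate depends on whether the tools in the pipeline produce deterministic output on the test data.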

written 2.5 years ago by apeltzer