Unit-testing for NGS data analysis pipeline
3.4 years ago
Nandini ▴ 910


I was wondering whether anyone does unit testing for an NGS pipeline that follows GATK best practices and is written in bash?

From what I understand, unit testing is done by developers who write their own code, which includes functions.

Can it be done for a fairly basic piece of code that stitches various software together (bwa-picard-GATK)? An example or links would be useful.

Many thanks

unit-testing NGS pipeline • 1.2k views
3.4 years ago
apeltzer ▴ 150


In general, there is not just one GATK Best Practice. There are at least 3-4, with different ones for GATK 3.X and now for GATK 4.X.

You are correct that unit testing typically refers to testing single units of code (e.g. a function or a method of a particular tool). Imagine writing a simple tool that calculates the mean of a given list of numbers: you would then write some unit tests to verify that your function does things properly.
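To make the mean example concrete in bash, here is a minimal sketch. The function names `mean` and `assert_eq` are made up for illustration and do not come from any existing test framework; the two-decimal output format is also just an assumption.

```shell
#!/usr/bin/env bash

# mean: print the arithmetic mean of its numeric arguments (two decimals).
# Hypothetical helper, used here only as a unit to test.
mean() {
    if [ "$#" -eq 0 ]; then
        echo "mean: no numbers given" >&2
        return 1
    fi
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.2f\n", sum / NR }'
}

# assert_eq: tiny hand-rolled assertion (hypothetical, not a real framework).
# usage: assert_eq EXPECTED ACTUAL DESCRIPTION
assert_eq() {
    if [ "$1" = "$2" ]; then
        echo "PASS: $3"
    else
        echo "FAIL: $3 (expected '$1', got '$2')"
        return 1
    fi
}

# The unit tests: known inputs, known expected outputs.
assert_eq "2.00" "$(mean 1 2 3)" "mean of 1 2 3"
assert_eq "2.50" "$(mean 2 3)" "mean of 2 3"
```

The point is only that a unit test feeds one small piece of code a known input and compares the result with a known answer.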

I wouldn't call testing several tools that were developed by others and stitched together by you "unit testing", but rather "verification testing". There are people doing this: they provide small, simple test datasets and then automatically test the outcome of the pipeline using continuous integration services such as Travis CI, Jenkins or Circle CI.

An example would be here:


They use such test cases and test datasets to check automatically that new code added to the pipeline does not crash the overall pipeline.
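A verification test of this kind can be sketched in bash roughly as follows. This is not taken from the linked pipeline; the function name `run_pipeline_test` and the calling convention (pipeline script, input directory, output directory, expected file names) are assumptions made for the example.

```shell
#!/usr/bin/env bash

# run_pipeline_test: run a pipeline script on a small test dataset and check
# that it exits cleanly and leaves the expected, non-empty output files.
# usage: run_pipeline_test PIPELINE_SCRIPT INPUT_DIR OUT_DIR [EXPECTED_FILE...]
# Assumes (hypothetically) that the pipeline takes INPUT_DIR and OUT_DIR
# as its first two arguments.
run_pipeline_test() {
    local pipeline=$1 input=$2 outdir=$3
    shift 3
    mkdir -p "$outdir" || return 1

    # A crash anywhere in the pipeline should surface as a non-zero exit code.
    if ! bash "$pipeline" "$input" "$outdir"; then
        echo "FAIL: pipeline exited with non-zero status" >&2
        return 1
    fi

    # Every expected output must exist and be non-empty.
    local name
    for name in "$@"; do
        if [ ! -s "$outdir/$name" ]; then
            echo "FAIL: expected output $name is missing or empty" >&2
            return 1
        fi
    done
    echo "PASS"
}
```

A call might look like `run_pipeline_test pipeline.sh test_data results sample.bam sample.dedup.bam sample.vcf`, where all of those file names are placeholders. For the exit-code check to be meaningful, the pipeline script itself has to stop on errors, for example by starting with `set -euo pipefail`.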

Hope that helps!


Thank you. So it's for GATK 3.7. If I understand your statement correctly, the verification would just test whether the paths to the input/output folders are correct, whether the software generates the expected output for the next step, etc. Would this mainly (but not only) involve if-else and echo statements in bash? Also, I assume Travis CI/Jenkins/Circle CI are for developers, or can they be used for something as basic as alignment (bwa), marking duplicates (picard) and variant calling (GATK 3.7)?


No, the pipeline I gave as an example above, with a link to GitHub, checks more than just paths: the tests that run there with Travis CI run a typical pipeline on some example data and check whether it runs and works well.

You can run basically any kind of testing script with these services, but you would need Docker or something similar to ship your tools to the test servers efficiently.
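As one example of such a testing script, going beyond path checks, you could compare the variant records your pipeline produces on the test data with a reference VCF you have checked by hand. This is only a sketch of one possible check, not what the linked pipeline does; `vcf_records_match` is a made-up name. Header lines (starting with `#`) are skipped because they often contain dates and command lines that change between runs.

```shell
#!/usr/bin/env bash

# vcf_records_match: succeed if two uncompressed VCF files contain identical
# records, ignoring header lines (those starting with '#').
# usage: vcf_records_match EXPECTED_VCF ACTUAL_VCF
# Reads both files into memory, so it is meant for small test datasets only.
vcf_records_match() {
    local expected=$1 actual=$2
    if [ ! -r "$expected" ] || [ ! -r "$actual" ]; then
        echo "vcf_records_match: cannot read input files" >&2
        return 2
    fi
    [ "$(grep -v '^#' "$expected")" = "$(grep -v '^#' "$actual")" ]
}
```

A CI job would then fail whenever a code change alters the calls on the test data, e.g. `vcf_records_match expected/sample.vcf results/sample.vcf || exit 1` (paths are placeholders).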

