Question: Audit trail for Bioinformatics software tools
Asked 10 weeks ago by kspata:


Every day I analyze several samples for variant analysis using an align-to-reference approach. For this I use bioinformatics tools such as Bowtie2/BWA, Samtools and Freebayes. Is there a way to find out which version of each tool was used to process a particular sample? This should work like an audit trail, telling me, say, that Sample1 was aligned using Bowtie2 vX.X.X, Sample2 was aligned using Bowtie2 vX.X.Y, and so on.

For example, the bowtie2 --version command reports the version of Bowtie2 currently installed on the system:

/usr/local/bin/bowtie2-align-s version 2.2.2
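A minimal Python sketch of capturing that version string automatically, so that it can be written into a per-sample log instead of being checked by hand. The function names are hypothetical, and the regular expression assumes the "version X.Y.Z" wording shown above; other tools format their version output differently, so the pattern would need adjusting per tool.

```python
import re
import shutil
import subprocess


def parse_version_line(text):
    """Return the token after the first 'version' in text, e.g. '2.2.2', or None."""
    match = re.search(r"version\s+(\S+)", text)
    return match.group(1) if match else None


def installed_version(tool):
    """Run '<tool> --version' and return the parsed version, or None if tool is not on PATH."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run([tool, "--version"], capture_output=True, text=True)
    # Check both streams, since some programs print their version to stderr.
    return parse_version_line(result.stdout + result.stderr)


print(parse_version_line("/usr/local/bin/bowtie2-align-s version 2.2.2"))  # prints 2.2.2
```

Calling installed_version("bowtie2") at the start of each run records the version that was actually on the PATH at that moment, which is what an audit trail needs.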

Another approach is to inspect the header of the aligned BAM file:

samtools view -H sample.sorted.bam

@HD VN:1.0 SO:coordinate
@SQ SN: reference LN:
@PG ID:bowtie2 PN:bowtie2 VN:2.2.2 CL:"/usr/local/bin/bowtie2-align-s --wrapper basic-0 -x -I 0 -X 1000 --fr -p 16 --local --passthrough -1 /tmp/40466.inpipe1 -2 /tmp/40466.inpipe2"
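Because the @PG line is stored inside each BAM file, it can be read back per sample. A minimal Python sketch, assuming samtools is on the PATH and that header fields are tab-separated TAG:VALUE pairs as in raw samtools view -H output (the SAM format's layout); the helper names are hypothetical:

```python
import subprocess


def bam_header(bam_path):
    """Return the header text of a BAM file via 'samtools view -H'."""
    result = subprocess.run(["samtools", "view", "-H", bam_path],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_pg_lines(header_text):
    """Return one dict per @PG line, mapping each tag (ID, PN, VN, CL, ...) to its value."""
    programs = []
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        fields = {}
        # After the record type, SAM header fields are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            # Split on the first colon only, so values such as CL keep their own colons.
            tag, _, value = field.partition(":")
            fields[tag] = value
        programs.append(fields)
    return programs


example = "@HD\tVN:1.0\tSO:coordinate\n@PG\tID:bowtie2\tPN:bowtie2\tVN:2.2.2\n"
for pg in parse_pg_lines(example):
    print(pg.get("PN"), pg.get("VN"))  # prints: bowtie2 2.2.2
```

Looping over samples with parse_pg_lines(bam_header(path)) gives a per-sample list of (program, version) pairs taken from the files themselves. A BAM header cannot, however, record tools that run after the BAM was written, such as the variant caller.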

Neither of these approaches tells me which version of Bowtie2 processed Sample1, which processed Sample2, and so on.

What I would like is an audit trail that records, for each tool, which version was used to process which sample.
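One way to build exactly that table is to append a row for each sample, step and tool at the moment the step runs. A minimal Python sketch, assuming a shared tab-separated file; the file layout, column names and the record_audit helper are hypothetical:

```python
import csv
import datetime
import os

# Hypothetical column layout for the audit file.
AUDIT_FIELDS = ["timestamp_utc", "sample", "step", "tool", "version"]


def record_audit(path, sample, step, tool, version):
    """Append one row to a tab-separated audit file, writing the header if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=AUDIT_FIELDS, delimiter="\t")
        if needs_header:
            writer.writeheader()
        writer.writerow({
            "timestamp_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds"),
            "sample": sample,
            "step": step,
            "tool": tool,
            "version": version,
        })
```

For example, record_audit("audit_trail.tsv", "Sample1", "alignment", "bowtie2", "2.2.2") appends one row. In practice the version string would be read from the tool at run time rather than typed in.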


Question last modified 8 weeks ago by Biostar.

You could capture this information (the output of bowtie2 --version) in the master analysis log for each project. The Unix script command can record everything typed and printed during an interactive terminal session. The standard output and standard error logs captured during the analysis should include this information and can be saved alongside the results.
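As a sketch of saving standard output and standard error per step, the hypothetical Python helper below runs one command, then appends a UTC timestamp, the command line, everything the command printed, and its exit code to a log file. Unlike script, it only writes the output once the command has finished and holds it all in memory, so it suits commands that write their main results to files (e.g. bowtie2 with -S) rather than to standard output.

```python
import datetime
import shlex
import subprocess


def run_logged(cmd, log_path):
    """Run cmd (a list of arguments) and append its command line, output and exit code to log_path."""
    started = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    # Output is collected in memory and written once the command has finished.
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(log_path, "a") as log:
        log.write(f"[{started}] $ {shlex.join(cmd)}\n")
        log.write(result.stdout)
        log.write(result.stderr)
        log.write(f"[exit code {result.returncode}]\n")
    return result.returncode
```

Calling run_logged(["bowtie2", "--version"], "Sample1.log") at the start of a run would put the exact Bowtie2 version into that sample's log, next to the commands that follow it.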

You could also use a workflow system such as Snakemake to automate your analysis steps and log each action.

Reply written 10 weeks ago by genomax.

Indeed, for audit trails in corporate and clinical settings, I produce a log for each sample that looks something like:

Beginning analysis script on Wed  6 Sep 11:56:47 UTC 2017, run by KevinBlighe with the following parameters:
    1   /home/ubuntu/Placa4.tmp/71/Files/71_S2_L001_R1_001.fastq.gz
    2   /home/ubuntu/Placa4.tmp/71/Files/71_S2_L001_R2_001.fastq.gz
    3   /home/ubuntu/reference/hg38.fasta
    4   Placa4
    5   GNT071
    6   /home/ubuntu/pipeline/BED/Versao1_Sorted.hg38.bed
    7   NULL
    8   GNT081
    9   1.333333
    10  0.666667
    11  20
    12  70
    13  illumina
    14  18
    15  50
    16  relaxed
    17  /home/ubuntu/pipeline/validation/Full/
    18  KevinBlighe
Beginning analysis step 1 (adaptor and read quality trimming) on Wed  6 Sep 11:56:47 UTC 2017
Beginning analysis step 2 (alignment) on Wed  6 Sep 11:57:20 UTC 2017
Beginning analysis step 3 (marking and removing PCR duplicates) on Wed  6 Sep 11:58:13 UTC 2017
Beginning analysis step 4 (remove low mapping quality reads) on Wed  6 Sep 11:58:28 UTC 2017
Beginning analysis step 5 (QC) on Wed  6 Sep 11:58:31 UTC 2017
Beginning analysis step 6 (downsampling / random read sampling) on Wed  6 Sep 11:58:46 UTC 2017
Beginning analysis step 7 (variant calling) on Wed  6 Sep 11:58:53 UTC 2017
Beginning analysis step 8 (annotation) on Wed  6 Sep 12:00:26 UTC 2017
Skipping analysis step 9 (PCR results and CNV analysis) - no results file provided
Beginning analysis step 10 (customising VCF for haplotype identification) on Wed  6 Sep 12:03:02 UTC 2017
Beginning post-analysis tidy-up on Wed  6 Sep 12:03:02 UTC 2017
Analysis script finished on Wed  6 Sep 12:03:02 UTC 2017
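The layout of such a log is straightforward to reproduce. A minimal Python sketch, assuming one log file per sample; SampleLog and date_stamp are hypothetical names, and this is not the script that produced the log above, whose implementation is not shown. The timestamps copy the format visible in the log (day padded to two characters, then month):

```python
import datetime


def date_stamp(moment):
    """Format a UTC datetime like the log above, e.g. 'Wed  6 Sep 11:56:47 UTC 2017'."""
    return f"{moment:%a} {moment.day:2d} {moment:%b %H:%M:%S} UTC {moment.year}"


class SampleLog:
    """Append human-readable progress lines to one log file per sample (hypothetical helper)."""

    def __init__(self, path, clock=None):
        self.path = path
        # clock returns the current UTC time; a fixed clock can be passed in for testing.
        self.clock = clock or (lambda: datetime.datetime.now(datetime.timezone.utc))

    def _write(self, line):
        with open(self.path, "a") as handle:
            handle.write(line + "\n")

    def start(self, user, params):
        self._write(f"Beginning analysis script on {date_stamp(self.clock())}, "
                    f"run by {user} with the following parameters:")
        # Number each parameter, padding the index to four characters as in the log.
        for index, value in enumerate(params, start=1):
            self._write(f"    {index:<4}{value}")

    def step(self, number, description):
        self._write(f"Beginning analysis step {number} ({description}) on {date_stamp(self.clock())}")

    def skip(self, number, description, reason):
        self._write(f"Skipping analysis step {number} ({description}) - {reason}")

    def finish(self):
        self._write(f"Analysis script finished on {date_stamp(self.clock())}")
```

Because every line is timestamped as it is written, the log also shows how long each step took, as in the example above.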

The versions of the programs used are stored elsewhere, and there is also a standard operating procedure (SOP), which is itself versioned and has a date for its next review.
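If versions are kept in a separate file, each sample log can still point at the exact version record that applied to it. A minimal Python sketch, assuming a JSON manifest whose SHA-256 checksum is written into each sample log; the manifest format and the helper names are hypothetical:

```python
import hashlib
import json


def write_manifest(path, versions):
    """Write tool versions as sorted JSON and return the SHA-256 hex digest of the bytes written."""
    data = json.dumps(versions, indent=2, sort_keys=True).encode("utf-8")
    with open(path, "wb") as handle:
        handle.write(data)
    return hashlib.sha256(data).hexdigest()


def manifest_matches(path, expected_digest):
    """Check that the manifest on disk still has the digest recorded in a sample log."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest() == expected_digest
```

A line such as "Tool versions: pipeline_versions.json (sha256 <digest>)" in each sample log would then let a reviewer confirm later that the manifest has not been edited since the sample was run.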

Reply written 8 weeks ago by Kevin Blighe.