How to get summary statistics out of bbduk.sh?
2 days ago

I am using bbduk to quality trim my raw sequencing reads, but I would like to get more summary statistics out of it than the default. I can see many options for pre-made histograms of the statistics I want, but none for getting the raw numbers, and I would rather generate the plots myself! Is there a way to do this that I am missing?

quallity sequencing control • 129 views
2 days ago
GenoMax 107k

Capture the stderr stream from your bbduk jobs. You should get something like the following:

Input:                          12202091 reads          610104550 bases.
KTrimmed:                       9227056 reads (75.62%)  254394193 bases (41.70%)
Total Removed:                  1328490 reads (10.89%)  254394193 bases (41.70%)
Result:                         10873601 reads (89.11%)         355710357 bases (58.30%)
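
If you want those summary numbers as values rather than text, a small parser over the captured stderr is one option. The following is only a sketch: it assumes each summary line looks like the four lines above (a label, a colon, a read count, then a base count), and the function name `parse_bbduk_summary` is made up for this example. Other bbduk versions or options may print additional or differently shaped lines, which this simply skips.

```python
import re

# Sample text copied from the answer above. In practice this would be the
# contents of the file that bbduk's stderr was redirected to.
SAMPLE_STDERR = """\
Input:                          12202091 reads          610104550 bases.
KTrimmed:                       9227056 reads (75.62%)  254394193 bases (41.70%)
Total Removed:                  1328490 reads (10.89%)  254394193 bases (41.70%)
Result:                         10873601 reads (89.11%)         355710357 bases (58.30%)
"""

# Assumed line shape: "<label>: <N> reads [(x%)] <M> bases [(y%)]"
LINE_RE = re.compile(r"^\s*([^:]+):\s+(\d+)\s+reads.*?(\d+)\s+bases")


def parse_bbduk_summary(text):
    """Return {label: {"reads": int, "bases": int}} for matching lines."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            label, reads, bases = match.groups()
            stats[label.strip()] = {"reads": int(reads), "bases": int(bases)}
    return stats


stats = parse_bbduk_summary(SAMPLE_STDERR)
# Recompute the fraction of reads kept instead of scraping the printed percentage.
kept = 100 * stats["Result"]["reads"] / stats["Input"]["reads"]
print(f"Reads kept: {kept:.2f}%")
```

Recomputing percentages from the counts avoids depending on how the percentages are formatted in the log.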

I am capturing the stderr, which does give some nice stats! But I was hoping for more detailed stats, like those indicated in the help:

Histogram output parameters:
bhist=<file>        Base composition histogram by position.
qhist=<file>        Quality histogram by position.
qchist=<file>       Count of bases with each quality value.
aqhist=<file>       Histogram of average read quality.
bqhist=<file>       Quality histogram designed for box plots.
phist=<file>        Polymer length histogram.
ihist=<file>        Insert size histogram, for paired reads in mapped sam.
gcbins=100          Number gchist bins.  Set to 'auto' to use read length.
maxhistlen=6000     Set an upper bound for histogram lengths; higher uses
                    more memory.  The default is 6000 for some histograms
                    and 80000 for others.

Then provide a file name for whichever plot/stat you want on the command line, e.g. bhist=myhist
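
Once you have such a file, the raw numbers can be loaded for your own plotting. The sketch below rests on an assumption: that the histogram files are tab-delimited text whose header lines start with `#`, with the column names on such a line. Check one of your own files before relying on it. The function name `read_histogram` and the example rows are placeholders for illustration, not real bbduk output.

```python
import csv
import io


def read_histogram(handle):
    """Return (column_names, rows) from a text histogram.

    Assumption: lines starting with '#' are header lines, the last such line
    that contains tabs holds the column names, and every other non-empty line
    is a tab-separated row of numbers.
    """
    columns = []
    rows = []
    for record in csv.reader(handle, delimiter="\t"):
        if not record:
            continue  # skip blank lines
        if record[0].startswith("#"):
            if len(record) > 1:
                columns = [record[0].lstrip("#")] + record[1:]
            continue
        rows.append([float(value) for value in record])
    return columns, rows


# Illustrative placeholder content, NOT real bbduk output.
EXAMPLE = (
    "#Pos\tA\tC\tG\tT\tN\n"
    "0\t0.25\t0.25\t0.25\t0.25\t0.0\n"
    "1\t0.3\t0.2\t0.2\t0.3\t0.0\n"
)

columns, rows = read_histogram(io.StringIO(EXAMPLE))
print(columns)
print(rows[0])
```

For a real file, you would pass an open file handle instead, e.g. `with open("myhist", newline="") as fh: columns, rows = read_histogram(fh)`, and then hand `rows` to whatever plotting library you prefer.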

Ohhhh, they are text-based histograms, okay! Sorry, I assumed they were pre-compiled plots.