Tool: Quick statistics from the command line
Kamil wrote:

This is a list of popular tools for quickly computing statistics on the command line.

For example, here's what you can do with GNU Datamash:

# Example: calculate the sum and mean of values 1 to 10:
$ seq 10 | datamash sum 1 mean 1
55 5.5

C

  • https://github.com/tonyfischetti/qstats
    • Quick and dirty statistics tool for the UNIX pipeline (see the sketch after this list)
  • https://github.com/agordon/datamash
    • GNU Datamash is a command-line program which performs basic
      numeric, textual and statistical operations on input textual data files.

      It is designed to be portable and reliable, and to help researchers
      easily automate analysis pipelines without writing code or even
      short scripts (see the group-by sketch after this list).

C++

  • https://github.com/dpmcmlxxvi/clistats
    • clistats is a command-line tool for computing statistics over a set of delimited input numbers from a stream, such as a comma-separated value (.csv) or tab-separated value (.tsv) file. The default delimiter is a comma. Input data can be a file, a redirected pipe, or values entered manually at the console. To stop processing and display the statistics during manual input, send the EOF signal (Ctrl-D on POSIX systems such as Linux or Cygwin, Ctrl-Z on Windows). See the sketch after this list.
       
  • https://github.com/simonccarter/sta
    • This is a lightweight, fast tool for calculating basic descriptive statistics from the command line. Inspired by https://github.com/nferraz/st, this project differs in that it is written in C++, allowing faster computation of statistics on larger, non-trivial data sets (see the sketch after this list).

      Additions include the choice of biased vs. unbiased estimators and the option to use a compensated variant of the algorithm for better numerical accuracy.

      Given a file of 1,000,000 ascending numbers, a simple test on a 2.5 GHz dual-core MacBook using Bash's time showed that sta takes less than a second to complete, compared to 14 seconds for st.

  • https://github.com/arq5x/filo

    • Useful FILe and stream Operations

      • groupBy is a useful tool that mimics the "GROUP BY" clause in database systems. Given a file or stream that is sorted by the appropriate "grouping columns", groupBy will compute summary statistics on another column in the file or stream. This works with output from all BEDTools, as well as any other tab-delimited file or stream (see the sketch after this list).

      • shuffle will randomize the order of lines in a file. In other words, if you have a sorted file, shuffle will undo the sort.

      • stats is a small utility for computing descriptive statistics on a given column of a tab-delimited file or stream. By default, it assumes you want to gather stats on the first column in your file/stream.
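A minimal clistats sketch, assuming (per the description above) that a bare invocation reads comma-delimited numbers from a pipe; any column-selection flags are omitted here:

# Example: statistics over comma-separated columns read from stdin:
$ printf '1,2\n3,4\n5,6\n' | clistats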
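A similar sketch for sta, assuming the default invocation reads one number per line on stdin:

# Example: descriptive statistics for the numbers 1 to 1000:
$ seq 1000 | sta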
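And a filo sketch. The groupBy flags shown (-g grouping column, -c column to summarize, -o operation) follow the BEDTools groupby interface that filo's groupBy mirrors; reading from a pipe rather than via an input-file option is an assumption. Input must be tab-delimited and pre-sorted on the grouping column:

# Example: mean of column 2 within each group in column 1:
$ printf 'A\t1\nA\t3\nB\t5\n' | groupBy -g 1 -c 2 -o mean

# Example: descriptive stats on the first column (the stats default):
$ seq 10 | stats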

Go

Perl

  • https://github.com/nferraz/st
    • st: simple statistics from the command line (mentioned above as the inspiration for sta)

poisonAlien wrote:

Wow, these are great. Thanks for sharing the links.
Deepak Tanwar wrote:

That's great!
Istvan Albert wrote:

I've also used CSVKit, a suite of utilities for converting to and working with CSV:

https://csvkit.readthedocs.org
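A quick sketch with csvkit's csvstat (the file name is a placeholder):

# Example: summary statistics for every column, then just the mean of column 2:
$ csvstat data.csv
$ csvstat -c 2 --mean data.csv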

 
