Tool: csvtk - a cross-platform, efficient, practical and pretty CSV/TSV toolkit
shenwei356 (China) wrote, 22 months ago:

Hi all,

I'd like to share another practical toolkit of mine, csvtk, after introducing SeqKit yesterday.

Introduction

Like the FASTA/Q formats in bioinformatics, CSV and TSV are basic and ubiquitous file formats in both bioinformatics and data science.

People usually use spreadsheet software such as MS Excel to process tabular data. However, this is all done by clicking and typing, which cannot be automated and is time-consuming to repeat, especially when we want to apply similar operations to different datasets or for different purposes.

You can also accomplish some CSV/TSV manipulations using shell commands, but more code is needed to handle the header line.

csvtk is convenient for rapid data investigation and is also easy to integrate into analysis pipelines. It could save you much of the time spent writing Python/R scripts.
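
For comparison, here is a minimal sketch of what a single header-aware operation can look like when written by hand in Python. The function name `sort_csv_by_column` and the sample data are hypothetical, chosen only for illustration. It sorts the records by one numeric column while keeping the header line in place.

```python
import csv
import io


def sort_csv_by_column(text, column):
    """Sort CSV records by a numeric column, keeping the header row first."""
    rows = list(csv.reader(io.StringIO(text)))
    header, records = rows[0], rows[1:]
    idx = header.index(column)  # locate the column by its header name
    records.sort(key=lambda r: float(r[idx]))  # assumes the column is numeric
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return out.getvalue()


# Hypothetical sample data
sample = "id,score\na,3\nb,1\nc,2\n"
print(sort_csv_by_column(sample, "score"), end="")
```

Even this small task needs explicit handling of the header row and the column lookup, which is the kind of boilerplate a dedicated toolkit is meant to remove.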

Features

  • Cross-platform (Linux/Windows/Mac OS X/OpenBSD/FreeBSD)
  • Lightweight and out-of-the-box: no dependencies, no compilation, no configuration
  • Fast, with support for multiple CPUs
  • Practical functions provided by 25 subcommands
  • Supports STDIN and gzipped input/output files, so it is easy to use in pipes
  • Most subcommands support unselecting fields and fuzzy fields, e.g. -f "-id,-name" for all fields except "id" and "name", and -F -f "a.*" for all fields with the prefix "a."
  • Supports common plots (see usage)
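
To make the unselecting and fuzzy field options a bit more concrete, here is a rough Python sketch of the selection semantics described above. It is not csvtk's actual code: the function `select_fields` is hypothetical, and it assumes that a leading "-" means "exclude", that "*" in fuzzy mode matches any run of characters, and that columns come back in header order.

```python
from fnmatch import fnmatchcase


def select_fields(header, spec, fuzzy=False):
    """Return the column names chosen by a comma-separated field spec.

    Assumed semantics (a sketch, not csvtk's implementation):
    - if every name starts with "-", those columns are excluded and
      all other columns are kept;
    - with fuzzy=True, "*" in a name matches any run of characters.
    """
    names = [s.strip() for s in spec.split(",") if s.strip()]
    negate = all(n.startswith("-") for n in names)
    if negate:
        names = [n[1:] for n in names]  # drop the leading "-"

    def matches(col):
        if fuzzy:
            return any(fnmatchcase(col, pat) for pat in names)
        return col in names

    if negate:
        return [c for c in header if not matches(c)]
    return [c for c in header if matches(c)]


# Hypothetical header row
header = ["id", "name", "a.x", "a.y", "b"]
print(select_fields(header, "-id,-name"))         # everything except id and name
print(select_fields(header, "a.*", fuzzy=True))   # columns with prefix "a."
```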

Subcommands

25 subcommands in total.

Information

  • headers: print headers
  • stat: summary of CSV file
  • stat2: summary of selected numeric fields

Format conversion

  • pretty: convert CSV to readable aligned table
  • csv2tab: convert CSV to tabular format
  • tab2csv: convert tabular format to CSV
  • space2tab: convert space-delimited format to tabular format
  • transpose: transpose CSV data
  • csv2md: convert CSV to markdown format

Set operations

  • head: print first N records
  • sample: sampling by proportion
  • cut: select parts of fields
  • uniq: unique data without sorting
  • freq: frequencies of selected fields
  • inter: intersection of multiple files
  • grep: grep data by selected fields with patterns/regular expressions
  • filter: filter rows by values of selected fields with arithmetic expressions
  • filter2: filter rows by awk-like arithmetic/string expressions
  • join: join multiple CSV files by selected fields
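
As an illustration of what "unique data without sorting" means, here is a small Python sketch of order-preserving deduplication. The function `uniq_rows` is hypothetical, and keeping the first occurrence of each key is an assumption; the post does not say which occurrence csvtk keeps.

```python
def uniq_rows(rows, key_index):
    """Keep the first row seen for each key value, preserving input order."""
    seen = set()
    kept = []
    for row in rows:
        key = row[key_index]
        if key not in seen:  # first time this key appears
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical records: the key is the first column
rows = [["b", "1"], ["a", "2"], ["b", "3"]]
print(uniq_rows(rows, 0))
```

Because membership is tracked in a set, the input does not need to be sorted first, unlike the usual `sort | uniq` shell idiom.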

Edit

  • rename: rename column names
  • rename2: rename column names by regular expression
  • replace: replace data of selected fields by regular expression
  • mutate: create new columns from selected fields by regular expression

Ordering

  • sort: sort by selected fields

Plotting

  • plot: see usage
    • plot hist: histogram
    • plot box: boxplot
    • plot line: line plot and scatter plot

Download and install

csvtk is implemented in the Go programming language. Executable binaries for most popular operating systems are freely available on the release page.

Just download the compressed executable for your operating system and uncompress it with the tar -zxvf *.tar.gz command.

Or install via conda:

conda install -c bioconda csvtk


csvtk has 25 subcommands now. Why not give it a try?

One can accomplish some CSV/TSV manipulations using shell commands, but more code is needed to handle the header line.

(reply by shenwei356, 17 months ago)

Ram (Houston, TX) wrote, 17 months ago:

This will be super useful. What would it take to push it to homebrew?


Someone else asked for this too. I may push it when I'm free, but I use Linux and I'm not familiar with Mac OS X. It would be great if someone else pushed it.

(reply by shenwei356, 17 months ago)

Let's discuss this on Twitter soon - I'm on vacation now. I think the author cannot recommend their own tool, so we can check out the procedure later.

(reply by Ram, 17 months ago)

FYI, versions 0.4.4 thru 0.7.0 are available for Linux & OSX via bioconda

(reply by Ryan Dale, 17 months ago)