Hi all,
I'd like to share another practical toolkit of mine, csvtk, after introducing SeqKit yesterday.
- Documents: http://bioinf.shenwei.me/csvtk/ (Usage and Tutorial)
- Source code: https://github.com/shenwei356/csvtk
Introduction
Similar to the FASTA/Q formats in bioinformatics, CSV/TSV are basic and ubiquitous file formats in both bioinformatics and data science.
People usually use spreadsheet software such as MS Excel to process tabular data. However, that means clicking and typing by hand, which cannot be automated and is time-consuming to repeat, especially when we want to apply similar operations to different datasets or for different purposes.
You can also accomplish some CSV/TSV manipulations using shell commands, but more code is needed to handle the header line.
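To illustrate the header problem with plain shell tools, here is a minimal sketch that sorts a CSV by one column while keeping the header line on top. The sample data, the temporary file, and the helper name sort_keep_header are all made up for illustration; this is not how csvtk works internally.

```shell
#!/bin/sh
# Hypothetical sample data, written to a temporary file for illustration.
csv="$(mktemp)"
printf 'id,name,score\n3,carol,72\n1,alice,95\n2,bob,88\n' > "$csv"

# Hypothetical helper: sort the records by the third column (numeric,
# descending) without moving the header line. The header is printed
# first, then only the remaining lines are passed to sort.
sort_keep_header() {
    head -n 1 "$1"
    tail -n +2 "$1" | sort -t, -k3,3nr
}

sort_keep_header "$csv"
```

Running sort directly on the file would treat the header as just another record, so it has to be split off by hand as above. This kind of extra step is needed for nearly every operation on a file with a header.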
csvtk is convenient for rapid data investigation and is also easy to integrate into analysis pipelines. It could save you much of the time spent writing Python/R scripts.
Features
- Cross-platform (Linux/Windows/Mac OS X/OpenBSD/FreeBSD)
- Lightweight and out-of-the-box: no dependencies, no compilation, no configuration
- Fast, with support for multiple CPUs
- Practical functions supported by N subcommands
- Supports STDIN and gzipped input/output files, so it is easy to use in pipes
- Most of the subcommands support unselecting fields and fuzzy fields, e.g. -f "-id,-name" for all fields except "id" and "name", and -F -f "a.*" for all fields with prefix "a."
- Supports common plots (see usage)
Subcommands
25 subcommands in total.
Information
- headers: print headers
- stat: summary of CSV file
- stat2: summary of selected number fields
Format conversion
- pretty: convert CSV to readable aligned table
- csv2tab: convert CSV to tabular format
- tab2csv: convert tabular format to CSV
- space2tab: convert space delimited format to CSV
- transpose: transpose CSV data
- csv2md: convert CSV to markdown format
Set operations
- head: print first N records
- sample: sampling by proportion
- cut: select parts of fields
- uniq: unique data without sorting
- freq: frequencies of selected fields
- inter: intersection of multiple files
- grep: grep data by selected fields with patterns/regular expressions
- filter: filter rows by values of selected fields with arithmetic expression
- filter2: filter rows by awk-like arithmetic/string expressions
- join: join multiple CSV files by selected fields
Edit
- rename: rename column names
- rename2: rename column names by regular expression
- replace: replace data of selected fields by regular expression
- mutate: create new columns from selected fields by regular expression
Ordering
- sort: sort by selected fields
Plotting
- plot: see usage
- plot hist: histogram
- plot box: boxplot
- plot line: line plot and scatter plot
Download and install
csvtk is implemented in the Go programming language, and executable binaries for most popular operating systems are freely available on the release page.
Just download the compressed executable for your operating system and decompress it with the command tar -zxvf *.tar.gz.
Alternatively, install it with conda:
conda install -c bioconda csvtk
Learn More
- Detailed usage of subcommands
- Some questions on Biostars answered with csvtk
csvtk has 25 subcommands now. Why not give it a try?