I'd like to share another practical toolkit of mine, csvtk, after introducing SeqKit yesterday.
- Documentation: http://bioinf.shenwei.me/csvtk (Usage and Tutorial)
- Source code: https://github.com/shenwei356/csvtk
- Latest version:
Similar to the FASTA/Q formats in bioinformatics, CSV/TSV are basic and ubiquitous file formats in both bioinformatics and data science.
People usually use spreadsheet software such as MS Excel to process tabular data. However, that is all clicking and typing, which cannot be automated and is time-consuming to repeat, especially when we want to apply similar operations to different datasets or for different purposes.
You can also accomplish some CSV/TSV manipulations with shell commands, but extra code is needed to handle the header line.
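To illustrate the header-line problem with plain shell tools, here is a minimal sketch that uses only standard utilities. The sample data and the `sort_keep_header` helper are made up for this example and are not part of csvtk.

```shell
# Build a small sample table (made-up data for illustration).
tmpfile=$(mktemp)
printf 'id,score\nc,3\na,1\nb,2\n' > "$tmpfile"

# Naive approach: the header line is sorted along with the records,
# so "id,score" ends up after the data rows.
LC_ALL=C sort -t, -k1,1 "$tmpfile"

# Header-aware approach: print the header as-is, then sort only the records.
sort_keep_header() {
    # $1: input file, $2: 1-based number of the column to sort by
    head -n 1 "$1"
    tail -n +2 "$1" | LC_ALL=C sort -t, -k"$2","$2"
}

sorted=$(sort_keep_header "$tmpfile" 1)
printf '%s\n' "$sorted"
```

Even for a simple sort, the header has to be split off and re-attached by hand, and the same workaround is needed again for each further step.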
csvtk is convenient for rapid data investigation and is also easy to integrate into analysis pipelines. It can save you much of the time spent writing Python/R scripts.
- Cross-platform (Linux/Windows/Mac OS X/OpenBSD/FreeBSD)
- Lightweight and out-of-the-box: no dependencies, no compilation, no configuration
- Fast, with support for multiple CPUs
- Practical functions supported by N subcommands
- Supports STDIN and gzipped input/output files, so it is easy to use in pipes
- Most of the subcommands support unselecting fields and fuzzy fields, e.g.,
  -f "-id,-name" for all fields except "id" and "name",
  -F -f "a.*" for all fields with prefix "a.".
- Support common plots (see usage)
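The field-selection and piping features above could be tried out roughly as follows. This is a sketch that assumes csvtk is installed and on your PATH; the sample table is made up, and only the `cut` and `pretty` subcommands and the field expressions quoted in the list are used. The csvtk commands are skipped if the executable is not found.

```shell
# Sample table with made-up data; the column names are chosen to match
# the field expressions quoted in the feature list.
workdir=$(mktemp -d)
printf 'id,name,a.x,a.y\n1,foo,10,20\n2,bar,30,40\n' > "$workdir/data.csv"
gzip -c "$workdir/data.csv" > "$workdir/data.csv.gz"

if command -v csvtk >/dev/null 2>&1; then
    # Unselect: keep every field except "id" and "name".
    csvtk cut -f "-id,-name" "$workdir/data.csv"

    # Fuzzy fields: keep every field whose name starts with "a.".
    csvtk cut -F -f "a.*" "$workdir/data.csv"

    # Gzipped input and STDIN: read the .gz file directly, then pipe
    # the result into a second csvtk command for aligned output.
    csvtk cut -f "-id" "$workdir/data.csv.gz" | csvtk pretty
else
    echo "csvtk not found on PATH; skipping the csvtk commands" >&2
fi
```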
25 subcommands in total.
- stat: summary of CSV file
- stat2: summary of selected number fields
- pretty: convert CSV to readable aligned table
- csv2tab: convert CSV to tabular format
- tab2csv: convert tabular format to CSV
- space2tab: convert space-delimited format to tabular format
- transpose: transpose CSV data
- csv2md: convert CSV to markdown format
- head: print first N records
- sample: sampling by proportion
- cut: select parts of fields
- uniq: unique data without sorting
- freq: frequencies of selected fields
- inter: intersection of multiple files
- grep: grep data by selected fields with patterns/regular expressions
- filter: filter rows by values of selected fields with arithmetic expression
- filter2: filter rows by awk-like arithmetic/string expressions
- join: join multiple CSV files by selected fields
- rename: rename column names
- rename2: rename column names by regular expression
- replace: replace data of selected fields by regular expression
- mutate: create new columns from selected fields by regular expression
- sort: sort by selected fields
- plot line: line plot and scatter plot
Download and install
Just download the compressed executable file for your operating system and decompress it with the tar -zxvf *.tar.gz command.
Alternatively, install it with conda from the bioconda channel:
conda install -c bioconda csvtk