BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, and fasta+qual, compressed or raw, with autodetection of quality encoding and interleaving. It is written in Java and works on any platform supporting Java, including Linux, macOS, and Microsoft Windows; there are no dependencies other than Java (version 7 or higher). Program descriptions and options are shown when running the shell scripts with no parameters.
SeqKit is a cross-platform, ultrafast, and practical toolkit for FASTA/Q file manipulation that helps researchers carry out a wide range of FASTA/Q processing tasks. It supports plain or gzip-compressed input and output via either standard streams or files, so it can easily be used in command-line pipes.
This package provides a number of small and efficient programs to perform common tasks with high-throughput sequencing data in the FASTQ format. All of the programs work with typical FASTQ files as well as gzipped FASTQ files.
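As a rough illustration of what accepting both plain and gzipped FASTQ input involves, the Python sketch below checks the two-byte gzip signature before opening a file and then reads four-line records. The function names `open_maybe_gzip` and `read_fastq` are hypothetical and are not part of the package; the sketch also assumes unwrapped, four-line FASTQ records.

```python
import gzip


def open_maybe_gzip(path):
    """Open a file as text, handling gzip compression transparently.

    Hypothetical helper: detects gzip by its magic bytes (0x1f 0x8b)
    rather than by the file extension.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a text handle.

    Assumes each record is exactly four lines: '@name', sequence,
    a '+' separator line, and a quality string of equal length.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:].rstrip("\n"), seq, qual
```

Checking the magic bytes instead of the `.gz` suffix keeps the reader working on files that were renamed or lack an extension.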
Bioawk is an extension to Brian Kernighan's awk, adding support for several common biological data formats, including optionally gzip'ed BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with column names. It also adds a few built-in functions and a command-line option to use TAB as the input/output delimiter. When the new functionality is not used, bioawk is intended to behave exactly the same as the original BWK awk.
Seqmagick is a kickass little utility built in the spirit of imagemagick to expose the file format conversion in Biopython in a convenient way. Instead of having a big mess of scripts, there is one that takes arguments.
The Biopieces are a collection of bioinformatics tools that can be pieced together in a very easy and flexible manner to perform both simple and complex tasks. The Biopieces work on a data stream in such a way that the data stream can be passed through several different Biopieces, each performing one specific task.
The Bio::ToolBox libraries provide an abstraction layer over a variety of different specialized BioPerl-style modules. For example, there is a special emphasis on the collection of data values for defined genomic coordinate regions, regardless of whether the values come from a GFF database, Bam file, BigWig file, etc.
The FAST Analysis of Sequences Toolbox (FAST) is a set of Unix tools (for example fasgrep, fascut, fashead and fastr) for sequence bioinformatics modeled after the Unix textutils (such as grep, cut, head and tr). FAST workflows are designed for "inline" (serial) processing of flatfile biological sequence record databases per-sequence, rather than per-line, through Unix command pipelines. The default data exchange format is multifasta (specifically, a restriction of BioPerl FastA format). FAST tools expose the power of Perl and BioPerl for sequence analysis to non-programmers in an easy-to-learn command-line paradigm.
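The difference between per-sequence and per-line processing can be sketched in Python. This is only an illustration of the idea, not FAST's implementation (FAST itself is built on Perl and BioPerl); the names `read_fasta` and `grep_records` are hypothetical, and the choice to match against the sequence rather than the identifier is an assumption made for the example.

```python
import io
import re


def read_fasta(handle):
    """Yield (description, sequence) for each record in a multifasta stream.

    Sequence lines belonging to one record are joined, so downstream
    steps see whole sequences rather than individual lines.
    """
    name, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line.strip())
    if name is not None:
        yield name, "".join(chunks)


def grep_records(records, pattern):
    """Keep records whose full sequence matches the regular expression."""
    regex = re.compile(pattern)
    for name, seq in records:
        if regex.search(seq):
            yield name, seq


if __name__ == "__main__":
    # The motif CGTT spans the line break inside seq1, so a plain
    # line-based grep would miss it; the record-based filter does not.
    text = ">seq1 demo\nACGTAC\nGTTTAA\n>seq2\nCCCCCC\n"
    for name, seq in grep_records(read_fasta(io.StringIO(text)), "CGTT"):
        print(f">{name}\n{seq}")
```

Because both functions are generators, records flow through one at a time, which mirrors the serial, pipeline-friendly style described above.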