Tool: FASTA and FASTQ tools
5.3 years ago, Kamil (2.0k) wrote:

Many developers have created tools for manipulating FASTA and FASTQ files. Here is a list of publicly available projects:


    • BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, fasta+qual, compressed or raw, with autodetection of quality encoding and interleaving. It is written in Java and works on any platform supporting Java, including Linux, MacOS, and Microsoft Windows; there are no dependencies other than Java (version 7 or higher). Program descriptions and options are shown when running the shell scripts with no parameters.


    • SeqKit is a cross-platform, ultrafast, and practical FASTA/Q manipulation toolkit that helps researchers carry out a wide range of FASTA/Q file processing tasks. It supports plain or gzip-compressed input and output from either standard streams or files, so it can easily be used in command-line pipes.


    • Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be optionally compressed by gzip.
    • This package provides a number of small and efficient programs to perform common tasks with high throughput sequencing data in the FASTQ format. All of the programs work with typical FASTQ files as well as gzipped FASTQ files.
    • Bioawk is an extension to Brian Kernighan's awk, adding support for several common biological data formats, including optionally gzip'ed BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with column names. It also adds a few built-in functions and a command-line option to use TAB as the input/output delimiter. When the new functionality is not used, bioawk is intended to behave exactly the same as the original BWK awk.
    • The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
    • fqtools is a software suite for fast processing of FASTQ files.



    • The Biopieces are a collection of bioinformatics tools that can be pieced together in a very easy and flexible manner to perform both simple and complex tasks. The Biopieces work on a data stream in such a way that the data stream can be passed through several different Biopieces, each performing one specific task.
    • The Bio::ToolBox libraries provide an abstraction layer over a variety of different specialized BioPerl-style modules. For example, there is a special emphasis on the collection of data values for defined genomic coordinate regions, regardless of whether the values come from a GFF database, Bam file, BigWig file, etc.
    • Command-line tools for processing biological sequencing data. Barcode demultiplexing, adapter trimming, etc. Primarily written to support an Illumina based pipeline - but should work with any FASTQs.


    • Manipulate FASTA files.


    • The FAST Analysis of Sequences Toolbox (FAST) is a set of Unix tools (for example fasgrep, fascut, fashead and fastr) for sequence bioinformatics modeled after the Unix textutils (such as grep, cut, head, tr, etc). FAST workflows are designed for "inline" (serial) processing of flatfile biological sequence record databases per-sequence, rather than per-line, through Unix command pipelines. The default data exchange format is multifasta (specifically, a restriction of BioPerl FastA format). FAST tools expose the power of Perl and BioPerl for sequence analysis to non-programmers in an easy-to-learn command-line paradigm.
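Several of the tools above advertise the same core feature: reading FASTA or FASTQ records from plain or gzip-compressed files. As a rough illustration of what that involves, here is a minimal Python sketch. It is not taken from any of the listed projects; the function names (`open_maybe_gzip`, `read_fasta`, `read_fastq`, `read_sequences`) are made up for this example, and it assumes FASTQ records use the common four-line layout with no wrapped sequence lines.

```python
import gzip
import itertools


def open_maybe_gzip(path):
    """Open a text file, transparently handling gzip (detected by its magic bytes)."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fasta(lines):
    """Yield (name, sequence, None) for each FASTA record; sequences may span lines."""
    name, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks), None
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks), None


def read_fastq(lines):
    """Yield (name, sequence, quality) for each four-line FASTQ record."""
    it = iter(lines)
    for header in it:
        header = header.rstrip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        plus = next(it, None)  # the '+' separator line, ignored
        qual = next(it, None)
        if qual is None:
            raise ValueError("truncated FASTQ record: " + header)
        yield header[1:], seq.rstrip(), qual.rstrip()


def read_sequences(path):
    """Guess the format from the first character and yield records from the file."""
    with open_maybe_gzip(path) as handle:
        first = handle.readline()
        lines = itertools.chain([first], handle)
        if first.startswith(">"):
            yield from read_fasta(lines)
        elif first.startswith("@"):
            yield from read_fastq(lines)
        elif first:
            raise ValueError("file does not look like FASTA or FASTQ")
```

With a hypothetical file `reads.fq.gz`, usage would be `for name, seq, qual in read_sequences("reads.fq.gz"): print(name, len(seq))`. The dedicated tools in the list are far faster and handle many more edge cases, so this is only meant to show the idea.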

modified 3.5 years ago • written 5.3 years ago by Kamil (2.0k)

Also, fastx-toolkit, bioawk, and a gazillion other tools - it's crazy how many of these are around!

modified 5.3 years ago • written 5.3 years ago by RamRS (30k)

And FAST (perl). One is bound to fail when taking up such a task.

written 5.3 years ago by h.mon (31k)

Java - BBMap needs to be added to this list.

written 4.1 years ago by genomax (90k)

I'm trying to get bioawk on Ubuntu using "sudo apt-get install bioawk", but it says it can't find a package named bioawk. How could I install this?

written 4.1 years ago by beneficii (60)

You can download the code (or use git clone), then go into the bioawk-master folder and type make. That will compile the program. You can then copy the bioawk executable to a directory in your $PATH (/usr/local/bin should work).

modified 4.1 years ago • written 4.1 years ago by genomax (90k)
5.3 years ago, Tariq Daouda (210), IRIC | Institute for Research in Immunology and Cancer, wrote:


There's also the parsers module of pyGeno. It supports FASTA, FASTQ, VCF, GTF and CSV files, with an emphasis on simple and convenient interfaces.

modified 5.3 years ago • written 5.3 years ago by Tariq Daouda (210)