Metagenomics is a hot area of scientific research. It is now used extensively in ecology, biofuel production, agriculture, and human and animal health. Nevertheless, with opportunities come new challenges: in particular, data processing is becoming increasingly time-consuming and computationally expensive. The most "expensive" task, in both time and compute, is assigning taxonomic labels to metagenomic DNA sequences. Recent advances in bioinformatics try to overcome this problem with more efficient algorithms. Here we present one of the most recent software tools for metagenomics data analysis: CLARK. Compared to existing solutions, it is more than 5 times faster; however, for large datasets it requires a powerful machine, and it is not available on Windows. In this tutorial we show the basic usage of the tool with InsideDNA. In one of the next tutorials we will demonstrate a full pipeline for bacterial, viral and human metagenomics data analysis.
CLARK is a novel bioinformatics tool for fast and accurate sequence classification. It can be useful for the analysis of metagenomic and genomic datasets. In its own terminology, CLARK requires the following datasets as input:
- objects - FASTA or FASTQ files with the metagenomic reads to be classified
- targets - FASTA files with reference sequences, e.g. genome assemblies of known species or individuals
- a simple text file that lists all targets and assigns each target to a discrete group, e.g. a species, genus, family, etc.
With this information at hand, objects (i.e. newly sequenced reads) can be classified against the references into their respective groups (such as species, genera, families, etc.).
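To make the third input more concrete, here is a minimal Python sketch that builds such a targets list from a mapping of group labels to reference files. It assumes a two-column, whitespace-separated layout (path to the reference FASTA, then its label); check the CLARK documentation for the exact format it expects. The file names, labels and the helper names `build_target_lines` and `write_targets_file` are hypothetical and used only for illustration.

```python
from pathlib import Path


def build_target_lines(groups):
    """Return one "<fasta_path> <label>" line per reference file.

    `groups` maps a group label (e.g. a species name) to a list of
    reference FASTA paths. The two-column layout is an assumption.
    """
    lines = []
    for label, fasta_paths in sorted(groups.items()):
        if not label or any(ch.isspace() for ch in label):
            raise ValueError(f"label must be non-empty without whitespace: {label!r}")
        for fasta in fasta_paths:
            fasta = str(fasta)
            if not fasta or any(ch.isspace() for ch in fasta):
                raise ValueError(f"path must be non-empty without whitespace: {fasta!r}")
            lines.append(f"{fasta} {label}")
    return lines


def write_targets_file(groups, out_path):
    """Write the targets list to `out_path`, one target per line."""
    lines = build_target_lines(groups)
    Path(out_path).write_text("\n".join(lines) + "\n")
    return lines


# Hypothetical references, grouped at the species level.
example_groups = {
    "Escherichia_coli": ["refs/ecoli_K12.fasta", "refs/ecoli_O157.fasta"],
    "Salmonella_enterica": ["refs/salmonella_LT2.fasta"],
}

for line in build_target_lines(example_groups):
    print(line)
```

Grouping the same references under genus or family labels instead of species labels only requires changing the keys of the mapping; the reference files themselves stay the same.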