Tool: CATS: reference-free and reference-based transcriptome assembly quality assessment
7 weeks ago
Kristian ▴ 30

Hi everyone!

I would like to introduce CATS (Comprehensive Assessment of Transcript Sequences) - a novel framework for transcriptome assembly quality evaluation. CATS features two complementary tools:

CATS-rf (reference-free)

CATS-rf evaluates transcriptome assemblies using only the RNA-seq reads from which they were assembled. The pipeline maps the reads back to the assembled transcripts and analyzes the mapping patterns to detect misassemblies.

The main quality metric generated by CATS-rf is the transcript quality score, calculated by integrating several metrics, each targeting a specific type of assembly error. Transcript scores are normalized to a range between 0 and 1, where higher values indicate better quality. The assembly score is the average of all transcript scores within the assembly and serves as an overall estimate of assembly accuracy.
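To make the averaging step concrete, here is a minimal Python sketch of how an assembly score follows from per-transcript scores. It is only an illustration of the arithmetic described above, not the CATS-rf implementation (which is written in R and Bash). The function name `assembly_score`, the transcript IDs, and the score values are all hypothetical.

```python
from statistics import mean


def assembly_score(transcript_scores):
    """Average per-transcript quality scores into one assembly-level score.

    transcript_scores: dict mapping transcript ID -> score in [0, 1].
    """
    scores = list(transcript_scores.values())
    if not scores:
        raise ValueError("no transcript scores supplied")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("transcript scores are expected to lie in [0, 1]")
    return mean(scores)


# Hypothetical transcript IDs and scores, invented for illustration only.
example_scores = {"tr1": 0.5, "tr2": 0.75, "tr3": 1.0}
print(assembly_score(example_scores))  # prints 0.75
```

Because the assembly score is a plain mean, every transcript contributes equally regardless of its length or expression level in this sketch.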

Alongside transcript scores, CATS-rf outputs and visualizes a wide range of metrics to support in-depth assembly diagnostics and quality assessment. The example HTML output can be found here.

CATS-rf requires the evaluated transcriptome assembly in FASTA format, along with the short RNA-seq reads used during assembly in either FASTQ or FASTA format. Both paired-end and single-end library configurations are supported.

CATS-rb (reference-based)

CATS-rb assesses the quality of transcriptome assemblies by aligning the assembled transcripts to a reference genome. It supports both relative and annotation-based scoring, and introduces a novel framework for completeness analysis using non-redundant exon and transcript sets derived from genomic coordinates. It offers two complementary modes of evaluation:

Relative Completeness Analysis

• Requires two or more transcriptome assemblies

• Completeness is measured relative to the other assemblies, without external annotation

• Suitable for benchmarking multiple assemblies against each other in the absence of a trusted reference

Annotation-Based Completeness Analysis

• Requires one or more transcriptome assemblies and a reference gene annotation (in GTF or GFF3 format)

• Completeness is assessed by comparing assemblies to the known reference gene models

Exon and transcript sets are defined by collapsing overlapping exon and transcript genomic coordinates of a given assembly, respectively. Each set is assigned a completeness score ranging from 0 to 1, where 0 indicates that the set is entirely absent from the assembly and 1 indicates that it is fully recovered.
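The idea of collapsing overlapping coordinates into non-redundant sets can be sketched as a standard interval merge. The Python snippet below is a simplified illustration under stated assumptions, not the CATS-rb algorithm: the function name `collapse_intervals` and the example coordinates are hypothetical, coordinates are assumed to be 1-based and inclusive, and strand is ignored for brevity.

```python
from collections import defaultdict


def collapse_intervals(intervals):
    """Merge overlapping (chrom, start, end) intervals into non-redundant sets.

    Assumes 1-based inclusive coordinates, so two intervals are merged
    only when they share at least one base. Strand is ignored here.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    collapsed = []
    for chrom in sorted(by_chrom):
        merged = []
        # Sorting by start lets each interval be compared only with the last merged one.
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        collapsed.extend((chrom, s, e) for s, e in merged)
    return collapsed


# Hypothetical exon coordinates, invented for illustration only.
exons = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 400, 500), ("chr2", 50, 80)]
print(collapse_intervals(exons))
# prints [('chr1', 100, 300), ('chr1', 400, 500), ('chr2', 50, 80)]
```

In this toy example the first two chr1 exons overlap and collapse into one set, while the remaining intervals stay separate. How CATS-rb then scores each set is described in the preprint and manuals and is not reproduced here.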

These per-set scores are then averaged to compute complementary assembly-level completeness metrics:

Exon score: reflects the overall completeness of exonic regions in an assembly

Transcript score: reflects the recovery of full-scale transcript models

CATS-rb also offers a comprehensive visualization of several quality estimates, along with Venn diagrams, UpSet plots, and hierarchical clustering heatmaps of exon and transcript sets. The example HTML output file can be found here.

A minimal CATS-rb run requires the evaluated transcriptome assembly in FASTA format, along with the reference genome of the corresponding or a closely related species, also in FASTA format. As mentioned above, the run can be supplemented with a GTF/GFF3 gene annotation file.

Benchmarking

CATS-rf outperforms existing reference-free tools in accuracy and error detection.

CATS-rb scores correlate strongly with assembly quality and enable precise assembly evaluation without external genomic annotation, facilitating the analysis of non-model organisms and unannotated transcripts.

For detailed benchmarks, please refer to the CATS preprint.

Installation

CATS is implemented in R and Bash. The source code and detailed manuals are available in the following GitHub repositories:

CATS-rf

CATS-rb

Both tools are distributed under the MIT license and are also available via Bioconda for Linux and macOS:

conda install -c bioconda cats-rf
conda install -c bioconda cats-rb
tool rna-seq transcriptomics transcriptome