Tool: NOEL: An extremely fast Non-Overlapping Exon Length calculator written in Rust
4 months ago
alejandrogzi ▴ 120

Hi all!

Introducing the Non-Overlapping Exon Length calculator (NOEL), an extremely fast per-gene exon length extractor for GTF/GFF files, written in Rust. See the code and latest updates here: github/alejandrogzi/noel

In case you do not want to read the whole text: NOEL outperforms all the open-source scripts/tools I tested for this task. It calculates non-overlapping exon lengths for ~62,000 genes in 3.9 seconds (GRCh38 GENCODE 44 GFF3) and 4 seconds (GRCh38 GENCODE 44 GTF). It also needs at most ~42 MB of memory, so it runs on any laptop/PC without restarted sessions or crashed runs. It can be easily attached to a Nextflow/Snakemake pipeline, called from other scripts (Python, R, C++), run as a binary, used as a Rust library, and more.


A week ago I needed to calculate non-overlapping exon lengths, so I googled for a while and found some options (strictly tools/software or scripts): the Kooi 1, Sun 2, and Slowikowski 3,4 scripts, and gtftools (-l flag) 5. I ran into several problems: missing genes, poor performance, long run times, excessive memory consumption, scripts that could not be run from the command line, etc. To help other people with the same goal (quickly calculating non-overlapping exon lengths from a GTF/GFF for any species, easily attached to a pipeline, etc.), I developed NOEL (all the information below is from the GitHub README):


An extremely fast GTF/GFF per gene Non-Overlapping Exon Length calculator (noel) written in Rust.

Takes in a GTF/GFF file and outputs a .txt file with non-overlapping exon lengths.
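For readers unfamiliar with the calculation, here is a minimal sketch of what "non-overlapping exon length" means: collect every exon of a gene across all its transcripts, merge the overlapping intervals, and sum the merged lengths. This is an illustration only and is not taken from noel's source. The function names (`parse_gtf_exon`, `non_overlapping_length`, `exon_lengths`) are hypothetical, and the parser only handles GTF-style `gene_id "..."` attributes, not GFF3.

```rust
use std::collections::HashMap;

/// Extract (gene_id, start, end) from one GTF line if it describes an exon.
/// Assumes tab-separated GTF with attributes such as: gene_id "G1";
fn parse_gtf_exon(line: &str) -> Option<(String, u64, u64)> {
    if line.starts_with('#') {
        return None; // header or comment line
    }
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() < 9 || fields[2] != "exon" {
        return None;
    }
    let start = fields[3].parse().ok()?;
    let end = fields[4].parse().ok()?;
    let gene_id = fields[8]
        .split(';')
        .map(str::trim)
        .find_map(|attr| attr.strip_prefix("gene_id "))?
        .trim_matches('"')
        .to_string();
    Some((gene_id, start, end))
}

/// Merge overlapping exons and sum their lengths.
/// GTF/GFF coordinates are 1-based and inclusive, hence the `+ 1`.
fn non_overlapping_length(exons: &mut [(u64, u64)]) -> u64 {
    exons.sort_unstable();
    let mut total = 0;
    let mut current: Option<(u64, u64)> = None;
    for &(start, end) in exons.iter() {
        match current {
            // Overlaps the interval being built: extend it.
            Some((cs, ce)) if start <= ce => current = Some((cs, ce.max(end))),
            // Disjoint: close the previous interval and start a new one.
            Some((cs, ce)) => {
                total += ce - cs + 1;
                current = Some((start, end));
            }
            None => current = Some((start, end)),
        }
    }
    if let Some((cs, ce)) = current {
        total += ce - cs + 1;
    }
    total
}

/// Per-gene non-overlapping exon lengths for GTF text held in memory.
fn exon_lengths(gtf: &str) -> HashMap<String, u64> {
    let mut by_gene: HashMap<String, Vec<(u64, u64)>> = HashMap::new();
    for line in gtf.lines() {
        if let Some((gene, start, end)) = parse_gtf_exon(line) {
            by_gene.entry(gene).or_default().push((start, end));
        }
    }
    by_gene
        .into_iter()
        .map(|(gene, mut exons)| {
            let len = non_overlapping_length(&mut exons);
            (gene, len)
        })
        .collect()
}

fn main() {
    // Made-up toy annotation: two transcripts of G1 share overlapping exons.
    let gtf = "chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T1\";\n\
               chr1\ttoy\texon\t150\t250\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T2\";\n\
               chr1\ttoy\texon\t300\t350\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T2\";\n\
               chr1\ttoy\texon\t10\t20\t.\t-\t.\tgene_id \"G2\"; transcript_id \"T3\";";
    let mut lengths: Vec<_> = exon_lengths(gtf).into_iter().collect();
    lengths.sort();
    for (gene, len) in lengths {
        println!("{gene}\t{len}");
    }
}
```

In the toy input, exons 100-200 and 150-250 of G1 merge into 100-250 (151 bases), which together with 300-350 (51 bases) gives 202, rather than the 253 obtained by summing the three exons naively.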


Usage: noel[EXE] --i <GTF/GFF> --o <OUTPUT>

    --i <GTF/GFF>: GTF/GFF file
    --o <OUTPUT>: .txt file

    --help: print help
    --version: print version
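Since noel can be run as a binary from other programs, here is a minimal sketch of calling it from Rust with `std::process::Command`. It uses only the `--i` and `--o` flags listed above. The helper name `noel_command` and the file names are placeholders, and the sketch assumes the `noel` binary is on your `$PATH`.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a `noel --i <annotation> --o <output>` invocation.
/// Assumes `noel` is installed and discoverable through $PATH.
fn noel_command(annotation: &Path, output: &Path) -> Command {
    let mut cmd = Command::new("noel");
    cmd.arg("--i").arg(annotation).arg("--o").arg(output);
    cmd
}

fn main() {
    // Placeholder file names; replace with your own annotation and output.
    let mut cmd = noel_command(Path::new("annotation.gtf"), Path::new("lengths.txt"));
    match cmd.status() {
        Ok(status) if status.success() => println!("noel finished"),
        Ok(status) => eprintln!("noel exited with {status}"),
        Err(err) => eprintln!("could not start noel: {err}"),
    }
}
```

Keeping the command construction separate from its execution makes it easy to log or inspect the exact invocation before a pipeline step runs it.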



To install noel on your system, follow these steps:

  1. download Rust: curl -sSf | sh on Unix, or go here for other options
  2. run cargo install noel (make sure ~/.cargo/bin is in your $PATH before running it)
  3. use noel with the required arguments


To build noel from this repo, do:

  1. get Rust (as described above)
  2. run git clone && cd noel
  3. run cargo run --release <GTF/GFF> <OUTPUT> (arguments are positional, so you do not need to specify --i/--o)


There are a handful of open-source tools/software/scripts to calculate non-overlapping exon lengths, namely the Kooi 1, Sun 2, and Slowikowski 3,4 scripts, and gtftools (-l flag) 5. The Non-Overlapping Exon Length calculator (NOEL; referred to simply as "noel") is introduced as a new tool that outperforms the aforementioned software in both run time and memory usage.

To assess the efficiency of noel and test the capabilities of the other available scripts/tools, I measured run times and estimated memory usage over 5 consecutive runs. This evaluation focused on two major gene annotation formats: GTF and GFF. It is worth noting, however, that only 3 tools can handle GFF files: Slowikowski, Sun* (described below) and noel. Before any batch of runs, I first modified each script so it could be run from the command line. I also edited Sun's script to handle GFF inputs by changing a regex pattern. No performance-related changes or breaking structural modifications were applied.

Lastly, to evaluate the output consistency of the top-ranked tools (Sun, gtftools and noel), three species were used: Homo sapiens (GRCh38, GENCODE 44), Canis lupus familiaris (ROS_Cfam_1.0, Ensembl 110), and Mus musculus (GRCm39, GENCODE M33).

The different approaches to calculating non-overlapping exon lengths led to noticeable differences in run times. The Kooi and Slowikowski scripts ranked last with GTF files (>250 s for GENCODE 44), as did Slowikowski with GFF files (~300 s for GENCODE 44), while Sun, gtftools and noel were the most efficient options (<50 s for GENCODE 44). Among these top-ranked tools, noel clearly dominates its competitors. For GTF files, noel is noticeably faster than gtftools (x4.3 faster; 4.2 s vs 17.9 s) and Sun's script (x10.9 faster; 4.2 s vs 45.7 s). For GFF3 files, noel is x12.6 faster than Sun's script (3.9 s vs 49.7 s).

A similar pattern appears in the memory usage estimates for GTF files. Three distinct groups of tools can be identified: high-memory tools (Sun, Slowikowski, and Kooi), tools with moderate memory usage (gtftools), and the most memory-efficient option (noel). Here, noel used far less memory than gtftools (x9.1 less; 42.9 MB vs 391.8 MB) and Kooi (x73.1 less; 42.9 MB vs 3.1 GB). With GFF files, noel achieved a striking x146.1 reduction in memory usage compared to Slowikowski (62,700 genes).

The comparison of output from the top-ranked tools (Sun, gtftools, and noel) yielded consistent paired estimates for each species, with a high correlation (R = 0.99). Notably, noel and Sun's script showed a one-to-one correspondence for every gene in all tested annotations. In contrast, gtftools did not report every gene: it missed a small fraction in the human and mouse annotations (0.05% and 0.06%, respectively) and a much larger one in the dog annotation (26%). Furthermore, noel was also the fastest tool on the mouse and dog annotations, with a speedup of at least x2.3.
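The post does not spell out how the outputs were compared beyond the reported R value, so the following is only a sketch of one way to run such a check yourself: given two gene-to-length tables, count the genes present in the first but absent from the second, and compute the Pearson correlation over the shared genes. The `Comparison` struct and `compare` function are hypothetical names, and loading each tool's output file into a map is left out because the output formats are not described here.

```rust
use std::collections::HashMap;

/// Summary of how table `b` agrees with reference table `a`.
struct Comparison {
    shared: usize,       // genes reported by both tools
    missing_in_b: usize, // genes in `a` that `b` does not report
    pearson_r: f64,      // correlation of lengths over shared genes
}

/// Compare two gene -> length tables.
/// `pearson_r` is NaN when fewer than two shared genes exist or when
/// either tool reports the same length for every shared gene.
fn compare(a: &HashMap<String, u64>, b: &HashMap<String, u64>) -> Comparison {
    let pairs: Vec<(f64, f64)> = a
        .iter()
        .filter_map(|(gene, &x)| b.get(gene).map(|&y| (x as f64, y as f64)))
        .collect();
    let n = pairs.len() as f64;
    let mean_x = pairs.iter().map(|p| p.0).sum::<f64>() / n;
    let mean_y = pairs.iter().map(|p| p.1).sum::<f64>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
    for &(x, y) in &pairs {
        let (dx, dy) = (x - mean_x, y - mean_y);
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    Comparison {
        shared: pairs.len(),
        missing_in_b: a.len() - pairs.len(),
        pearson_r: sxy / (sxx * syy).sqrt(),
    }
}

fn main() {
    // Made-up toy tables, not real tool output.
    let a: HashMap<String, u64> = [("G1", 100), ("G2", 200), ("G3", 300), ("G4", 50)]
        .into_iter()
        .map(|(g, l)| (g.to_string(), l))
        .collect();
    let b: HashMap<String, u64> = [("G1", 100), ("G2", 200), ("G3", 300)]
        .into_iter()
        .map(|(g, l)| (g.to_string(), l))
        .collect();
    let c = compare(&a, &b);
    let pct_missing = 100.0 * c.missing_in_b as f64 / a.len() as f64;
    println!(
        "shared={} missing={} ({pct_missing:.2}%) r={:.4}",
        c.shared, c.missing_in_b, c.pearson_r
    );
}
```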

Based on this comparison between noel and the existing scripts/software for calculating non-overlapping exon lengths, noel represents a clear improvement in both speed and memory usage. These findings show its potential as a fast and efficient way to automate non-overlapping exon length calculations.

Hope this helps someone!

gene-annotation exon-length • 396 views
