Tool:Hera: A new tool for RNA-Seq analysis
5.1 years ago
sonpham ▴ 580

Hera version 1.2 is now released (Nov 22, 2017).

Today we released Hera version 1.2, which:

1. Improves running time (a 50% reduction) and accuracy. Its speed is now similar to, or even faster than, that of tools using pseudo-alignment (alignment-free) approaches.
2. Fixes some bugs and usage arguments.
3. Changes the structure of the index (the index must be rebuilt for this version).
4. Turns off the fusion detection feature in this version, so that its performance can be improved in future releases.

===============

We are happy to release Hera, a fast and accurate algorithm that maps spliced RNA-seq reads to a genome while simultaneously estimating transcript abundances, detecting gene fusions, and outputting alignment files for visualization and variant calling.

In the time STAR takes to output a SAM alignment, Hera can output a BAM file (with base-to-base alignment), transcript quantification (in TPM), and a list of fusion genes.

Hera's quantification algorithm obtained the best ranking in a recent round of the SMC-RNA DREAM challenge: https://www.synapse.org/#!Synapse:syn2813589/wiki/423306

RUN: Running Hera is simple:

hera quant -i index/ -t 32 read1.fastq read2.fastq

[OPTIONAL]:

• -o [output directory] (default: ./)
• -t [number of running threads] (default: 1)
• -z [level of BAM file compression (1 - 9)] (default: -1)
• -b [number of bootstraps] (default: 1)
• -w [output BAM file, 0: true, 1: false] (default: 0)
• -f [Genome fasta file]
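
For anyone scripting many samples, here is a minimal Python sketch that assembles the command above from the listed options. The function name and its parameters are hypothetical helpers, not part of Hera; only the flags -i, -t, -o, -b and -f are taken from this post. Placing the options before the read files simply mirrors the example command and is an assumption about what Hera expects.

```python
import shlex


def build_hera_quant_command(index_dir, reads, threads=1, output_dir=None,
                             bootstraps=None, genome_fasta=None):
    """Return an argument list for `hera quant` (hypothetical helper).

    Only flags listed in the post are used; optional flags are added
    only when a value is supplied, so Hera's own defaults apply otherwise.
    """
    cmd = ["hera", "quant", "-i", index_dir, "-t", str(threads)]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if bootstraps is not None:
        cmd += ["-b", str(bootstraps)]
    if genome_fasta is not None:
        cmd += ["-f", genome_fasta]
    # Read files go last, as in the example command in the post.
    cmd += list(reads)
    return cmd


cmd = build_hera_quant_command(
    "index/", ["read1.fastq", "read2.fastq"], threads=32, bootstraps=30
)
print(shlex.join(cmd))
# prints: hera quant -i index/ -t 32 -b 30 read1.fastq read2.fastq
```

The returned list can be passed to subprocess.run(cmd, check=True) once hera is on your PATH.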

VERSIONS:

Version 1.1

1. Striped Smith-Waterman alignment and some technical improvements in the EM implementation allow Hera to reduce its running time by 30%. With 30 bootstraps on 60M reads on a 32-core CPU, Hera 1.1 takes 2m45s, even faster than other pseudo-alignment approaches for transcript quantification.

2. In this 1.1 release, we have fixed all of the bugs you reported in Hera 1.0.

Version 1.0

Hera version 1.0 is available to both academic and industry labs. Its source code is released under the MIT license. For more information: https://github.com/bioturing/hera

For feedback, please use the GitHub issue tracker, or just send an email to sonpham@bioturing.com

Thank you!

Son Pham & BioTuring Algorithm Team.

RNA-Seq gene fusion kallisto STAR Tool • 6.8k views
4

hera is now available in Bioconda for easy installation.

0

Excellent!!! Thanks, Andreas!

1

Sounds good! Transcript quantification is in TPM I see, but is it also possible to get raw counts to use "our favourite normalization approach"?

2

Yes, Hera also outputs raw counts, so you can apply your favourite normalization approach or use them for differential expression calling with edgeR, DESeq2, etc. Hera also supports Sleuth (developed by Pachter's group); the bootstrap outputs are stored in .h5 files.
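
To illustrate how raw counts relate to TPM, here is a minimal Python sketch of the standard TPM formula: each count is divided by the transcript's effective length, and the resulting rates are rescaled to sum to one million. It works on plain lists; the function name is hypothetical, and how counts and lengths are laid out in Hera's abundance files is not described in this thread, so loading them is left out.

```python
def tpm_from_counts(counts, effective_lengths):
    """Convert raw counts to TPM (hypothetical helper).

    TPM_i = (c_i / l_i) / sum_j (c_j / l_j) * 1e6
    """
    # Length-normalized rate for every transcript.
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        # No reads assigned anywhere: report zeros instead of dividing by zero.
        return [0.0] * len(rates)
    return [r / total * 1e6 for r in rates]


tpm = tpm_from_counts([100, 200, 300], [1000, 2000, 1000])
print([round(v) for v in tpm])
# prints: [200000, 200000, 600000]
```

The first two transcripts have different raw counts but the same TPM because the second is twice as long, which is why raw counts, not TPM, are the usual input for edgeR or DESeq2.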

0

That's just great, thanks!

0

Hi,

Hera does not output an .h5 file for me. I get 4 files: a summary, 2 abundance files, and a BAM file.

What is an .h5 file?

Best,

C.

3

Hi cristian,

It may be because you did not define the number of bootstraps for the run. We use the .h5 file to store results from the bootstrapping step. You can add the flag -b number_of_bootstrap when running Hera and check whether it outputs the .h5 file now.

Best,

Bioturing Algorithm Team.

1

h5 files are HDF5-formatted files. They have generally been used for PacBio data.
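
If you are unsure whether a file in your output directory really is HDF5, a quick check is to look for the 8-byte HDF5 signature at the start of the file. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and it only checks offset 0 (the format also allows the signature at offsets 512, 1024, and so on when a user block is present).

```python
# The 8-byte signature that opens an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature."""
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

To actually browse the contents you would need an HDF5 reader such as the h5py package or the h5ls command-line tool.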

1

And Oxford Nanopore sequencing data, where the format is called 'fast5'.


0

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

0

What are the accuracy and precision of Hera compared to STAR on simulated sets with short anchors and indels? Thanks.

0

Looks great!

Can Hera be adapted to work on de novo transcriptomes for species with no previously sequenced genomes?

Cheers,

J

0

Hi J,

Not yet! Hopefully, this will be supported in future versions.

Bioturing Algorithm Team

5
5.1 years ago

Neat! There must be some witchcraft under the hood for it to do all of that in a third of the time, or less, of the current industry standards. Can it handle matched-normal datasets, or run a tumor sample against a panel of normals? Also, do you have data other than speed that would compel me to switch over from current well-cited and high-scoring programs like RSEM and FusionCaller that perform these tasks? I guess all that would go into a paper, though for now it is published in the DREAM challenge rankings.

2

Hi AcademicDialysis, Thank you for your suggestions! These new features (matched normal-tumor samples) may be supported in future releases. We will post the manuscript on bioRxiv soon.

0
5.0 years ago
y.divyatej ▴ 10

I've tried running Hera on a cancer sample with canonical fusions and it hasn't found them. Would trimming the reads or changing the parameters yield better results?

0

You may want to create a new thread with this question so this announcement thread does not become a support one.

0

Hi y.divyatej, Is it possible to share the 'canonical fusion' dataset with us so that we can give it a try? Thanks!

0

These are patient datasets, and I am afraid they cannot be made available outside our facility. However, we also tested Hera on the Edgren et al. dataset and were unable to find the fusions reported in the paper.

Data is publicly available: The raw sequencing data have been deposited in the NCBI Sequence Read Archive [SRA:SRP003186].

Thank you for looking into this. Teja

0

We will look into this!