Tool: TRAL - Tandem Repeat Annotation Library
Written 5.1 years ago by Elke Schaper (Switzerland):

 

For my Ph.D., I implemented solutions to a large number of tasks related to sequence tandem repeats. We have now decided to make the code accessible and reusable for others, hoping that it will save some of you a lot of time!

 

Features

  • Detect nucleic or protein tandem repeats with de novo software. TRAL can be used to run, parse, merge and output results from external tandem repeat detection tools in an output format of choice.
  • Detect tandem repeats with sequence profile HMMs. Use this if you already know the approximate sequence of your tandem repeat and want either to refine the annotation (e.g. if some repeat units are missing from it) or to search for homologous tandem repeats in other sequences.
  • Statistical significance analysis of putative tandem repeats. We and others have found that specificity is a big issue with many tandem repeat annotation tools. To make sure you can trust your tandem repeat annotations, TRAL ships with ad hoc and model-based statistical tests for nucleic and protein tandem repeats. Using these tests, each tandem repeat is tagged with a p-value, and you can decide the threshold.
  • Overlap detection and filtering. When you merge tandem repeat annotations from several sources, you may want to discard overlapping repeats. Several definitions of overlap are implemented in TRAL.
  • New: Reconstruct tandem repeat unit phylogenies.
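
To make the significance and overlap filtering steps more concrete, here is a minimal sketch in plain Python. It does not use TRAL's own classes or API: `RepeatHit`, `filter_by_pvalue`, `overlaps` and `drop_overlapping` are hypothetical names invented for this illustration, the example hits and p-values are made up, and "shares at least one position" is only one of several possible overlap definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepeatHit:
    """Hypothetical stand-in for one tandem repeat annotation (not a TRAL class)."""
    source: str    # detector that reported the hit (illustrative label)
    begin: int     # start position in the sequence, 0-based, inclusive
    end: int       # end position, exclusive
    pvalue: float  # significance assigned by some statistical test


def filter_by_pvalue(hits, threshold=0.05):
    """Keep only hits whose p-value is below the chosen threshold."""
    return [h for h in hits if h.pvalue < threshold]


def overlaps(a, b):
    """One possible overlap definition: the hits share at least one position."""
    return a.begin < b.end and b.begin < a.end


def drop_overlapping(hits):
    """Walk hits by increasing p-value; skip any that overlap a kept hit."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.pvalue):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.begin)


# Made-up annotations from two detectors on the same sequence.
hits = [
    RepeatHit("TRF", 10, 40, 0.001),
    RepeatHit("XSTREAM", 12, 38, 0.020),   # overlaps the first hit, less significant
    RepeatHit("TRF", 100, 130, 0.300),     # not significant at 0.05
    RepeatHit("XSTREAM", 200, 260, 0.010),
]

significant = filter_by_pvalue(hits, threshold=0.05)
merged = drop_overlapping(significant)
for hit in merged:
    print(hit.source, hit.begin, hit.end, hit.pvalue)
```

With these made-up inputs, the non-significant hit is removed first, and of the two overlapping hits only the more significant one survives. Filtering by p-value before resolving overlaps is a design choice of this sketch; the reverse order can give different results.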

 

Technical details

 

Tutorials

  • Extensive tutorials are available on GitHub Pages. Please email me if you would like a tutorial for a specific task!

 

Example

This is a short example of how you can annotate your sequences with TRF in three lines of code:

# Python 3
from tral.sequence import sequence

# Read all sequences from a FASTA file.
sequences = sequence.Sequence.create(file="path/to/my/sequences.fa", input_format="fasta", sequence_type="DNA")
# Run TRF (an external tool) as the de novo detector on every sequence.
tandem_repeats = [i_seq.detect(denovo=True, detection={"detectors": ["TRF"]}) for i_seq in sequences]
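
For readers new to the topic, the following toy sketch shows what a tandem repeat annotation boils down to: a start position, a repeat unit and a copy number. It is plain Python and is not how TRF or TRAL work internally. It only finds perfect, back-to-back copies, whereas real detectors also handle mismatches and indels. The function name and its parameters are invented for this illustration.

```python
def find_perfect_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=3):
    """Toy scan for exact tandem repeats; returns (begin, unit, copies) tuples."""
    found = []
    i = 0
    while i < len(seq):
        best = None  # (unit, copies) covering the longest region starting at i
        for unit_len in range(min_unit, max_unit + 1):
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len:
                break  # ran off the end of the sequence
            # Count how many exact copies of the unit follow each other.
            copies = 1
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies < min_copies:
                continue
            if best is None or copies * unit_len > best[1] * len(best[0]):
                best = (unit, copies)
        if best is not None:
            found.append((i, best[0], best[1]))
            i += len(best[0]) * best[1]  # jump past the repeat region
        else:
            i += 1
    return found


print(find_perfect_tandem_repeats("GGACGACGACGTTTATATATATC"))
# → [(1, 'GAC', 3), (13, 'TA', 4)]
```

Positions are 0-based here. The scan reports the first qualifying unit phase it encounters, so the same region could equally be described with a shifted unit (for example `ACG` starting one position later).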

More examples are available in the docs.

 

Your feedback - every comment is helpful!

If you believe TRAL might help your research or save you time, please feel free to contact me or post to the project:

  • Feature requests
  • How to implement specific tasks
  • Bug reports

 

Publications

Here is some background on TRAL:

  • TRAL: E Schaper, A Korsunsky, J Pecerska, A Messina, R Murri, H Stockinger, S Zoller, I Xenarios, and M Anisimova (2015). TRAL: Tandem Repeat Annotation Library. Bioinformatics. DOI: 10.1093/bioinformatics/btv306
  • Statistical testing of tandem repeats, benchmark of tandem repeat annotation tools: E Schaper, AV Kajava, A Hauser, and M Anisimova (2012). Repeat or not repeat?—Statistical validation of tandem repeat prediction in genomic sequences. NAR. DOI: 10.1093/nar/gks726
  • Phylogenetic analysis of tandem repeat unit evolution: E Schaper, O Gascuel, and M Anisimova (2014). Deep conservation of human protein tandem repeats within the eukaryotes. MBE. DOI: 10.1093/molbev/msu062
  • Short intro to some computational issues with tandem repeats: M Anisimova, J Pecerska, and E Schaper (2015). Statistical approaches to detecting and analyzing tandem repeats in genomic sequences. Frontiers in Bioengineering and Biotechnology. DOI: 10.3389/fbioe.2015.00031

 

 
