Tool: FAST: FAST Analysis of Sequences Toolbox
7.7 years ago
Peter Becich

FAST (FAST Analysis of Sequences Toolbox), built on BioPerl, provides open-source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut and tr, FAST tools such as fasgrep, fascut and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Encoding data workflows compactly as combinations of FAST commands can improve the documentation and reproducibility of bioinformatic protocols, supporting greater transparency in big biological data science. Interface self-consistency and conformity with the conventions of GNU, Matlab, Perl, BioPerl, R and GenBank help make FAST easy to learn. FAST automates numerical, text-based, sequence-based and taxonomic searching, sorting, selection and transformation of sequence records and alignment sites based on indices, ranges, tags and feature annotations, as well as analytics for composition and codon usage. Automated content- and feature-based extraction of sites, together with support for molecular population genetic statistics, makes FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure, with stable releases posted to CPAN and development on GitHub.
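
FAST itself is written in Perl on top of BioPerl. Purely to illustrate the grep/cut analogy, here is a minimal Python sketch of a grep-like record filter and a cut-like site extractor over Multi-FastA text. The function names (`read_fasta`, `grep_records`, `cut_records`, `write_fasta`) are hypothetical and do not reproduce the actual options or behavior of fasgrep or fascut; site ranges are assumed to be 1-based and inclusive, as with `cut -c`.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header line without '>', sequence)


def read_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from Multi-FastA text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # ignore any text before the first '>'
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def grep_records(records: Iterable[Record], pattern: str,
                 field: str = "id") -> Iterator[Record]:
    """Keep records whose identifier (first header word) or sequence matches."""
    rx = re.compile(pattern)
    for header, seq in records:
        target = (header.split() or [""])[0] if field == "id" else seq
        if rx.search(target):
            yield header, seq


def cut_records(records: Iterable[Record], start: int,
                end: int) -> Iterator[Record]:
    """Keep sites start..end (1-based, inclusive) of every sequence."""
    if start < 1:
        raise ValueError("start must be >= 1")
    for header, seq in records:
        yield header, seq[start - 1:end]


def write_fasta(records: Iterable[Record]) -> str:
    """Serialize records back to (unwrapped) Multi-FastA text."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


if __name__ == "__main__":
    data = ">seq1 human\nACGTACGT\n>seq2 mouse\nGGGTTT\n>seq3 human\nAC\nGT\n"
    hits = grep_records(read_fasta(data), r"seq[13]")
    print(write_fasta(cut_records(hits, 2, 4)), end="")
```

Because each step consumes and yields records, the steps compose like a Unix pipeline: the example keeps `seq1` and `seq3` and then cuts sites 2 to 4 of each, printing `CGT` for both.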

The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FASTQ formatted files are also supported. The command-line basis of FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought.
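
Sanger and Illumina 1.8+ FASTQ both encode each base's Phred quality Q as the ASCII character with code Q + 33. The sketch below decodes that encoding, assuming unwrapped four-line records (header, sequence, `+` line, quality). `read_fastq` and `mean_quality` are hypothetical helpers for illustration, not how FAST or BioPerl parse FASTQ.

```python
from typing import Iterator, List, Tuple

PHRED_OFFSET = 33  # Sanger and Illumina 1.8+ both use Phred+33


def read_fastq(text: str) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (identifier, sequence, Phred scores) from unwrapped 4-line FASTQ."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    for i in range(0, len(lines), 4):
        head, seq, plus, qual = lines[i:i + 4]
        rec = i // 4 + 1
        if not head.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"record {rec}: malformed header or '+' line")
        if len(seq) != len(qual):
            raise ValueError(f"record {rec}: sequence and quality lengths differ")
        ident = (head[1:].split() or [""])[0]
        # Each quality character encodes Q as chr(Q + 33)
        yield ident, seq, [ord(c) - PHRED_OFFSET for c in qual]


def mean_quality(scores: List[int]) -> float:
    """Arithmetic mean of Phred scores (0.0 for an empty read)."""
    return sum(scores) / len(scores) if scores else 0.0
```

For example, the quality string `II#5` decodes to `[40, 40, 2, 20]`. A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so `I` (Q = 40) means roughly one error in 10,000 base calls.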

FAST can be found in the CPAN (https://metacpan.org/pod/FAST). Contributions are welcomed and appreciated (https://github.com/tlawrence3/FAST).

See the FAST Cookbook for examples (https://github.com/tlawrence3/FAST/blob/master/doc/FAST_Cookbook.org) or Lawrence et al. "FAST: FAST Analysis of Sequences Toolbox", to appear (https://github.com/tlawrence3/FAST/blob/master/paper/Lawrence_etal_FAST.pdf).

Tags: NCBI-Taxonomy, pipeline, multiFASTA, BioPerl

This is pretty neat. I am looking for FASTA processing tools to integrate as online tools, like this (http://test.biostars.org/charms/). I'll look into integrating it with your library.

Why not implement most functionality in javascript? For some operations, modern browsers can be 100x faster than bioperl.

Low-level constructs are fast in javascript, but file parsing and other I/O may not be all that efficient (though I have not tested that). I also don't really trust that javascript would perform well once the data get larger.

I will also admit that, for reasons I can't properly identify yet, programming in javascript does not feel fun to me. It is a bit like R: once I run into ugly constructs, I say to myself, "I can't believe this language is making me do that." Hence I'd prefer to use javascript as the minimal glue necessary to connect to a more "sane" language.

I am not sure about the purpose of Charms. You were talking about large-scale data. If you want to provide a service for short-read mapping against the human genome, javascript alone is certainly not up to the task. Nonetheless, for many tasks implemented in perl/python scripts, javascript is faster, often by a lot. In my experience this is also true for file parsing (javascript doesn't have file I/O by itself). In addition, when you use javascript as an interface to other tools, you pay an extra cost for the interface and on the server. Native javascript code runs on the client side, which is much faster and cleaner.

It is probably better to move most simple tasks to javascript and leave only heavy-duty applications (e.g. mapping against the human genome) and complex packages on the server. Yes, javascript has some confusing features, but it also has a unique place (web development) and some outstanding advantages few other languages have (e.g. a mature high-performance implementation). Sometimes it is the single best choice. If you really hate the syntax of javascript, you may consider CoffeeScript or TypeScript, though I haven't tried them myself.

Yes, I might have conflated two different issues here. One reason I don't use javascript is that I don't know whether it would hold up in realistic data scenarios, but that is not relevant to charms, since they are not supposed to be used as a large-scale data analysis platform.

Long term, I see charms as a platform to demonstrate, teach and show concepts through the web. There will be "official" charms available by default, but any user will also be able to create their own library, and others will have the option of loading it for their own use with one click.

Within their own library, users may implement everything in javascript (the simplest approach), or they can choose to connect to an RPC or HTTP service (on a different machine) via a simple API. In the latter case the Biostar server acts as a proxy (since javascript cannot connect to different domains): it submits the data to the user's server and, once the result is back, displays it to the user. The returned data may be a rich HTML document, which gets injected into the user's page.

The entire concept is not fully worked out yet. What I would like is a methodology that lets someone try a tool and evaluate and demonstrate the use and operation of a piece of software or a library without asking the user to install it.

Hmm... When will a Biostar user want to call charms? What "concepts" are you thinking of to "demonstrate, teach and show" through the web? If it is just for demonstration, wouldn't it be easier to show that on my own server (e.g. via IPython), since I need one anyway to talk to the Biostar server? And if the purpose is teaching, we are not really talking about huge data sets, so wouldn't javascript be a better fit than adding an extra layer that interfaces to other languages over the internet? I guess I need to see concrete use cases to understand what charms are for. Anyway, good that you are thinking about this.

The rationale comes from teaching more than one bioinformatics course a year. It always surprises me how unexpectedly difficult it is to perform some of the simplest tasks unless we first spend several sessions making sure everyone's system is set up just right, with the right tools and libraries installed. Half of the course is about Unix, and frankly some people don't get Unix at all, and I think that should be fine as well. One should be able to analyze data without understanding Unix.

Example: say I know the names of two genes and want to find out which one is longer. I don't know a simple way to do this. I can only do it by drawing on a few decades of Unix and other programming, where I chain together a series of pipe-delimited commands that in turn rely on tools and libraries I have preinstalled (some after quite a battle) on my system. Or perhaps one can get there by clicking about on web pages.

But what if there were a charm where I could get sequence XYZ in the browser by writing fetch(XYZ) and get its length by writing length(fetch(XYZ))?

And then if I wanted to align the two, I could just write align(s1, s2) and it would perform the alignment right there.

It is true that writing it in straight javascript would be the best solution; it is more a matter of the time and resources to put toward that. Hence I want to add the option of reusing existing libraries in a simple way.
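
The fetch/length idea can be sketched in a few lines. In this hypothetical Python version, `fetch` looks identifiers up in a small in-memory table instead of querying a real sequence database, and the identifiers `GENE_A`/`GENE_B` and their sequences are made-up placeholders, not real genes; a real charm would need an actual fetch backend.

```python
from typing import Dict

# Hypothetical stand-in for a remote sequence database. The identifiers and
# sequences are made-up placeholders, not real genes.
_TOY_DB: Dict[str, str] = {
    "GENE_A": "ATGGCCATTGTAATGGGCCGCTGA",
    "GENE_B": "ATGAAACGCATTAGCTGA",
}


def fetch(identifier: str) -> str:
    """Return the sequence for an identifier (a dict lookup, not a network call)."""
    try:
        return _TOY_DB[identifier]
    except KeyError:
        raise KeyError(f"unknown identifier: {identifier}") from None


def length(seq: str) -> int:
    """Number of residues in a sequence."""
    return len(seq)


def longer(a: str, b: str) -> str:
    """Identifier of the longer sequence (ties go to the first)."""
    return a if length(fetch(a)) >= length(fetch(b)) else b


if __name__ == "__main__":
    print(length(fetch("GENE_A")))      # 24
    print(longer("GENE_A", "GENE_B"))   # GENE_A
```

The point of the design is that each function does one thing and returns a plain value, so "which gene is longer" becomes a one-line composition rather than a pipeline of installed tools.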

I see. That makes sense. What complex functionality do you need for teaching that is not easy to reimplement in javascript?

In the end it is not just about the implementation but about its usability. For example, http://keithwhor.github.io/NtSeq/ provides an aligner, but in its current form it takes a lot of work to understand the results.

The library produces a series of objects that have all kinds of attributes and methods. What it does not seem to provide is a simple way to produce biologically meaningful, interpretable results, such as BLAST output, tabular BLAST output or SAM output.
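
One way to turn a raw alignment into something closer to SAM output is to summarize it as a CIGAR string. The sketch below assumes the alignment is given as two equal-length gapped strings (reference and query, `-` for gaps) and uses only the SAM operations M (aligned pair, match or mismatch), I (insertion relative to the reference) and D (deletion from the reference). The `cigar` function name is hypothetical.

```python
from itertools import groupby


def cigar(ref_aln: str, qry_aln: str) -> str:
    """SAM-style CIGAR string from two equal-length gapped strings ('-' = gap).

    M: aligned pair (match or mismatch); I: base present only in the query
    (insertion relative to the reference); D: base present only in the
    reference (deletion from the reference).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned strings must have equal length")
    ops = []
    for r, q in zip(ref_aln, qry_aln):
        if r == "-" and q == "-":
            continue  # a gap-only column carries no information
        ops.append("I" if r == "-" else "D" if q == "-" else "M")
    # Run-length encode consecutive identical operations, e.g. MMD -> 2M1D
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))


if __name__ == "__main__":
    print(cigar("ACGT-ACGT", "AC-TTACGT"))  # 2M1D1M1I4M
```

A full SAM record also needs a reference name, a 1-based start position, flags and so on; the CIGAR is just the part that encodes the alignment shape compactly.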

I am not sure NtSeq does standard alignment; it seems to be using pattern matching. I have just written a real DP-based algorithm: https://github.com/lh3/bioseq-js The APIs are C-like (actually most of my javascript programs look like C because I don't know much about javascript). You don't need deep knowledge of javascript to understand what it is doing.
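
For readers who want to see what a DP-based aligner does, here is a textbook Needleman-Wunsch global alignment with a linear gap penalty, written in Python. It is not a port of bioseq-js, whose API and scoring scheme may differ; the match, mismatch and gap scores are arbitrary assumptions.

```python
from typing import List, Tuple


def nw_align(a: str, b: str, match: int = 1, mismatch: int = -1,
             gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment (Needleman-Wunsch, linear gap penalty).

    Returns (score, aligned_a, aligned_b) with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # S[i][j] = best score for aligning a[:i] with b[:j]
    S: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub,  # pair a[i-1] with b[j-1]
                          S[i - 1][j] + gap,      # a[i-1] against a gap
                          S[i][j - 1] + gap)      # b[j-1] against a gap
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return S[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(nw_align("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The table takes O(nm) time and memory, which is fine for teaching-sized sequences; production aligners typically use affine gap penalties and more memory-efficient formulations.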

Whoa, this is so cool; it is exactly how I think about it. In fact, this is how I will teach alignments from now on.

I have added your library to the charms - with an example usage (you may need to force refresh the page to get the newest javascript).

As an example I have created both a simplified aligner and formatter on top of the library that may be useful for some people but not others. The goal is to make it easy for others to customize and share the way that they think about making use of these libraries.

I will be working on a feature where a user can upload a javascript file into the charms, and other users will be able to simply click it and specify that, when they visit the charms, they want that user's file to be activated as well. This new library may then override the original formatting, or it may add a new formatting option with a different name.
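
The override-or-add behavior described above is essentially a named registry of formatters. Here is a minimal Python sketch with a hypothetical `FormatterRegistry` class and made-up formatter names, in which replacing an existing name requires an explicit `override=True`, so a user library cannot silently shadow an "official" formatter.

```python
from typing import Callable, Dict, List

Formatter = Callable[[str, str], str]  # (aligned_a, aligned_b) -> text


class FormatterRegistry:
    """Hypothetical name -> formatter table that user libraries can extend."""

    def __init__(self) -> None:
        self._formatters: Dict[str, Formatter] = {}

    def register(self, name: str, fn: Formatter, override: bool = False) -> None:
        # Replacing an existing entry must be explicit, so a user library
        # cannot silently shadow an "official" formatter.
        if name in self._formatters and not override:
            raise ValueError(f"{name!r} already registered; use override=True")
        self._formatters[name] = fn

    def format(self, name: str, a: str, b: str) -> str:
        return self._formatters[name](a, b)

    def names(self) -> List[str]:
        return sorted(self._formatters)


def pairwise(a: str, b: str) -> str:
    """Default formatter: both rows, with '|' under identical non-gap columns."""
    mid = "".join("|" if x == y and x != "-" else " " for x, y in zip(a, b))
    return f"{a}\n{mid}\n{b}"


def identity(a: str, b: str) -> str:
    """A user-supplied formatter: identical non-gap columns / alignment length."""
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return f"{same}/{len(a)}"


registry = FormatterRegistry()
registry.register("default", pairwise)   # "official" library
registry.register("identity", identity)  # user library adds a new name
# A user library could instead replace the default explicitly:
# registry.register("default", identity, override=True)

if __name__ == "__main__":
    print(registry.format("default", "ACGT", "A-GT"))
    print(registry.format("identity", "ACGT", "A-GT"))  # 3/4
```

Making overrides explicit keeps the two cases the post describes (replace the original formatting, or add a new option under a different name) distinguishable in the code.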

Maybe I misunderstand Charms, but I thought they are meant to demonstrate installable packages, not replace them. I see the short explanations "Charms are small programs written in a javascript based language" and "It can also perform calls to external programs" and took this to mean that Charms are Javascript wrappers of bioinformatics packages written in a variety of languages.

Yes, correct: these are either javascript programs or "javascript wrappers". I will use both of these terms from now on.

I've used wrappers between a few languages and have never gotten a definitive answer on the ambiguity of the term "X wrapper." It probably means X run under some other language Y, not Y running inside of X. So I got the term "Javascript wrapper" backwards; it should probably be "Perl wrapper."

Interesting ambiguity. I always took "wrapper" to refer to the language that gets executed first: a "Python wrapper" over a C library, where the end of the phrase, "C library", sometimes gets left out.
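
To make the "Python wrapper over a C library" reading concrete, the sketch below wraps the C standard library's `strlen` using Python's built-in `ctypes` module. It assumes a POSIX system (Linux or macOS) where `ctypes.util.find_library("c")` locates libc, and `c_strlen` is a hypothetical name.

```python
import ctypes
import ctypes.util

# Locate the C standard library. On POSIX, if find_library returns None,
# CDLL(None) falls back to the symbols already loaded into the process,
# which include libc. (Windows is not handled in this sketch.)
_libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
_libc.strlen.argtypes = [ctypes.c_char_p]
_libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Python-facing wrapper: encode str to UTF-8 bytes, then call C strlen."""
    return _libc.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("ACGT"))  # 4
    print(c_strlen("é"))     # 2: C counts bytes, not characters
```

Here Python is the language the caller runs first and C does the work underneath, which matches the "Python wrapper" naming in the comment above.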

Good point.

Making the FAST utilities available as Charms would be super. Thanks!

+1 This looks neat indeed. Note that there is the bputils project that is part of BioPerl now, which appears to have a lot of overlap with this project. Both are command-line utilities relying on BioPerl so you might want to coordinate to reduce efforts.

Thanks. I, personally, am not familiar with bputils but will ask the team about it.

Today some changes to the FAST utilities fasuniq, fascodon, and alnpi were released, as well as changes to the paper, including new comparisons to similar packages.

7.7 years ago
Kamil

You might be interested in looking at other similar projects: FASTA and FASTQ tools

Thanks, I will read up on these.