Question

Tool: Variant annotation and filtration server ALAPY Genome Explorer (AGx)

7

6.9 years ago

Petr Ponomarenko ★ 2.8k

ALAPY Genome Explorer (AGx) server

Standalone server for annotating VCF files with RefSeq, dbSNP, ExAC, ClinVar, 1000 Genomes, HGNC, ESP6500, predicted consequences, SIFT, PolyPhen and more.

Supports modern browsers on computers, tablets, and smartphones.

The current GUI offers a limited set of analysis functions: flexible filters, inheritance models with tumor/normal and trio analysis, and two-column sorting.

The GUI keeps a history and versions of your filters, models, analyses and views.

You can export selected variants in SQL, TXT and CSV formats.

It is free for academic use. Please read the EULA here.

Installation

We provide installation scripts for Linux systems here and an installation wiki here.

Running

We provide the scripts needed to start the server and add users on Linux systems here.

Usage

Please refer to our FAQ and the tutorials on the ALAPY YouTube channel.

You can also use AGx as a cloud service provided by ALAPY. It is also free for education. Please register here to get access.

Please tell us what you think and how we can make it better.

Thank you,
Petr

vcf variant-annotation variant-interpretation • 3.5k views

ADD COMMENT • link updated 10 months ago by Ram 43k • written 6.9 years ago by Petr Ponomarenko ★ 2.8k

score 3 · Answer 1 · 2017-05-23

3

6.9 years ago

Petr Ponomarenko ★ 2.8k

The AGx server was tested on Ubuntu 12 and 14 LTS. It relies on apt-get. I was able to install it on RedHat 4.8.5 with yum. Could you please let us know whether or not you were able to install it on your system, so that we can help, record the platform as being in testing, and update the scripts accordingly? Thank you, Petr

ADD COMMENT • link 6.9 years ago by Petr Ponomarenko ★ 2.8k

1

This will be an interesting tool. However, none of the clusters I have access to allow users to run Docker, let alone have root permission. If you require apt-get and Docker, you have excluded several thousand potential users working on these clusters. If you want your tool to be widely used, make it installable on a CentOS6 virtual machine without root.

ADD REPLY • link 6.9 years ago by lh3 33k

0

Hi, lh3. Thank you for your comment! We can create a separate version for CentOS6 and make it independent of Docker. How much is it needed? AGx is complex software with several components, such as Redis, RabbitMQ, and Postgres. It is hard to install and set these up without root access. You say thousands of potential users work on clusters. We think such big clusters are needed only for big research projects or commercial labs, which will have root privileges, while individual users and small labs can run this system on a local server or even on a personal laptop.

We tested AGx on Windows, Linux and MacOS laptops using virtual machines. For example, on a Sony Vaio PRO13 (Core i5, 8 GB RAM, 256 GB SSD) we created a virtual machine with a 50 GB SSD, 4 GB RAM and 2 cores, and installed Ubuntu 16 and AGx. AGx takes about 21 GB of disk space once installation is complete and temporary files are removed. AGx runs a bit slower on such a configuration than on a server such as a 4-core Intel Xeon E5-2670 v2 with 30 GB RAM: instead of 4 minutes to upload and prepare a WES VCF trio for analysis, you will need 8 minutes. This happens in the background, and multiple samples can be uploaded at the same time. The background process does not affect the time of the analysis, sorting, filtration and other processes.

Analysis of the samples using different databases such as ExAC and 1000 Genomes, filters, inheritance models and multiple-column sorting takes 1-2 seconds for a WES VCF trio, even on a laptop with a virtual machine. It is a bit slower than on a big server, but we think it is already acceptable, at least for research and education. Obviously, if 1000 users try to analyze 1000 samples at the same time, they will see slower speeds, and simultaneous upload of that many files will take much more time. But a group of 3-5 people can work on such a laptop simultaneously, 24/7, and see high speeds for upload and analysis.

ADD REPLY • link 6.9 years ago by Petr Ponomarenko ★ 2.8k

0

Most biologists don't run a VM, let alone install Linux in one. Your tool should simplify data processing, not complicate it. I don't know what industrial users will think about the tool, but I am moderately certain that few academic users will use it.

The dependencies you chose are too heavy. I don't know RabbitMQ. For the rest, I would use sqlite and implement the server in Go. Redis is lightweight enough. The end product would be a single executable plus possibly a few Redis dynamic libraries. Most users would prefer this over yours.
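As a rough illustration of the SQLite part of this suggestion, here is a minimal sketch. It is written in Python with the standard-library `sqlite3` module rather than Go, and it is not how AGx works. The table name `variant`, its columns, the gene names and the rows are all invented for the example.

```python
import sqlite3

# In-memory database for the sketch; a real tool would open a file on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical flat schema: one row per variant with a few annotation columns.
conn.execute(
    """CREATE TABLE variant (
           chrom TEXT, pos INTEGER, ref TEXT, alt TEXT,
           gene TEXT, af REAL
       )"""
)
conn.execute("CREATE INDEX idx_variant_gene ON variant (gene)")

# Invented example rows, not real data.
rows = [
    ("1", 12345, "A", "G", "GENE1", 0.001),
    ("1", 12400, "C", "T", "GENE1", 0.2),
    ("2", 500, "G", "A", "GENE2", 0.0005),
]
conn.executemany("INSERT INTO variant VALUES (?, ?, ?, ?, ?, ?)", rows)

# A simple filter: rare variants (allele frequency below 1%) in one gene.
rare = conn.execute(
    "SELECT chrom, pos, ref, alt FROM variant "
    "WHERE gene = ? AND af < ? ORDER BY pos",
    ("GENE1", 0.01),
).fetchall()
print(rare)
```

A fixed set of columns like this is easy to index and query, but it only covers annotations that every record shares.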

ADD REPLY • link 6.9 years ago by lh3 33k

0

Hi lh3. Thank you for the valuable discussion. We were working on making ALAPY Compressor faster last week, so here are my team's and my thoughts on the important questions you raised:

We totally agree with you that software has to make things easier. To use ALAPY Genome Explorer, one needs only basic experience with modern computers and internet browsers. Installing the software is, at the moment, for more advanced users who are familiar with Linux and have some experience with the terminal. One does not have to be a Linux guru to install it on apt (Debian-based) systems. Installation on yum (RPM) systems is a bit more complicated at the moment and not yet well tested. We might distribute a version specifically for CentOS later. Most users can opt to use our own cloud servers, which are free for academia and education and require no advanced knowledge or experience, yet are as powerful for complicated annotations, variant filtering and inheritance models as the standalone version.

On most Debian-based systems, users need to run three scripts to install, start and manage the server. It was tested extensively on Ubuntu 14 and 16. Detailed instructions are located here: https://github.com/ALAPY/AGx/wiki/Installation-of-ALAPY-Genome-Explorer-(AGx).

Our choice of dependencies is not random. We build different kinds of software and tested many architectures before selecting the current stack. The idea behind our choices is to make a system that is stable, fast, easy to manage and scalable. As you pointed out, there is going to be a need for systems capable of very fast processing, comparison and analysis of millions of very diverse VCF files (see C: NGS files' shrinkage software: ALAPY Compressor, only fastq files so far =)). We agree with you and are working on this.

I will try to explain why we selected these particular dependencies:

To store data about mutations we use a NoSQL DB. We have to, because data in VCF files is inherently not flat. Its structure can sometimes differ between mutations within the same VCF, and it differs most of the time when samples come from multiple sources, pipelines, NGS machines or labs. We have worked on 50+ software products and projects and have experience with many SQL and NoSQL databases. Our DB lets us work with samples at least 100 times faster than classical relational DBs.
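To make the "not flat" point concrete, here is a minimal Python sketch that turns VCF data lines into nested documents. It is only an illustration, not the AGx storage format. The function name `parse_vcf_record` and the two sample lines (including the gene name) are invented for the example.

```python
def parse_vcf_record(line):
    """Turn one tab-separated VCF data line into a nested dict (a 'document')."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    doc = {
        "chrom": chrom,
        "pos": int(pos),
        "id": None if vid == "." else vid,
        "ref": ref,
        "alt": alt.split(","),  # several alternate alleles are possible
        "info": {},
    }
    for entry in info.split(";"):
        if entry in ("", "."):
            continue
        key, sep, value = entry.partition("=")
        # Flag entries (no "=") become True; valued entries become lists
        doc["info"][key] = value.split(",") if sep else True
    return doc


# Two invented records whose INFO fields share no keys at all.
records = [
    "1\t12345\trs1\tA\tG\t50\tPASS\tDP=35;AF=0.5;DB",
    "1\t67890\t.\tC\tT,G\t99\tPASS\tAC=1,2;ANN=T|missense_variant|GENE1",
]
docs = [parse_vcf_record(r) for r in records]
for d in docs:
    print(sorted(d["info"]))
```

Each record carries its own set of INFO keys, so a single fixed table layout would need either many sparse columns or a separate key/value table, while a document store can keep each record as it is.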

We use PostgreSQL only for metadata and other internal system information. Redis and RabbitMQ allow horizontal scalability. That way our solution can give thousands of users simultaneous access to query millions of samples. This can already be important for some hospitals, big laboratories and research centers. We think it will be needed even more in the coming years.

ADD REPLY • link 6.9 years ago by Petr Ponomarenko ★ 2.8k

1

I don't have a problem with Docker usage and appreciate all the work which has gone into packaging this tool up nicely. I will give installation a go when I get a chance.

For those struggling with Docker, other container software does exist. Check out Singularity for example, though I think it is early days yet. http://singularity.lbl.gov/

Many well-resourced environments will also have a few decent workstations, independent of a cluster, on which Docker can be installed.

ADD REPLY • link 6.6 years ago by colindaven 6.4k