Question: As an Illumina customer, do you use BaseSpace, an alternative cloud solution, or pipelines built in-house?
5
mani824 (United States), 2.8 years ago, wrote:

Hi

We have a HiSeq 2000 and manage our own GATK pipeline, grid, and storage. We are in the process of purchasing a NextSeq 500 and are wondering whether BaseSpace would be a feasible long-term solution for data analysis and storage.

If you are an Illumina customer, do you use BaseSpace? If not, can you explain why?

ADDENDUM: We do whole exome, tumor/normal (small panels to whole exome), transcriptome, and other custom panels of 1,000-5,000 genes.

Thanks

Manfred 

basespace illumina • 2.8k views
modified 2.8 years ago by Cliff Beall • written 2.8 years ago by mani824
4
mikhail.shugay (Czech Republic, Brno, CEITEC), 2.8 years ago, wrote:

Hello,

Well, the answer to your first question depends on what analysis you actually perform. If you run in-house pipelines that have no commonly used alternative, then the best solution for large-scale projects is to get familiar with cloud solutions such as AWS. Note that BaseSpace "locks" you within its data management system, so you can't easily incorporate custom data processing steps.

As for the second question, my lab currently works in a rather small field that has few publicly available software tools, so we mostly run our in-house pipelines on our own server infrastructure (though we have considered submitting our apps to BaseSpace). Occasionally we have to work with big chunks of data sent by our collaborators, so we have developed a pipeline to manage AWS instances for such tasks.

written 2.8 years ago by mikhail.shugay
2

+1 for AWS. I've been using it extensively for the past 6 months or so and it has sped up my work tremendously. Here are some tips for using AWS:

- Use spot instances whenever you can. I find that the r3.8xlarge and c3.8xlarge instances are actually not fully in use most of the time (EU-West), so I can easily get them for stretches of days for around 35 cents an hour.

- Make a small EBS volume or image with all your favorite tools installed (with all dependencies stored locally) so you can easily mount this drive to your instance when you start working. Create a bash script that exports your PATH variable so the tools are found. I actually prefer this to making a custom AMI because it is more flexible.

- Use StarCluster (http://star.mit.edu/cluster/) if you need an HPC-like cluster (it uses Sun Grid Engine). It is extremely easy to set up and start. You can also set it up to use spot instances.
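
The PATH script from the second tip could be generated along these lines. This is only a minimal sketch: it assumes the tools volume holds one directory per tool, each with its own `bin` subdirectory, and the function name `build_path_script` and any mount point you pass in are hypothetical, not part of the setup described above.

```python
import os


def build_path_script(tools_root):
    """Return shell text that prepends every <tools_root>/<tool>/bin to PATH.

    Assumed layout (hypothetical): one directory per tool under tools_root,
    each containing a bin/ subdirectory with the executables.
    """
    bin_dirs = []
    for name in sorted(os.listdir(tools_root)):
        candidate = os.path.join(tools_root, name, "bin")
        # Skip plain files and tool directories without a bin/ folder.
        if os.path.isdir(candidate):
            bin_dirs.append(candidate)

    lines = [
        "#!/bin/bash",
        "# Source this file after mounting the tools volume.",
    ]
    if bin_dirs:
        lines.append('export PATH="' + ":".join(bin_dirs) + ':$PATH"')
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file on the volume and sourcing it after the volume is mounted would put the tools on PATH for that session.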

written 2.8 years ago by Damian Kao
2

Also +1 for AWS.

Many instance types also have local SSD drives (one of them is mounted at /mnt on Ubuntu instances; for example, r3.8xlarge has 2 x 320 GB SSDs on board). By using them you can reduce your bills (especially on spot instances) and increase throughput, because local drives are installed inside the actual compute node and are much faster than EBS.

We do it this way:

1. Upload data to S3.

2. Start spot instance(s).

3. Download the data from S3 to a local folder on the instance's ephemeral drive (such as /mnt) using the AWS CLI.

4. Process data.

5. Upload the processed data back to S3 using the AWS CLI.

Sometimes we automate all these steps with scripts executed via cloud-init, without accessing the instance through SSH.
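
Steps 3 to 5 could be bundled into a user-data script roughly as follows. This is a sketch under stated assumptions rather than the scripts actually used here: the bucket URIs, the `/mnt/work` directory, the processing command, and the function name `build_user_data` are all hypothetical, and it assumes the instance already has the AWS CLI installed and credentials (for example an instance role) that allow S3 access.

```python
import shlex


def build_user_data(input_uri, output_uri, process_cmd, workdir="/mnt/work"):
    """Return a bash user-data script: S3 -> local disk -> process -> S3.

    input_uri / output_uri are S3 prefixes (hypothetical examples below);
    process_cmd is inserted as-is, so it must be a trusted shell command
    that reads from ./in and writes to ./out.
    """
    q = shlex.quote  # quote paths and URIs so odd characters cannot break the script
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",  # stop at the first failing step
        f"mkdir -p {q(workdir)}/in {q(workdir)}/out",
        # Step 3: download the input data to the local (ephemeral) drive.
        f"aws s3 cp --recursive {q(input_uri)} {q(workdir)}/in",
        f"cd {q(workdir)}",
        # Step 4: run the analysis.
        process_cmd,
        # Step 5: upload the results back to S3.
        f"aws s3 cp --recursive {q(workdir)}/out {q(output_uri)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_user_data(
        "s3://example-bucket/run1/",
        "s3://example-bucket/results/run1/",
        "./run_pipeline.sh in out",
    ))
```

The generated text would be supplied as the instance's user data at launch, so that cloud-init runs it on first boot. Because `set -euo pipefail` aborts on the first error, a failed download or analysis step does not lead to a partial upload.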

written 2.8 years ago by bolotin.dmitriy
1

Thanks, Mikhail.

I added the apps we run to my question above. Thanks for your feedback. Ours seem to be pretty common workflows that would likely use many of the tools available as BaseSpace apps; however, we are still cautious about being "locked in" to their data management, storage, LIMS, etc.

modified 2.8 years ago • written 2.8 years ago by mani824
3
Cliff Beall (Ohio), 2.8 years ago, wrote:

I played around with the BaseSpace applications a bit and was pretty unimpressed. From my experience it's not up to the task, though I'm not running the same applications as you.

One example is the 16S analysis. It is pretty basic and inaccurate, but I thought it might give a preliminary idea of our samples. However, it would only work with a limited number of samples at a time (I think 50). If you tried to specify more, it just silently failed and processed only 50 of however many you specified.

Also, Illumina has not responded to support requests.

written 2.8 years ago by Cliff Beall
1

That's helpful to know, Cliff. Did you end up using the results from BaseSpace, or were you more comfortable redoing the analysis with your own pipelines anyway?

The reason I ask is that I am trying to figure out whether

1) there is a problem with the inherent architecture/data management/reporting of BaseSpace, or

2) there is a problem with an insufficient variety of apps to do what you want.

If it's the second, then I foresee many new apps being added in the near future, but if it's the first, then that's a bigger problem.

modified 2.8 years ago • written 2.8 years ago by mani824
1

I wish they allowed chaining apps together to build workflows/pipelines.

written 2.8 years ago by prateek.kr

I could see the usefulness of that, but unlike Seven Bridges, they can't ensure that the output of one app will behave nicely when fed into the next app. Apps here are built by third-party vendors.

written 2.8 years ago by prateek.kr
2
geek_y (Barcelona/London), 2.8 years ago, wrote:

If possible, it is always good to build your own pipelines, so that you have more control and they are highly customizable.

written 2.8 years ago by geek_y

Thanks for your comment

written 2.8 years ago by mani824