Question: Copying BCL files from BaseSpace directly into tar.gz
GBC_Zonatos wrote (8 months ago, from Brazil):

I generally get files from BaseSpace using the 'bs' shell utility, with the following command:

bs cp conf://~default/Run/{run_id} {output_dir}/{run_id}

This creates a folder named {run_id} inside my {output_dir}, which I then compress into a .tar.gz file using

tar -czf {output_dir}/{run_id}.tar.gz {output_dir}/{run_id}

and then upload to my cloud service (AWS/GCP) using their standard shell utilities.

I've been looking for a way to copy the BCL files more directly: either gzipping them as part of the copy, or copying straight from BaseSpace into Google Cloud Storage/Amazon S3. Is there any way to do either, so that I never hold two copies of the same data on my machine at once? I'd rather not provision a large, high-storage machine at AWS/GCP just for the transfer. Copying directly into the storage service would be ideal, but even downloading the files already compressed would beat downloading the raw folder and then zipping it, which temporarily consumes more than twice the run's size in disk.
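For reference, the shape of the thing I'm after: both `gsutil cp` and `aws s3 cp` accept `-` as the source, reading the object from stdin, so tar could stream straight into the bucket. A minimal sketch, assuming gsutil is installed and authenticated; `stream_run` and `gs://my-bucket` are made-up placeholders:

```shell
# Hypothetical helper: stream a run folder into object storage as .tar.gz
# without ever writing a local archive. The same pipe works with
# `aws s3 cp - s3://...` for S3.
stream_run() {
  parent_dir="$1"   # directory containing the run folder (my {output_dir})
  run_id="$2"       # run folder name (my {run_id})
  dest_url="$3"     # destination object, e.g. gs://my-bucket/run.tar.gz
  tar -C "$parent_dir" -czf - "$run_id" | gsutil cp - "$dest_url"
}
```

The `-C` keeps archive paths relative to the run folder, and `tar ... -czf -` writes the gzip stream to stdout, so disk usage never exceeds the raw run itself.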

Also, for anyone with cloud experience: is there a specific GCP service you'd recommend for this copying process? I've been thinking of using Cloud Run, so we'd have an established, automatic pipeline.

bcl basespace
You could provision a small VM, install the BaseSpace utilities on it, and copy to your storage bucket directly from that VM after logging in.

You could use other utils like: https://github.com/BFSSI-Bioinformatics-Lab/BaseMountRetrieve
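To make the VM idea concrete, the whole fetch-compress-upload step can be one small wrapper. A sketch only: it reuses the `bs cp` invocation from the question, assumes `bs` and `gsutil` are installed and authenticated on the VM, and `gs://my-bucket` is a placeholder:

```shell
# Hypothetical wrapper for the VM: fetch a run from BaseSpace, then stream
# it into a bucket as .tar.gz. No local .tar.gz is ever written.
upload_run() {
  run_id="$1"
  work="$(mktemp -d)"   # scratch dir for the raw BCL download
  bs cp "conf://~default/Run/${run_id}" "${work}/${run_id}"
  tar -C "${work}" -czf - "${run_id}" | gsutil cp - "gs://my-bucket/${run_id}.tar.gz"
  rm -rf "${work}"      # drop the raw copy once the upload finishes
}
# usage: upload_run 12345
```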

— genomax, 8 months ago

I'm looking to automate the process, so we don't have to log into the VM and run a script by hand; instead I'd like a service I can send a POST request to with the run_id and have it do the whole process by itself. We have at least one run every week, so automating this manual step would help a lot. (It is already automated today, but it runs on a VM that is active 24/7, and I'd like to make it serverless so we don't have to keep a VM up all the time.)

— GBC_Zonatos, 8 months ago

Yeah, in the past I have just used BaseMount to keep the BaseSpace location mounted on the server I'm on; then you can implement your automation however you want. https://basemount.basespace.illumina.com/

If you already have an AWS instance then maybe you can just mount it there? Not sure
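For completeness, the BaseMount flow is roughly the following. A sketch under the assumption that basemount is installed and its one-time authentication has been done; `mount_basespace` is a made-up helper name:

```shell
# Hypothetical helper: FUSE-mount a BaseSpace account at the given path.
mount_basespace() {
  mountpoint="$1"
  mkdir -p "$mountpoint"
  basemount "$mountpoint"
}
# After mounting, runs appear as ordinary directories under
#   "$mountpoint/Runs/<run name>/Files"
# and can be tarred/streamed like any local path; unmount later with:
#   basemount --unmount "$mountpoint"
```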

— steve, 8 months ago