Question: Galaxy Workflow Management System Customization - Avoiding Duplication Of Files
toni (Lyon) wrote, 9.1 years ago:

Hi all,

My team is trying to set up an instance of the Galaxy workflow management system that will launch jobs on our local cluster. We work on high-throughput sequencing projects, so we have to manage a LOT of large files (several GB each).

When files are uploaded into Galaxy, they are automatically copied to a folder named "database/files" and renamed sequentially (dataset_1, dataset_2, dataset_3, etc.). This naming convention is applied regardless of whether a file is an input, intermediate, or output file.

Copying and renaming files this way is too time-consuming, and we lose our directory structure.

Is there a way to avoid this behavior, so that Galaxy just records the file path instead of keeping its own copy?

If someone here has experience with this tool, any help or useful link would be appreciated.

Cheers,

tony


You should ask this question on one of the Galaxy mailing lists available at: http://lists.bx.psu.edu/listinfo

(comment by Pierre Lindenbaum, 9.1 years ago)

Yes, right. I just wanted to try here first. Thank you.

(comment by toni, 9.1 years ago)
Alastair Kerr (The University of Edinburgh, UK) answered, 9.1 years ago:

Yes. Details are on the wiki under the heading 'Upload files from filesystem paths'. Be sure to choose No for the question 'copy data into galaxy'.
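With many sequencing files, the list of paths to paste can be generated rather than typed. The sketch below is a minimal example under stated assumptions: Python 3, and, for the second function only, the BioBlend client library (`bioblend.galaxy.GalaxyInstance`), which this thread does not mention. It also assumes the server admin has enabled path pasting (the option has been called `allow_library_path_paste` in older Galaxy configs and `allow_path_paste` in newer ones; check your release's sample config) and that the call uses an admin API key. `collect_paths`, `link_into_library`, and the suffix filter are hypothetical. In BioBlend, `link_data_only="link_to_files"` appears to correspond to answering No to the copy question: Galaxy records the path instead of copying the file into `database/files`.

```python
from pathlib import Path


def collect_paths(root, suffixes=(".fastq", ".fastq.gz", ".bam")):
    """Return sorted absolute paths of files under `root` ending in one of `suffixes`.

    The suffix list is only an example; adjust it to your own data.
    """
    root = Path(root).resolve()
    return sorted(
        str(p)
        for p in root.rglob("*")
        if p.is_file() and p.name.endswith(tuple(suffixes))
    )


def link_into_library(gi, library_id, paths, folder_id=None):
    """Register `paths` in a Galaxy data library without copying the files.

    Assumption: `gi` is a bioblend.galaxy.GalaxyInstance created with an
    admin API key, and path pasting is enabled on the server.
    """
    return gi.libraries.upload_from_galaxy_filesystem(
        library_id,
        "\n".join(paths),  # one path per line, as in the web form's paste box
        folder_id=folder_id,
        link_data_only="link_to_files",  # link to the originals instead of copying
    )
```

Because Galaxy only stores a reference, the linked files must stay at their original paths and remain readable by the Galaxy user.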


Thank you. I have been through the wiki many times but was unable to find this page!

(comment by toni, 9.1 years ago)

Exactly right. You also want to store your files in Data Libraries: http://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/Libraries. Users can share files and copy them into their individual histories for processing without duplicating the datasets on disk.

(comment by Brad Chapman, 9.1 years ago)
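Once datasets are linked into a Data Library, each user can pull them into a history. The following is a minimal sketch, again assuming the BioBlend client library (not part of this thread) and hypothetical library and history IDs; `import_library_files` is a hypothetical helper. It lists the library contents with `show_library(..., contents=True)` and imports each file entry with `upload_dataset_from_library`, which, as described in the comment above, adds history items without duplicating the data on disk.

```python
def import_library_files(gi, library_id, history_id):
    """Import every file entry of a data library into a history.

    Assumption: `gi` is a bioblend.galaxy.GalaxyInstance; each entry returned
    by show_library(..., contents=True) is a dict with "id", "type", "name".
    """
    contents = gi.libraries.show_library(library_id, contents=True)
    imported = []
    for item in contents:
        if item.get("type") != "file":
            continue  # skip folder entries; only datasets are imported
        gi.histories.upload_dataset_from_library(history_id, item["id"])
        imported.append(item["name"])
    return imported
```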