Question: Galaxy Workflow Management System Customization - Avoiding Duplication Of Files
9.9 years ago
toni wrote:

Hi all,

My team is trying to set up an instance of the Galaxy workflow management system that will launch jobs on our local cluster. We are involved in projects dealing with high-throughput sequencing, so we have to manage a LOT of large files (several GB each).

When files are uploaded into Galaxy, they are automatically copied to a folder named "database/files" and renamed sequentially (dataset_1, dataset_2, dataset_3, etc.). This naming convention applies regardless of whether a file is an input, intermediate, or output file.

Copying and renaming files this way is too time-consuming and makes us lose our file structure.

Is there a way to avoid this behavior, so that Galaxy just remembers the file path instead of keeping its own copy?

If someone here has experience with this tool, any help or useful link would be appreciated.



written 9.9 years ago by toni

You should ask this question on one of the Galaxy mailing lists available at:

written 9.9 years ago by Pierre Lindenbaum

Yes, right. I just wanted to give it a try here. Thank you.

written 9.9 years ago by toni
9.9 years ago
Alastair Kerr
Manchester/UK/Cancer Biomarker Centre at CRUK-MI
Alastair Kerr wrote:

Yes. Details are on the wiki under the heading 'Upload files from filesystem paths'. Be sure to select No for the question 'copy data into galaxy'.

written 9.9 years ago by Alastair Kerr

Thank you. I have been through the wiki many times but was unable to find this page!

written 9.9 years ago by toni

Exactly right. You will also want to store your files in Data Libraries: users can share files and copy them into their individual histories for processing without duplicating the datasets on disk.
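
For anyone who would rather script this than click through the admin interface, here is a minimal sketch that links files into a Data Library without copying them. It assumes the BioBlend Python client is installed, that the paths exist on the Galaxy server itself, and that an administrator has enabled path-based uploads in the Galaxy configuration (the exact option name depends on the Galaxy version). The helper names `build_link_request` and `link_into_library`, and the URL, API key, and library name in the usage note, are hypothetical.

```python
from pathlib import PurePosixPath


def build_link_request(paths):
    """Check that every path is absolute and join them one per line,
    which is the format BioBlend expects for filesystem_paths."""
    checked = []
    for p in paths:
        path = PurePosixPath(p)
        if not path.is_absolute():
            raise ValueError(f"Galaxy needs absolute server-side paths, got: {p}")
        checked.append(str(path))
    return "\n".join(checked)


def link_into_library(galaxy_url, api_key, library_name, paths):
    """Create a Data Library and link (not copy) the given files into it."""
    # Imported lazily so build_link_request works without BioBlend installed.
    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url=galaxy_url, key=api_key)
    library = gi.libraries.create_library(library_name)
    return gi.libraries.upload_from_galaxy_filesystem(
        library["id"],
        build_link_request(paths),
        # 'link_to_files' makes Galaxy reference the files in place
        # instead of copying them into database/files.
        link_data_only="link_to_files",
    )
```

A call might look like `link_into_library("http://localhost:8080", "YOUR_API_KEY", "Sequencing runs", ["/data/run1/sample_a.fastq"])`. The API key generally needs to belong to an admin user, since path-based uploads are an admin feature.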

written 9.9 years ago by Brad Chapman


