Galaxy Workflow Management System Customization - Avoiding Duplication Of Files

13.7 years ago

toni ★ 2.2k

Hi all,

My team is trying to set up an instance of the Galaxy workflow management system that will launch jobs on our local cluster. Our projects involve high-throughput sequencing, so we have to manage a LOT of large files (several GB each).

When we upload files into Galaxy, they are automatically copied to a folder named "database/files" and sequentially renamed (dataset_1, dataset_2, dataset_3, etc.). This naming convention applies regardless of whether a file is an input, intermediate, or output file.

Copying and renaming files this way is too time-consuming, and we lose our file structure.

Is there a way to avoid this behavior, so that Galaxy just remembers the file path instead of keeping its own copy?

If someone here has experience with this tool, any help or useful link would be appreciated.

Cheers,
tony

galaxy next-gen-sequencing • 4.0k views

You should ask this question on one of the galaxy mailing lists available at: http://lists.bx.psu.edu/listinfo

13.7 years ago by Pierre Lindenbaum 163k

Yes, right. I just wanted to give it a try here. Thank you.

13.7 years ago by toni ★ 2.2k

Ram · Answer 1 · 2010-11-03

13.7 years ago

Alastair Kerr 5.3k

Yes. Details are on the wiki under the heading 'Upload files from filesystem paths'. Be sure to select No for the question 'copy data into galaxy'.
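To make the difference between the two upload modes concrete, here is a toy Python model. It is not Galaxy's code: the `DatasetStore` class, its `add()` method, and the `copy_data` flag are hypothetical names. The only behavior taken from the thread is that copied uploads land in a `database/files`-style folder under sequential `dataset_N` names, while answering No to the copy question means Galaxy just remembers where the file already lives.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Dataset:
    """Hypothetical dataset record: an id plus where the bytes actually live."""

    dataset_id: int
    path: Path


class DatasetStore:
    """Toy stand-in for a database/files-style folder (not Galaxy's implementation)."""

    def __init__(self, files_dir: Path) -> None:
        self.files_dir = files_dir
        self.files_dir.mkdir(parents=True, exist_ok=True)
        self.next_id = 1

    def add(self, source: Path, copy_data: bool = True) -> Dataset:
        dataset_id = self.next_id
        self.next_id += 1
        if copy_data:
            # Copy mode: duplicate the file under a sequential name (slow for multi-GB files)
            target = self.files_dir / f"dataset_{dataset_id}"
            shutil.copyfile(source, target)
            return Dataset(dataset_id, target)
        # Link mode: record the existing absolute path; no bytes are copied
        return Dataset(dataset_id, source.resolve())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    reads = root / "project" / "sample1.fastq"
    reads.parent.mkdir()
    reads.write_text("@r1\nACGT\n+\nIIII\n")

    store = DatasetStore(root / "database" / "files")
    copied = store.add(reads)  # copied to .../database/files/dataset_1
    linked = store.add(reads, copy_data=False)  # still points at project/sample1.fastq
    print(copied.path.name, linked.path.name)
```

Running it prints `dataset_1 sample1.fastq`: the copied dataset now lives under the store's sequential name, while the linked one still points at the file in its original project folder, so nothing is duplicated and the directory layout is kept.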

Thank you. I had been through the wiki many times but could not find this page!

13.7 years ago by toni ★ 2.2k

Exactly right. You also want to store your files in Data Libraries: http://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/Libraries. Users can share files and copy them into their individual histories for processing without duplicating the datasets on disk.
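The Data Libraries idea can be pictured with a small Python model. This is a hypothetical sketch, not Galaxy's API: `LibraryDataset`, `History`, and `import_from_library()` are illustrative names, and the file path is made up. Importing a library dataset into a user's history appends a reference to the single on-disk file instead of copying it.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class LibraryDataset:
    """Hypothetical library entry: one file, stored once on disk."""

    name: str
    path: Path


@dataclass
class History:
    """Hypothetical per-user history holding references, not file copies."""

    owner: str
    items: list[LibraryDataset] = field(default_factory=list)

    def import_from_library(self, dataset: LibraryDataset) -> None:
        # Only the reference is appended; every history shares the same path
        self.items.append(dataset)


# Illustrative path only; the file does not need to exist for this sketch
library = {"sample1": LibraryDataset("sample1", Path("/data/project/sample1.fastq"))}

alice, bob = History("alice"), History("bob")
alice.import_from_library(library["sample1"])
bob.import_from_library(library["sample1"])
print(alice.items[0].path == bob.items[0].path)  # True: one file, two histories
```

Each user gets their own history entry, but both entries resolve to the same file, which is what lets datasets be shared for processing without duplicating them on disk.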

13.7 years ago by Brad Chapman 9.7k