Forum: Best way to manage lots of omics data
6.6 years ago

Hi,

As sequencing becomes more and more common, the number of samples is increasing dramatically. In my daily job we handle hundreds of RNA-Seq and DNA-Seq libraries of different kinds (WGS, targeted, etc.), plus more "custom" libraries. We use an HPC cluster (Slurm-based) to analyse this data.

For now we use classic sample sheets in txt format to handle these samples. These sample sheets contain the path to the fastq files, the sample name and additional information (species, date of the run, etc.) depending on the type of library. But as the number of samples becomes huge, these sample sheets also become huge and complicated to maintain (multiple users have access to them, and human errors are easily introduced). Also, storing absolute fastq file paths in these sheets is maybe not the best idea, as the file/directory structure may change...

So is there a better alternative for handling a large number of samples from different origins? A kind of database, maybe? How should the paths to these samples be handled? The final goal is to use this "new" structure/database in conjunction with analysis workflows.

Also, is it possible to extend this "new" structure to integrate reference genome sequences, their annotations and the associated aligner indexes (bwa, STAR, etc.)?

Thanks

database NGS • 1.2k views

This is the age-old problem of organizing experimental data or, in short, of finding an appropriate LIMS. With the passage of time the answer remains the same. Anything with the word "enterprise" in its name will be feature-rich but beyond the means of an individual lab, whereas free software options will not be immediately usable, since they were designed to fit the needs of whoever made them in the first place. Both will likely "suck" (to a large extent) in practical terms, since I have yet to see a LIMS solution that satisfies ALL users in a local user group.

There are examples of folks extending the Galaxy interface to manage NGS data. Sierra is a solution that works for Babraham and is freely available.

If you have the means (they don't need to be great), writing something yourself may ultimately be the best solution, provided you strictly follow a minimalist approach so that "sprawl" (which comes from multiple people saying it would be nice if it did this too) remains under control. It will likely address 90% of your needs, and the remaining 10% will become eas(ier) to manage by manual means.
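
To give an idea of how small such a home-grown tool can start, here is a rough sketch of a sample registry using Python's built-in sqlite3 module. Everything in it is an assumption made for illustration, not an existing tool: the table layout, the column names and the helper functions (open_registry, add_sample, fastq_paths) are hypothetical. The two ideas it shows are a unique sample name enforced by the database, and fastq paths stored relative to a data root that is supplied at query time, so moving the data only means changing one setting.

```python
import sqlite3
from pathlib import Path, PurePosixPath

# Hypothetical minimal schema: one row per sample, path stored relative to a data root.
SCHEMA = """
CREATE TABLE IF NOT EXISTS samples (
    sample_name   TEXT PRIMARY KEY,  -- duplicates are rejected by the database
    library_type  TEXT NOT NULL,     -- e.g. RNA-Seq, WGS, targeted
    species       TEXT NOT NULL,
    run_date      TEXT,              -- ISO date string, e.g. 2019-03-01
    fastq_relpath TEXT NOT NULL      -- relative to the data root, never absolute
);
"""


def open_registry(db_path=":memory:"):
    """Open (or create) the registry database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def add_sample(conn, sample_name, library_type, species, run_date, fastq_relpath):
    """Register one sample; refuse absolute paths so the data root stays movable."""
    if PurePosixPath(fastq_relpath).is_absolute():
        raise ValueError("store fastq paths relative to the data root")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO samples VALUES (?, ?, ?, ?, ?)",
            (sample_name, library_type, species, run_date, fastq_relpath),
        )


def fastq_paths(conn, data_root, library_type):
    """Return {sample_name: full fastq path} for one library type."""
    rows = conn.execute(
        "SELECT sample_name, fastq_relpath FROM samples "
        "WHERE library_type = ? ORDER BY sample_name",
        (library_type,),
    )
    return {name: Path(data_root) / rel for name, rel in rows}
```

A workflow would then call something like fastq_paths(conn, "/path/to/data_root", "RNA-Seq") to build its input list instead of parsing a hand-edited sheet. The same pattern (one table, relative paths, one configurable root) could presumably be reused for reference genomes and aligner indexes, but that is left out here to keep the sketch minimal.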

