Medical genetics server for a developing country

Dear community members,

I was recently involved in a project to start a medical genetics facility in a developing country. I have to admit that I am more of a "classical" bioinformatician in a Western country: I have never experienced a shortage of computational power, nor have I ever set up a server myself.

What would you aim for in terms of CPU, disk (solid-state or magnetic?), and RAM? The volume: currently a backlog of 100 WES samples, with 1-2 new exomes per week. No sequencing will be performed on site, so no demultiplexing is needed; alignment, however, is a must. Short variant calling, CNV calling, and storage of databases and annotation resources will obviously be the most typical tasks for this future server.
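
For scale, my rough back-of-envelope storage estimate (the per-exome sizes below are my assumptions, not measurements from this project):

    # Back-of-envelope storage math; per-exome sizes are assumptions.
    fastq_gb = 6      # compressed FASTQ per exome (assumed)
    bam_gb = 8        # sorted BAM per exome (assumed)
    other_gb = 0.5    # VCFs, logs, kept intermediates (assumed)

    per_exome = fastq_gb + bam_gb + other_gb   # ~14.5 GB
    backlog = 100 * per_exome                  # existing 100 WES
    yearly = 2 * 52 * per_exome                # up to 2 exomes/week

    print(f"backlog: ~{backlog / 1000:.1f} TB")
    print(f"growth:  ~{yearly / 1000:.1f} TB/yr (before backups)")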

Thank you for any hints.


Some unrelated advice.

To be safe, apply the same security/privacy considerations you are used to, from the beginning, for everything. This will save you heartache down the road.

If possible, find a proper systems administrator (someone whose day job this is) to manage the server/storage/security.


Yes, it seems to be a very hard job, and I am not sure there is a single systems admin in this country who would agree to participate in the project with the funds we [actually, the local team; I am an external advisor] can offer. Thanks a lot for this advice - I had totally forgotten about the security issues.


If you can get the local team to split off any identifying information and store it somewhere other than the server holding the sequence data, that will add a layer of protection.

Kudos for volunteering your time and expertise.
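
A minimal sketch of that split, with hypothetical file names (the point is only that the code-to-patient map never touches the analysis server):

    # Files on the analysis server are named by opaque codes only;
    # the code->patient mapping file lives on a separate,
    # access-controlled machine. Names/paths here are hypothetical.
    import csv
    import secrets

    def assign_code(patient_id: str, mapping_path: str) -> str:
        """Create an opaque sample code; append the mapping to a file
        kept OFF the analysis server."""
        code = "S-" + secrets.token_hex(4)
        with open(mapping_path, "a", newline="") as fh:
            csv.writer(fh).writerow([patient_id, code])
        return code

    code = assign_code("DOE-JANE-1980", "id_map.csv")
    # Downstream files are then named e.g. fastq/S-1a2b3c4d_R1.fastq.gz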


I'm not an expert on server deployments, but whatever configuration you settle on, give plenty of thought to your data storage. Things like CPU and RAM can be swapped out or replaced, but if you lose data, you are toast. It is not unreasonable to consider the data storage and backup solution more important than the compute capacity. The last thing you need is a flood, hurricane, or fire destroying your storage volumes and all your raw data with them. Also, as mentioned, evaluate cloud resources like AWS for feasibility.
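
One concrete, low-tech safeguard is a checksum manifest over the raw reads, so any off-site copy can be verified after transfer. A minimal sketch, with a hypothetical fastq/ directory:

    # Build a SHA-256 manifest over the raw reads (stdlib only).
    import hashlib
    from pathlib import Path

    def sha256sum(path: Path, chunk: int = 1 << 20) -> str:
        h = hashlib.sha256()
        with path.open("rb") as fh:
            while block := fh.read(chunk):
                h.update(block)
        return h.hexdigest()

    with open("MANIFEST.sha256", "w") as out:
        for fq in sorted(Path("fastq").glob("*.fastq.gz")):
            out.write(f"{sha256sum(fq)}  {fq}\n")

    # Verify on the backup host with: sha256sum -c MANIFEST.sha256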


When setting up servers, look at the Ansible ecosystem. If you set up one server, a second is typically not far behind, so use a templating/configuration-management system like Ansible.

The Galaxy admins on GitHub have lots of example Ansible playbooks for setting up your own servers. Start simply, copy and paste, etc. It's really worth it, especially if you have to expand into the cloud later.

https://github.com/ARTbio/GalaxyKickStart

I'd start with nf-core pipelines for analysis.

The most important thing, though, might be getting a reliable power supply and cooling.

Darked89:

You can pretty much always (re)analyze data by getting CPU time on some cluster/AWS, etc. The crucial thing is to have solid data storage and a backup, so that the raw(ish) data stays intact. So you might start with RAID/ZFS storage and a UPS. Check your connection speed and see whether, e.g., AWS Glacier is a viable option for long-term data storage.
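
Whether Glacier (or any off-site copy) is viable is simple arithmetic; the uplink speed below is an assumption, so plug in the measured one:

    # Time to move the archive over the local uplink (rough estimate).
    uplink_mbit_s = 100   # assumed uplink; substitute the measured speed
    data_tb = 1.5         # assumed archive size

    seconds = data_tb * 1e12 * 8 / (uplink_mbit_s * 1e6)
    print(f"~{seconds / 86400:.1f} days to move {data_tb} TB "
          f"at {uplink_mbit_s} Mbit/s (ignoring protocol overhead)")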

Lower-end Epyc boxes (a single 24-core CPU, 64 GB RAM) should be good enough to start.

Even with just a single NGS data-processing server, go for Slurm and Nextflow/Snakemake from month one. This will save you a lot of CPU time and a lot of your own brain time.
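
To make the workflow-manager suggestion concrete, here is a minimal Snakemake sketch of an alignment step (a sketch only - the tools, paths, and sample names are placeholder assumptions, not your actual pipeline):

    # Snakefile: align paired-end reads with bwa, sort with samtools.
    rule all:
        input:
            expand("bam/{sample}.bam", sample=["sampleA", "sampleB"])

    rule bwa_map:
        input:
            ref="ref/genome.fa",
            r1="fastq/{sample}_R1.fastq.gz",
            r2="fastq/{sample}_R2.fastq.gz",
        output:
            "bam/{sample}.bam"
        threads: 8
        shell:
            "bwa mem -t {threads} {input.ref} {input.r1} {input.r2} "
            "| samtools sort -@ {threads} -o {output} -"

Run it locally with "snakemake --cores 16", or through a Slurm profile once the scheduler is in place.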
