Ubuntu 20.04.1 for genomic data analysis
3.3 years ago

Hi,

I recently bought a new laptop (8 GB RAM, 512 GB SSD) that I would like to dedicate to my NGS data analysis workflows. If I run my tools and pipelines on Ubuntu 20.04.1, how much disk space will I need for the following scenarios?

1. Processing the FASTQ and BAM files for 20 human samples from a targeted sequencing panel (say, 25 genes) at 300X depth of coverage.
2. Processing the FASTQ and BAM files for 5 human samples from WES at 120X depth of coverage.
3. Processing the FASTQ and BAM files for 1 human sample from WGS at 30X depth of coverage.
4. Processing the FASTQ and BAM files for 1 human sample from PacBio Sequel II WGS.

Can I use my Windows hard drive as the output folder for data storage while working in Ubuntu (I will set up a dual boot on the laptop)? Sorry if my questions seem novice, but I am looking for guidance on disk and partition configuration when installing Ubuntu (i.e. how much space to give the OS, the home directory, and swap). I have also received some comments that there is no need to assign swap in Ubuntu 20.04.1.

Thanks

next-gen sequencing genome • 2.4k views

Hello Alireza.Tafazoli!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?p=237149#post237149

This is typically not recommended as it runs the risk of annoying people in both communities.

3.3 years ago
ATpoint 82k

512 GB will fill up quickly. Hard drives are very cost-effective these days, so I see little reason to go below 1 TB. Keep in mind that you can move the raw data to an external drive after the initial alignment, so its size is only partially relevant. With 8 GB of RAM you are not going to get much done, though. WGS alignment is also not something one usually does on a laptop. It has been a while since I processed WGS data, but if memory serves, a 30x sample took about 8-10 h on a decent server node, including sorting the alignments (16-32 threads, with lots of RAM). It is going to take ages on a laptop. This is simply not what laptops are built for; in particular, the heat produced over such long-running jobs will slow things down massively. Laptops are for portability and/or less intensive data analysis, not long and heavy CPU work. It essentially does not matter which OS you have as long as you have access to a Unix partition, so either macOS or any Linux distribution is fine.
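To get a first-order feel for how quickly a disk fills up, the arithmetic can be sketched in Python. This is only a rough illustration: the function names are made up for this example, the genome size is an approximate round figure, and the bytes-per-base density is a placeholder that you should replace with a value measured from your own compressed FASTQ and BAM files.

```python
# Rough storage arithmetic for sequencing data.
# Every constant below is an assumption for illustration, not a measurement.


def sequenced_bases(target_bp, mean_depth, on_target_fraction=1.0):
    """Bases that must be sequenced to reach mean_depth over target_bp.

    on_target_fraction is the share of reads landing on the target
    (1.0 for WGS; lower for capture-based panels and exomes).
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return target_bp * mean_depth / on_target_fraction


def size_gb(bases, bytes_per_base):
    """Convert a base count to gigabytes (10**9 bytes) at a given density."""
    return bases * bytes_per_base / 1e9


# Approximate human genome size in base pairs.
HUMAN_GENOME_BP = 3.1e9
# Placeholder density (bytes on disk per sequenced base); measure your own.
PLACEHOLDER_BYTES_PER_BASE = 1.0

wgs_bases = sequenced_bases(HUMAN_GENOME_BP, 30)
print(f"30x WGS: ~{size_gb(wgs_bases, PLACEHOLDER_BYTES_PER_BASE):.0f} GB per sample")
```

Intermediate files (unsorted alignments, temporary sort chunks) come on top of whatever this returns, so treat the result as a lower bound rather than a budget.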

Edit: For the heavy work you may consider cloud-based solutions such as AWS.

There are plenty of threads on this topic:

Which laptop will be good for most common and advanced bioinformatics works?

Laptop specs for Bioinformatic analyses

Laptop Suggestion to handle Bioinformatics Softwares?

Is Dell Precision mobile workstation suitable for bioinformatics/NGS analysis

Help a graduate student going into Bioinformatics looking for a new personal laptop. Should I get a Mac or a PC?


Thank you very much for your comprehensive answer.

3.3 years ago
Mensur Dlakic ★ 27k

I don't think anyone can answer your questions with certainty, but to me you seem to be short on everything that matters: memory and disk space. If you were working on bacteria, maybe you could get by with this configuration, but I don't see how you can with a dual boot, which will further limit your effective disk space. Even so, not assigning any disk space to swap will hamper you even further, because 8 GB of RAM is substandard for almost any kind of genomic analysis.
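If you are unsure how much RAM and swap a given Ubuntu install actually ended up with, a small Python sketch like the following can report it. It assumes a Linux system that exposes `/proc/meminfo`; the helper names are invented for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: leading integer}.

    For memory fields such as MemTotal and SwapTotal the integer is in kB.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def ram_and_swap_gib(text):
    """Return (RAM, swap) in GiB from /proc/meminfo-style text."""
    info = parse_meminfo(text)
    kib_per_gib = 1024 ** 2
    return (info.get("MemTotal", 0) / kib_per_gib,
            info.get("SwapTotal", 0) / kib_per_gib)


def read_ram_and_swap_gib(path="/proc/meminfo"):
    """Read the live values; only works where /proc/meminfo exists (Linux)."""
    with open(path) as handle:
        return ram_and_swap_gib(handle.read())
```

On the laptop itself, `read_ram_and_swap_gib()` returns the two totals; a swap value of 0 means no swap was configured.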

At the very least, I would be looking to buy an external hard drive (minimum 1 TB, 2-3 TB would be even better) to use as storage, and move stuff back and forth as needed.
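When shuttling data between the internal SSD and an external drive, it helps to check free space before starting a job. Below is a minimal Python sketch using the standard library's `shutil.disk_usage`; the helper names and the default safety factor of 1.5 are assumptions chosen for illustration, not recommendations from this thread.

```python
import shutil


def free_gb(path):
    """Free space, in GB (10**9 bytes), on the filesystem containing path."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path, needed_gb, safety_factor=1.5):
    """True if path has at least needed_gb * safety_factor free.

    safety_factor is an arbitrary cushion for temporary files written
    during alignment and sorting; adjust it to your own pipeline.
    """
    return free_gb(path) >= needed_gb * safety_factor
```

For example, `has_room("/path/to/external/drive", 100)` (with a placeholder mount point) tells you whether a job expected to write about 100 GB is safe to start there.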


Thank you very much for your points. I will definitely go for an external hard drive.

3.3 years ago
tothepoint ▴ 800

As the senior researchers above suggested, do consider upgrading the RAM and getting an external hard drive. One point I would like to add: I recently installed Ubuntu 20.04.1 as a second OS on my laptop, which has 32 GB RAM and a 512 GB SSD. I ran into many issues installing applications and working with data. I finally ended up with 18.04, and now everything seems fine for me.


My experience is with desktops and with much more memory (32-256 GB), so it may differ from yours. Still, I have had two computers running Ubuntu 18 for three years, and three running Ubuntu 20 for about a year. There is no material difference between them. I wouldn't even mention this were it not for the fact that you are recommending a system version (18.04) whose hardware enablement stack expires in April 2023. For version 20.04 it runs until April 2025.


I totally agree with your views. On a desktop, Ubuntu 20.04 worked fine for me, but the same OS was disappointing in a dual-boot setup, which is why I shared my experience.


Thanks. I actually work with gene panels and/or WES data most of the time. For WGS, I now realize it's not possible to work on a simple laptop. However, the experts' comments are helpful for learning how to deal with any kind of data.

