Question: System Configuration For Plant Genome Data Analysis Using Illumina Reads
7.8 years ago by
vaibhavbarot30
vaibhavbarot30 wrote:

Hi,

I want to purchase a workstation to analyse Illumina reads from a plant genome (read mapping and SNP calling). What is the minimum hardware configuration required?

illumina hardware • 2.9k views
modified 4.9 years ago by Saeid Kadkhodaei90 • written 7.8 years ago by vaibhavbarot30
4.9 years ago by
IUT
Saeid Kadkhodaei90 wrote:

modified 11 months ago by RamRS30k • written 4.9 years ago by Saeid Kadkhodaei90
7.8 years ago by
Bergen, Norway
Michael Dondrup47k wrote:

Minimum hardware specs are hard to give without knowing what running time is acceptable. Most analyses would run on old or cheap hardware, but would simply take terribly long. The most important spec is memory size: your genome, as indexed by the read mapper, should fit in memory. That is all that can be said without further details of genome size and software. The thread "Hardware Suitable For Generic Nextgen Sequencing Processing?" still seems to be valid, just double the RAM figures. Otherwise, get as many CPU cores and the fastest IO you can afford. You will need a lot of disk space as well.

Other things to consider:

  • Support, system admin
  • Backup
  • Disk space
  • Do you really need a single new server for this? For example, is cloud an option, or can you use existing large servers?
modified 7.8 years ago • written 7.8 years ago by Michael Dondrup47k

Thanks for responding,

I want the minimum hardware configuration for mapping and SNP calling. Can you suggest an ideal hardware configuration for Illumina plant genome data? I don't have any existing server but am planning to buy one.

written 7.8 years ago by vaibhavbarot30

Minimum or ideal? That makes a difference. Also, what software exactly are you going to run? What is the size of the largest genome you are working with? How many users will use the machine in parallel? You should use a Linux-based server; will you use it interactively, or via ssh/telnet?

To find the minimum/optimal RAM size: get hold of or borrow a high-memory server, e.g. on the Amazon cloud, run a typical large job of the alignment step, and monitor the process and its memory usage (e.g. 8 GB). Double the maximum memory required by that process and buy a decent computer that has this amount of RAM installed (e.g. 16 GB) and can host at least double this amount (e.g. 32 GB, better up to 128 GB) for later upgrades.
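The measuring step could be sketched roughly like this in Python. It is only an illustration, assuming a Linux machine: it runs a command as a child process, reads the peak resident memory from the standard `resource` module, and applies the doubling rule of thumb. The function names and the demo command are made up for the example; the demo command is a small placeholder process, not a real alignment job.

```python
import resource
import subprocess
import sys


def peak_child_memory_gb(command):
    """Run `command` and return the peak resident memory of child processes in GB.

    Assumes Linux, where ru_maxrss is reported in kilobytes
    (macOS reports bytes instead). The value is the peak of the largest
    single child process, not the sum over a whole pipeline.
    """
    subprocess.run(command, check=True)
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return peak_kb / (1024 ** 2)


def recommended_ram_gb(peak_gb, headroom=2.0, upgrade_factor=2.0):
    """Rule of thumb: install 2x the observed peak, allow upgrades to 2x that."""
    installed = peak_gb * headroom
    max_supported = installed * upgrade_factor
    return installed, max_supported


if __name__ == "__main__":
    # Placeholder job: a tiny Python process standing in for an alignment run.
    demo_command = [sys.executable, "-c", "data = bytearray(200 * 1024 * 1024)"]
    peak = peak_child_memory_gb(demo_command)
    installed, max_supported = recommended_ram_gb(peak)
    print(f"peak {peak:.2f} GB, install {installed:.2f} GB, "
          f"board should support {max_supported:.2f} GB")
```

With the figures used above, `recommended_ram_gb(8.0)` gives 16 GB to install and 32 GB as the minimum the board should support. For a real test you would replace the placeholder with your actual mapping command.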

written 7.8 years ago by Michael Dondrup47k

Thanks Michael,

I want a general configuration for mapping Illumina reads from plant genomes and SNP calling. Can I do this with a computer that has 4 quad-core processors, 16 GB RAM and 1 TB of storage?

written 7.8 years ago by vaibhavbarot30

It depends on exactly what you want to do (read Michael's post above on finding the minimal RAM size). The specs you have listed are not nearly enough for what you want to do.

written 7.8 years ago by Josh Herr5.7k

Well, RAM and CPU might be just sufficient, but the system would very soon run out of storage for the read data. From the BWA manual: "With bwtsw algorithm, 2.5GB memory is required for indexing the complete human genome sequences. For short reads, the ‘aln’ command uses ~2.3GB memory and the ‘sampe’ command uses ~3.5GB." The largest sequenced plant genome is barley (5.1 Gb); if memory use scales linearly with genome size, 6 GB should be enough. If it is one of the smaller plant genomes, it might even work with less. All of this assumes the intended pipeline uses BWA for read mapping.
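The linear-scaling guess can be written out as a small Python calculation. This is a back-of-the-envelope sketch only: the memory figures are the ones quoted from the BWA manual, the human genome size of about 3.1 Gb is my assumption (it is not stated in this thread), and nothing guarantees that memory use really scales linearly.

```python
# Figures quoted from the BWA manual for the complete human genome (GB of RAM).
BWA_HUMAN_MEMORY_GB = {"index (bwtsw)": 2.5, "aln": 2.3, "sampe": 3.5}

# Assumption: the human reference is roughly 3.1 gigabases.
HUMAN_GENOME_GBASES = 3.1


def scaled_memory_gb(genome_size_gbases, human_size_gbases=HUMAN_GENOME_GBASES):
    """Scale each quoted figure linearly with genome size (a rough guess only)."""
    factor = genome_size_gbases / human_size_gbases
    return {step: mem * factor for step, mem in BWA_HUMAN_MEMORY_GB.items()}


if __name__ == "__main__":
    barley = scaled_memory_gb(5.1)  # barley genome size as given in the post
    for step, mem in barley.items():
        print(f"{step}: ~{mem:.1f} GB")
```

Under these assumptions the largest step (`sampe`) comes out a little under 6 GB for a 5.1 Gb genome, which is where the 6 GB figure comes from.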

written 7.8 years ago by Michael Dondrup47k

Thanks Michael. Can I call SNPs using the configuration stated in my previous post?

written 7.8 years ago by vaibhavbarot30

No warranty, but most likely the analysis would work; you will just have no space to store the results. You will need to buy additional storage very soon, because your 1 TB disk can be full after a few runs, assuming 100 GB per run. Even the smallest compute solution offered by Illumina has 20 TB of disk space: http://www.illumina.com/documents/products/datasheets/datasheet_illuminacompute.pdf
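To put rough numbers on the storage point, here is a small Python sketch. It takes 1 TB as 1000 GB and uses the 100 GB per run assumed above; the `expansion_factor` parameter is a hypothetical multiplier for intermediate files (alignments, variant calls) kept next to the raw reads, and the value 3.0 is an illustration, not a measured figure.

```python
def runs_until_full(disk_gb, run_gb, expansion_factor=1.0):
    """Return how many whole sequencing runs fit on a disk.

    expansion_factor is a hypothetical multiplier for intermediate files
    (alignments, variant calls) stored alongside the raw reads.
    """
    if run_gb <= 0 or expansion_factor <= 0:
        raise ValueError("run_gb and expansion_factor must be positive")
    return int(disk_gb // (run_gb * expansion_factor))


if __name__ == "__main__":
    print(runs_until_full(1000, 100))        # 1 TB disk, raw reads only
    print(runs_until_full(1000, 100, 3.0))   # 1 TB disk, hypothetical 3x overhead
    print(runs_until_full(20000, 100, 3.0))  # 20 TB, same hypothetical overhead
```

With raw reads only, the 1 TB disk holds 10 runs; with a 3x overhead it holds 3, which matches "full after a few runs".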

written 7.8 years ago by Michael Dondrup47k

Thanks. So, from the configuration I mentioned above, the RAM and processors (16 GB and 4 quad-core) are enough for SNP calling? Please advise. I will have to increase the storage capacity.

written 7.8 years ago by vaibhavbarot30

Average plant genome size right now is in the range of 6Gb to 8Gb. I might have a skewed view of RAM & CPU since the plant genomes I work with are in the range from 4Gb to 20Gb. If we knew what plant we were talking about and had an estimate of the genome size, we could give a little more information to you, vaibhavbarot.

written 7.8 years ago by Josh Herr5.7k
7.8 years ago by
Josh Herr5.7k
University of Nebraska
Josh Herr5.7k wrote:

To echo what Michael posted: you'll need to be quite clear on the size of your genomes (you'll need something with a lot of RAM and storage for plant genomes) and on what you want to do at your workstation (will you be doing transcriptome assembly or just SNP calling?). Once you have an exact idea of what you'll be doing and the time frame you need for your analysis, you can plan the specs of your workstation. The thread "Storage For Miseq In-House" may be of some help, in addition to the one that Michael posted.

written 7.8 years ago by Josh Herr5.7k

I want the minimum hardware configuration for mapping and SNP calling. Can you suggest an ideal hardware configuration for Illumina plant genome data? I don't have any existing server but am planning to buy one.

written 7.8 years ago by vaibhavbarot30