Hi all,
I am hoping to download a large set of genomes from NCBI to build a custom Kraken2 database, in order to bin sequences in my eDNA sample. I estimate that the database will be too large to store locally (>60,000 genomes, roughly 1 TB of storage). Is it advisable to upload this to a cloud service, e.g. AWS? Or is that awkward to work with if I am rebuilding the database regularly (~every month)?
And if cloud is the way to go, is it an affordable option?
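For context, this is roughly how I was planning to script the bulk download: a minimal sketch assuming the standard RefSeq assembly_summary.txt column layout, with a hypothetical, user-curated taxid list standing in for my real aquatic-species filter.

```python
# Sketch: bulk-download RefSeq assemblies listed in assembly_summary_refseq.txt.
# Assumes the standard tab-separated RefSeq summary format; AQUATIC_TAXIDS is a
# hypothetical placeholder for a curated list of species-level taxids.
import urllib.request
from pathlib import Path

SUMMARY_URL = "https://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt"
OUT_DIR = Path("genomes")
OUT_DIR.mkdir(exist_ok=True)

# Hypothetical example taxids (Salmo salar, Danio rerio).
AQUATIC_TAXIDS = {"8030", "7955"}

with urllib.request.urlopen(SUMMARY_URL) as resp:
    for raw in resp:
        line = raw.decode("utf-8", "replace").rstrip("\n")
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.split("\t")
        species_taxid, ftp_path = cols[6], cols[19]
        if species_taxid not in AQUATIC_TAXIDS or ftp_path == "na":
            continue
        # The genomic FASTA lives at <ftp_path>/<basename>_genomic.fna.gz
        base = ftp_path.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(f"{ftp_path}/{base}_genomic.fna.gz",
                                   OUT_DIR / f"{base}_genomic.fna.gz")
```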
That may depend entirely on your budget. If you don't have local infrastructure that can handle the requirements, then you have no option but to use the cloud. Be aware of ingress/egress charges, which can add up. Besides the storage, you will likely need a VM with significant memory and compute to create the indexes. Not sure what you are looking to do exactly, but using the NCBI prokaryotic representative genomes may be the way to go here to cut down on resource requirements and thus overall costs.
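If it helps, restricting to reference/representative assemblies can be done from the same summary file; a minimal sketch, assuming the standard refseq_category column:

```python
# Sketch: keep only reference/representative assemblies from an
# already-downloaded assembly_summary_refseq.txt (standard RefSeq format).
keep = []
with open("assembly_summary_refseq.txt") as fh:
    for line in fh:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        refseq_category, ftp_path = cols[4], cols[19]
        if refseq_category in ("reference genome", "representative genome"):
            keep.append(ftp_path)
print(f"{len(keep)} assemblies selected")
```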
Thanks for your reply! I am building a customized, curated database of organisms to analyse marine eDNA. I want to include only aquatic organisms (both eukaryotic and prokaryotic), and I also want to retain the ability to update the database with MAGs that I am hoping to assemble.
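For the MAGs, my understanding is that Kraken2 accepts custom sequences as long as each FASTA header carries a kraken:taxid tag; something like this sketch, where the file paths and the taxid are placeholders:

```python
# Sketch: tag a MAG's FASTA headers with a taxid so kraken2-build can use it,
# following Kraken2's "kraken:taxid|XXX" header convention.
# MAG_TAXID and the file names are placeholders for illustration.
MAG_TAXID = 2  # e.g. Bacteria, if nothing finer is known for this bin
with open("mag_bin1.fa") as src, open("mag_bin1.tagged.fa", "w") as dst:
    for line in src:
        if line.startswith(">"):
            name = line[1:].split()[0]
            dst.write(f">{name}|kraken:taxid|{MAG_TAXID}\n")
        else:
            dst.write(line)
```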
As far as I understand, running Kraken2 against the entire NCBI genomic database is highly computationally intensive. For my project I need quick turnaround times, which is why I have turned to building a database from scratch. Am I correct in my assumptions (if you have any experience with this type of analysis)?
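For reference, the monthly rebuild I have in mind looks roughly like this: a sketch driving the standard kraken2-build commands from Python, where the database name, genome directory, and thread count are placeholders.

```python
# Sketch: rebuild the custom Kraken2 database by driving the standard
# kraken2-build CLI. DB name, genome dir, and thread count are placeholders;
# kraken2-build must be on PATH, and library FASTAs must be uncompressed.
import subprocess
from pathlib import Path

DB = "marine_edna_db"
subprocess.run(["kraken2-build", "--download-taxonomy", "--db", DB], check=True)
for fasta in Path("genomes").glob("*.fna"):
    subprocess.run(
        ["kraken2-build", "--add-to-library", str(fasta), "--db", DB],
        check=True,
    )
subprocess.run(
    ["kraken2-build", "--build", "--db", DB, "--threads", "16"],
    check=True,
)
```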