Hi folks, looking for some feedback/advice here.
Basics of the situation:
I have a large (140mb, 49,000 contigs) annotated (140,000 features) metagenome that I'd like to upload to NCBI. I ran it through prokka, then went in and added a handful of custom annotations to the *.tbl file. Consequently, the contig FASTA and tbl files need to be run through the tbl2asn gauntlet again. I'm using the following command for tbl2asn (latest version downloaded from the NCBI FTP):
tbl2asn \
-i mgenome.contigs.fna \
-f annotations.tbl \
-t template.sbt -o mgenome.asn -V vbt \
-s T -m B -l paired-ends -a r10k -W T \
-y 'Annotated using prokka 1.13.3 from https://github.com/tseemann/prokka'
Here's the issue:
As a trial, I ran tbl2asn on only the first contig (which includes some of the custom annotations) and went through the GenBank submission to the final steps; the .asn file was successfully validated and everything looked good. However, when I run tbl2asn on the entire assembly/annotation, it simply does not complete. CPU stays at 100% utilization and RAM remains occupied, but the process never finishes, and there is no informative stdout output and no logfile is generated.
I have actually completed such a process before, but with a much smaller metagenome assembly.
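One way to tell a genuine hang from a very slow run would be to repeat the single-contig trial on progressively larger subsets (say 100, 1,000, 10,000 contigs) and time each one. Below is a minimal sketch of how the subsets could be cut, not something I have benchmarked. It assumes the contigs appear in the same order in the FASTA and the tbl file, and that each contig's feature block in the tbl starts with a ">Feature" line. The function name first_n_records and the subset file names are made up for illustration.

```shell
# first_n_records FILE N
# Print the first N records of FILE, where a record starts at a line
# beginning with ">" (a FASTA header, or a ">Feature" line in a .tbl file).
# Assumption: both files list contigs in the same order, so taking the
# first N records from each yields a matching pair.
first_n_records() {
    awk -v n="$2" '/^>/ { count++ } count > n { exit } { print }' "$1"
}

# Hypothetical usage (file names other than the originals are placeholders):
#   first_n_records mgenome.contigs.fna 1000 > subset.fna
#   first_n_records annotations.tbl     1000 > subset.tbl
#   time tbl2asn -i subset.fna -f subset.tbl -t template.sbt -o subset.asn
```

If the run time grows much faster than the number of contigs, that would point to a scaling problem rather than one bad record.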
- Has anybody successfully run an assembly of this scale through tbl2asn?
- Any idea why this might be getting hung up? (I can't provide the files themselves for trials unfortunately)
Any feedback or advice will be much appreciated!
Update (sort of solved):
I ran the same command on a MacBook instead of the Ubuntu laptop (both have 16 GB of RAM), hoping that the Mac build of the program wouldn't have the same problem. There was no obvious difference. I noticed that even though RAM was not completely maxed out, the process was using a good deal of virtual (swap) memory, which could explain the slowness. I therefore set it running on a more powerful desktop (64 GB RAM) and left it over the weekend. It finished after ~3 days and passed the NCBI asn validation checks. However, even with plenty of free RAM available, it kept using a lot of swap and didn't seem to process much faster.
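For anyone who wants to watch the RAM and swap behaviour during a long run, here is a rough sketch of how the numbers could be logged on Linux. It reads the VmRSS and VmSwap fields from /proc/<pid>/status, so it will not work on macOS. The function name mem_summary, the log file name, and the 10-minute interval are my own placeholders.

```shell
# mem_summary STATUS_FILE
# Print the resident (VmRSS) and swapped-out (VmSwap) memory lines from a
# /proc/<pid>/status-style file, normalised to "Name: value unit".
mem_summary() {
    awk '$1 == "VmRSS:" || $1 == "VmSwap:" { print $1, $2, $3 }' "$1"
}

# Hypothetical usage while tbl2asn is running (Linux only):
#   pid=$(pgrep -n tbl2asn)              # newest process named tbl2asn
#   while kill -0 "$pid" 2>/dev/null; do # loop until the process exits
#       date
#       mem_summary "/proc/$pid/status"
#       sleep 600
#   done >> tbl2asn_mem.log
```

A log like this would at least show whether memory use is still growing or has levelled off while the CPU stays busy.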
The conclusion I suppose is that yes, it will complete, but be prepared to wait.