Question: How to view gi_taxid_nucl.dmp that is 11.3 GB from NCBI
6 weeks ago, peggyw wrote:

I am working on a paper that tries to understand and theorize a better way to mine taxonomy information, particularly as it relates to the Kraken software. At my current point in the process I have downloaded a compressed file from NCBI, gi_taxid_nucl.dmp.gz. The download has a good chunk of what I am looking for, such as names.dmp, merged.dmp, and division.dmp, but one file, gi_taxid_nucl.dmp, is 11.3 GB and Notepad++ is not going to open it. What tool or DB should I load this into to view its contents? Am I looking at SQL, Oracle, MySQL?


Googling 'gi_taxid_nucl.dmp.gz' yields a 2013 article, https://www.polarmicrobes.org/some-things-should-be-easy/, which suggests sqlite or grep. Are you trying to view this file on Windows? Is your primary interest just to see the contents, or to build a database from it and/or link it to other data tables?
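As a rough illustration of the sqlite route, here is a minimal Python sketch that streams the file into an SQLite table one line at a time, so the 11.3 GB file never has to fit in memory. It assumes each row holds two whitespace-separated integers (GI, then taxonomy ID); the function names, table name, and batch size are made up for this example and are not taken from the linked article.

```python
import sqlite3


def load_gi_taxid(dmp_path, db_path, batch_size=100_000):
    """Stream a two-column GI/taxid text file into an SQLite table.

    Assumption: every data row is "<gi> <taxid>" separated by whitespace.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gi_taxid "
        "(gi INTEGER PRIMARY KEY, taxid INTEGER)"
    )
    insert_sql = "INSERT OR REPLACE INTO gi_taxid VALUES (?, ?)"
    batch = []
    with open(dmp_path, "r") as handle:
        for line in handle:  # reads one line at a time, not the whole file
            fields = line.split()
            if len(fields) != 2:
                continue  # skip blank or malformed rows
            batch.append((int(fields[0]), int(fields[1])))
            if len(batch) >= batch_size:
                conn.executemany(insert_sql, batch)
                batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany(insert_sql, batch)
    conn.commit()
    return conn


def taxid_for_gi(conn, gi):
    """Return the taxonomy ID stored for one GI, or None if it is absent."""
    row = conn.execute(
        "SELECT taxid FROM gi_taxid WHERE gi = ?", (gi,)
    ).fetchone()
    return row[0] if row else None
```

Usage would look like `conn = load_gi_taxid("gi_taxid_nucl.dmp", "gi_taxid.sqlite")` followed by `taxid_for_gi(conn, some_gi)`. The database file name is hypothetical, and the initial load of a file this size will take a while.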

written 6 weeks ago by Ahill

SQLite is interesting, and I had not heard of grep. Windows is my primary platform; yes, I know we are all supposed to love Linux, sorry, I just can't. Right now the primary interest is not only to see the file but also to convert the information, including names.dmp and nodes.dmp, into a graph database. I will look for an SQLite tool to install and see if it works.
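For the graph-database goal, the child-to-parent edges live in nodes.dmp. The sketch below is one way to read them in Python, assuming the usual NCBI taxdump layout: fields separated by tab-pipe-tab, each line ending in tab-pipe, with tax_id, parent tax_id, and rank as the first three fields. The helper names are illustrative only, and loading the edges into any particular graph database is left out.

```python
def parse_dmp_line(line):
    """Split one taxdump line into fields (separator assumed: tab, pipe, tab)."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the trailing tab-pipe terminator
    return [field.strip() for field in line.split("\t|\t")]


def read_parent_edges(nodes_lines):
    """Map each tax_id to (parent_tax_id, rank) from nodes.dmp-style lines."""
    parents = {}
    for line in nodes_lines:
        if not line.strip():
            continue  # ignore blank lines
        fields = parse_dmp_line(line)
        parents[int(fields[0])] = (int(fields[1]), fields[2])
    return parents


def lineage(parents, taxid):
    """Walk from a tax_id up to the root, returning the list of IDs visited."""
    path = [taxid]
    while taxid in parents:
        parent = parents[taxid][0]
        if parent == taxid or parent in path:
            break  # the root points at itself; also guards against cycles
        path.append(parent)
        taxid = parent
    return path
```

With the real file, `read_parent_edges(open("nodes.dmp"))` would give a dictionary whose items are exactly the edges a graph database needs: one node per tax_id and one "has parent" relationship per entry.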

I took a look at the link and I might be chasing the wrong information. If gi_taxid_nucl.dmp is just two values per row, a GI ID and a taxonomy ID, then the one key item I need, the sequence for each taxonomy ID, is missing.

Here is where I am trying to get to. I have been working a little with Kraken and, although it is very interesting software, there is a major hurdle in trying to use it if you don't have a supercomputer. It comes with four required DB files, and I use the term "DB files" lightly: database.idx, database.kdb, names.dmp, and nodes.dmp. In my case, some of the work I have been supporting requires a terabyte of RAM to run. What I really want to get to, and I thought gi_taxid_nucl.dmp would get me there, is a taxonomy ID and k-mer database. It looks like I may have to back up into the Kraken code to figure out how a k-mer DB is created, and convert the code to create a graph DB instead of a flat-file DB.
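To make the "taxonomy ID and k-mer database" idea concrete, here is a toy Python sketch that maps canonical k-mers to the set of taxonomy IDs whose sequences contain them. This is not Kraken's code or file format: Kraken stores, for each k-mer, the lowest common ancestor of the genomes containing it, whereas this sketch simply keeps the set of IDs. The function names and the in-memory dictionary are assumptions for illustration; a real index would need on-disk storage.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq, k):
    """Yield each k-mer as the smaller of itself and its reverse complement."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing N or other ambiguity codes
        yield min(kmer, reverse_complement(kmer))


def build_kmer_index(records, k):
    """Build {canonical k-mer: set of taxids} from (taxid, sequence) pairs."""
    index = {}
    for taxid, seq in records:
        for kmer in canonical_kmers(seq, k):
            index.setdefault(kmer, set()).add(taxid)
    return index
```

The sequences themselves have to come from elsewhere (for example FASTA files keyed by GI), since, as noted above, gi_taxid_nucl.dmp only links GI numbers to taxonomy IDs.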

modified 6 weeks ago • written 6 weeks ago by peggyw

On Windows, presuming the .dmp files are a delimited text format, you can view them with the more command; the Windows equivalent of 'grep' for searching text in a file is findstr. Neither tool needs to load the full file into memory. They won't help you directly with your larger goal, but they would let you view the file(s) to get started.
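If Python is available, the same "look without loading everything" idea can be sketched in a few lines. This is only an illustrative alternative to more and findstr, with made-up function names; it reads the file line by line and stops as soon as it has enough output.

```python
from itertools import islice


def head(path, n=10):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


def find_lines(path, needle, limit=10):
    """Return up to `limit` lines containing `needle`, scanning line by line."""
    hits = []
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            if needle in line:
                hits.append(line.rstrip("\n"))
                if len(hits) >= limit:
                    break  # stop early instead of scanning the whole file
    return hits
```

For example, `head("gi_taxid_nucl.dmp", 5)` shows the first five rows, which is enough to confirm the column layout before deciding on a database.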

written 6 weeks ago by Ahill

wrong place to post

modified 6 weeks ago • written 6 weeks ago by peggyw