Question: Ncbi Asn.1 Java Parser
Raygozak (State College, PA, Penn State) wrote, 4.6 years ago:

Hi, has anyone manipulated ASN.1 files downloaded from NCBI? I understand that ASN.1 is not NCBI-specific and that there are many other formats in which I can download data from NCBI. My question relates specifically to ASN.1: does anyone know of a Java library that they have used to parse these files? I have googled, and many libraries speak of BER and DER encoding, which are communication-specific formats. All I'm interested in is manipulating, in Java, the ASCII ASN.1 files that one can download from NCBI.

Thanks

ncbi java parsing • 3.1k views
wdiwdi (Germany) wrote, 4.6 years ago:

The ASCII form of the NCBI ASN.1 data does not follow any standard and is essentially an invention by NCBI (that's why they also provide some converters). Only the binary form can be processed with generic ASN.1 tools. BER is the standard low-level encoding used in binary ASN.1 and not, contrary to what you wrote, anything communication-specific. The standard approach to parsing binary ASN.1 data is to get the encoding definition file for a specific downloadable item (NCBI provides these for all its ASN.1 data, with many definition parts shared between databases), generate a parser, and link that to your application.
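To make the "low-level encoding" point concrete: every BER item starts with a tag and a length, followed by the content bytes. The following is only a minimal Java sketch of reading that header by hand, not a replacement for a generated parser. The class name `BerHeader` is hypothetical, the sample bytes in `main` are hand-made rather than taken from an NCBI file, and lengths longer than 7 bytes are rejected to keep the example simple.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of any NCBI or SNACC tooling.
final class BerHeader {
    final int tagClass;        // 0 universal, 1 application, 2 context-specific, 3 private
    final boolean constructed; // true if the content is itself a series of BER items
    final long tagNumber;
    final long length;         // content length in bytes, or -1 for the indefinite form

    private BerHeader(int tagClass, boolean constructed, long tagNumber, long length) {
        this.tagClass = tagClass;
        this.constructed = constructed;
        this.tagNumber = tagNumber;
        this.length = length;
    }

    // Reads one identifier + length header from the stream.
    static BerHeader read(InputStream in) throws IOException {
        int first = readByte(in);
        int tagClass = (first >> 6) & 0x03;
        boolean constructed = (first & 0x20) != 0;
        long tagNumber = first & 0x1F;
        if (tagNumber == 0x1F) {
            // High-tag-number form: base-128 digits, high bit set on all but the last.
            tagNumber = 0;
            int b;
            do {
                b = readByte(in);
                tagNumber = (tagNumber << 7) | (b & 0x7F);
            } while ((b & 0x80) != 0);
        }

        int lenByte = readByte(in);
        long length;
        if (lenByte < 0x80) {
            length = lenByte;                 // short form: the byte is the length
        } else if (lenByte == 0x80) {
            length = -1;                      // indefinite form, ended by two zero bytes
        } else {
            int count = lenByte & 0x7F;       // long form: next `count` bytes hold the length
            if (count > 7) {
                throw new IOException("length field too large for this sketch");
            }
            length = 0;
            for (int i = 0; i < count; i++) {
                length = (length << 8) | readByte(in);
            }
        }
        return new BerHeader(tagClass, constructed, tagNumber, length);
    }

    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("truncated BER header");
        }
        return b;
    }

    public static void main(String[] args) throws IOException {
        // Hand-made sample: 0x30 is a constructed universal tag 16 (SEQUENCE),
        // 0x82 0x01 0x00 is a long-form length.
        byte[] sample = {0x30, (byte) 0x82, 0x01, 0x00};
        BerHeader h = read(new ByteArrayInputStream(sample));
        System.out.println("class=" + h.tagClass + " constructed=" + h.constructed
                + " tag=" + h.tagNumber + " length=" + h.length);
    }
}
```

A header reader like this only tells you where items begin and end. Knowing what each tag means still requires the definition files, which is why a generated parser is the usual route.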

I have done that for PubChem ASN.1 compound, substance, and assay data. I have been using the SNACC parser generator to generate C code for linking. (Warning: there are some data item sequences where SNACC generates wrong code that tries to read an extra token from the input stream. You need to postprocess the generated parser source to fix that.) The assay and structure readers are components of the generic academic version of the Cactvs Cheminformatics Toolkit (www.xemistry.com/academic). Also note that a parser is surprisingly large and complex due to the extensive inclusion of definitions from other NCBI branches that have grown over decades. For example, the literature reference definition part included by the PubChem assay and structure data is much more extensive than the actual structure and assay data part, with dozens of different ways to specify even the most exotic type of reference.

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 4.6 years ago:

What kind of file do you need to parse? NCBI provides some converters:

ftp://ftp.ncbi.nlm.nih.gov/asn1-converters/by_program/

All you need to do is pipe the XML output of those programs into a Java XML parser:

converter -i file.asn | java -jar doSomethingSAXorDOMorStaX.jar > result
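
As a rough idea of what the Java side of that pipe could look like, here is a minimal StAX sketch that reads XML from standard input and counts how often each element name occurs. The class name `CountElements` is hypothetical, and nothing is assumed about the element names the converters emit; a real program would react to the specific elements it cares about instead of counting them.

```java
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class for illustration; it makes no assumption about NCBI element names.
final class CountElements {

    // Streams through the XML and counts start tags by local name.
    static Map<String, Integer> count(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser not to process DTDs or resolve external entities,
        // so that parsing does not depend on network access.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);

        Map<String, Integer> counts = new TreeMap<>();
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    counts.merge(reader.getLocalName(), 1, Integer::sum);
                }
            }
        } finally {
            reader.close();
        }
        return counts;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Reads the converter's XML output from the pipe and prints one line per element name.
        for (Map.Entry<String, Integer> e : count(System.in).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Because StAX streams the document, memory use stays flat even for large files. With Java 11 or later the file can be run directly, for example `converter -i file.asn | java CountElements.java > result`, where `converter` stands for whichever program from the FTP directory applies.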

Yeah, I found these converters after posting the question. However, I prefer to have a library that models a given file format, since that gives me more control over how I can manipulate it. It is true that I can use the converters to get, say, XML, but it still seems inefficient to me. For the time being, I guess I will do it this way.

written 4.6 years ago by Raygozak

I suppose you googled for Java-based ASN.1 compilers/code generators like http://sourceforge.net/projects/jac-asn1/ . As far as I remember, I played with the NCBI ASN.1 files, but the time required to explore the solutions was not worth it.

written 4.6 years ago by Pierre Lindenbaum

Hmm, strangely enough I didn't check this one. Thanks, this is great.

written 4.6 years ago by Raygozak