NCBI ASN.1 Java Parser
11.0 years ago
Raygozak ★ 1.4k

Hi, has anyone manipulated ASN.1 files downloaded from NCBI? I understand that ASN.1 is not NCBI-specific and that NCBI offers the same data in many other formats. My question is specifically about ASN.1: does anyone know of a Java library, ideally one you have used yourself, for parsing these files? I have googled, and many libraries speak of BER and DER encoding, which are communication-specific formats. All I'm interested in is manipulating, in Java, the ASCII ASN.1 files that one can download from NCBI.

Thanks

ncbi java parsing • 5.9k views
11.0 years ago
wdiwdi ▴ 380

The ASCII form of the NCBI ASN.1 data does not follow any standard and is essentially an invention by NCBI (that's why they also provide some converters). Only the binary form can be processed with generic ASN.1 tools. BER is the standard low-level encoding used in binary ASN.1 and not, contrary to what you wrote, anything communication-specific. The standard approach to parsing binary ASN.1 data is to get the encoding definition file for a specific downloadable item (NCBI provides these for all of its ASN.1 data, with many definition parts shared between databases), generate a parser from it, and link that parser into your application.
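To make the point about BER concrete, here is a minimal Java sketch of how the header (identifier and length octets) of a single BER element is laid out. It is only an illustration of the encoding, not a substitute for a parser generated from the NCBI definition files; the class name `BerHeaderSketch` and its methods are made up for this example, and the byte values in `main` are hand-made, not taken from an NCBI file.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: decodes only the identifier and length octets of one BER element.
final class BerHeaderSketch {

    static final class Header {
        final int tagClass;        // 0 universal, 1 application, 2 context-specific, 3 private
        final boolean constructed; // true if the content is itself a series of BER elements
        final int tagNumber;
        final long length;         // content length in bytes, or -1 for the indefinite form

        Header(int tagClass, boolean constructed, int tagNumber, long length) {
            this.tagClass = tagClass;
            this.constructed = constructed;
            this.tagNumber = tagNumber;
            this.length = length;
        }
    }

    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("input ended inside a BER header");
        }
        return b;
    }

    static Header readHeader(InputStream in) throws IOException {
        int first = readByte(in);
        int tagClass = (first >> 6) & 0x03;
        boolean constructed = (first & 0x20) != 0;
        int tagNumber = first & 0x1F;
        if (tagNumber == 0x1F) {
            // High-tag-number form: base-128 digits, top bit set on all but the last.
            tagNumber = 0;
            int b;
            do {
                b = readByte(in);
                tagNumber = (tagNumber << 7) | (b & 0x7F);
            } while ((b & 0x80) != 0);
        }
        int lengthByte = readByte(in);
        long length;
        if (lengthByte < 0x80) {
            length = lengthByte;            // short form: the byte is the length
        } else if (lengthByte == 0x80) {
            length = -1;                    // indefinite form: content ends with 00 00
        } else {
            int count = lengthByte & 0x7F;  // long form: next 'count' bytes hold the length
            if (count > 7) {
                throw new IOException("length field too large for this sketch");
            }
            length = 0;
            for (int i = 0; i < count; i++) {
                length = (length << 8) | readByte(in);
            }
        }
        return new Header(tagClass, constructed, tagNumber, length);
    }

    // Convenience overload for in-memory data; wraps the checked exception.
    static Header readHeader(byte[] data) {
        try {
            return readHeader(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hand-made example: a constructed SEQUENCE header with indefinite length.
        Header h = readHeader(new byte[] {0x30, (byte) 0x80});
        System.out.println("class=" + h.tagClass + " constructed=" + h.constructed
                + " tag=" + h.tagNumber + " length=" + h.length);
    }
}
```

Nothing in this layout refers to a network or transport: it is simply a tag, a length, and the content bytes, which is why the same encoding works for files on disk. Knowing what each tag means still requires the definition file for the data in question.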

I have done that for PubChem ASN.1 compound, substance, and assay data, using the SNACC parser generator to generate C code for linking. (Warning: there are some data item sequences where SNACC generates wrong code that tries to read an extra token from the input stream. You need to postprocess the generated parser source to fix that.) The assay and structure readers are components of the generic academic version of the Cactvs Cheminformatics Toolkit (www.xemistry.com/academic). Also note that such a parser is surprisingly large and complex, due to the extensive inclusion of definitions from other NCBI branches that have grown over decades. For example, the literature reference definition part included by the PubChem assay and structure data is much more extensive than the actual structure and assay data part, with dozens of different ways to specify even the most exotic type of reference.

11.0 years ago

What kind of file do you need to parse? NCBI provides some converters:

ftp://ftp.ncbi.nlm.nih.gov/asn1-converters/by_program/

All you need to do is pipe the XML output of those programs into a Java XML parser (SAX, DOM, or StAX):

converter -i file.asn | java -jar doSomethingSAXorDOMorStaX.jar > result
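As a rough sketch of what the Java side of that pipe could look like, the program below reads XML from standard input with StAX and counts how often each element name occurs. The class name `XmlElementCounter` is made up for this example, and `converter` above remains a placeholder for whichever NCBI converter you use. Switching off DTD support assumes you do not need the DTD that the converter output may reference.

```java
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class: streams XML and tallies element names.
final class XmlElementCounter {

    static Map<String, Integer> countElements(InputStream in) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: the DTD is not needed, so do not process it or fetch external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        Map<String, Integer> counts = new TreeMap<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            try {
                // StAX is a pull parser: the document is never held in memory as a whole.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        counts.merge(reader.getLocalName(), 1, Integer::sum);
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed XML on input", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Reads the converter's XML from the pipe and prints one "name<TAB>count" line per element.
        countElements(System.in)
                .forEach((name, n) -> System.out.println(name + "\t" + n));
    }
}
```

Compiled, it would slot into the pipe as `converter -i file.asn | java XmlElementCounter > result`. Replacing the counting step with your own handling of `START_ELEMENT` and `CHARACTERS` events is where the actual data extraction would go.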

Yeah, I found these converters after posting the question. However, I would prefer a library that models the file format itself, since that gives me more control over how I can manipulate the data. It is true that I can use the converters to get, say, XML, but that still seems inefficient to me. For the time being, I guess I will do it this way.


I suppose you googled for Java-based ASN.1 compilers/code generators like http://sourceforge.net/projects/jac-asn1/ . As far as I remember, I played with the NCBI ASN.1 files, but the time required to explore the solutions was not worth it.


Hmm, strangely enough I didn't check this one. Thanks, this is great.
