Question

Biological Databases Geographical Distribution

0

8.2 years ago

arash.iranzadeh1980 ▴ 30

Hi,

I would like to know how biological databases are distributed worldwide. I think most of them are located in the US and Europe. Am I right? Are any other countries significantly involved?

Biological databases geographical distribution • 2.6k views

ADD COMMENT • link updated 8.2 years ago by Pierre Lindenbaum 166k • written 8.2 years ago by arash.iranzadeh1980 ▴ 30

score 3 · Answer 1 · 2017-09-02

3

8.2 years ago

Pierre Lindenbaum 166k

If it helps: using my tools http://lindenb.github.io/jvarkit/XsltStream.html and http://lindenb.github.io/jvarkit/PubmedDump.html, I've extracted the first-author affiliations from the 2015 NAR database issue.

ADD COMMENT • link 8.2 years ago by Pierre Lindenbaum 166k

0

@Pierre, I am using your xsltstream to parse an XML file I downloaded using NCBI E-utilities. I modified the above XSL file (biostar270498.xsl) to get my desired output (title and abstract text). It works fine, but after a certain number of entries (~500) it throws an error. Could you please check whether I have done anything wrong in the XSL or in how I use your tool?

Usage: cat ~/Downloads/test_e_renal_kidney.xml | java -jar dist/xsltstream.jar -t ~/Downloads/test_e_kid_renal.xsl -n PubmedArticle

xsl file:

   
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="text" encoding="UTF-8"/>   
<xsl:output method="text"/>  
<xsl:template match="/">  
<xsl:apply-templates select="PubmedArticle"/>  
</xsl:template>  
<xsl:template match="PubmedArticle">  
<xsl:apply-templates select="MedlineCitation/Article/Abstract/AbstractText"/>  
<xsl:text>  
</xsl:text>  
</xsl:template>  
</xsl:stylesheet>

The error I get after ~500 or so outputs:

[SEVERE][XsltStream]ParseError at [row,col]:[98160,6]  
Message: The processing instruction target matching "[xX][mM][lL]" is not allowed.  
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[98160,6]  
Message: The processing instruction target matching "[xX][mM][lL]" is not allowed.  
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:596)  
    at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(XMLEventReaderImpl.java:83)  
    at com.github.lindenb.jvarkit.tools.misc.XsltStream.doWork(XsltStream.java:590)  
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:763)  
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:926)  
    at com.github.lindenb.jvarkit.tools.misc.XsltStream.main(XsltStream.java:627)  
[INFO][Launcher]xsltstream Exited with failure (-1)

PS: I am still working on getting the abstract using the above XSL.

Thank you for your time and thank you for your tool.

ADD REPLY • link 4.9 years ago by luffy ▴ 130

0

what is the output of

xmllint --stream --noout ~/Downloads/test_e_renal_kidney.xml

?

ADD REPLY • link 4.9 years ago by Pierre Lindenbaum 166k

0

@Pierre Thank you for your response,

Output is:

/home/dell/Downloads/test_e_renal_kidney.xml:98160: parser error : XML declaration allowed only at the start of the document  
<?xml version="1.0" ?>  
     ^  
/home/dell/Downloads/test_e_renal_kidney.xml : failed to parse

Now I see what the problem is: it is in the XML file, which contains the XML declaration line multiple times.

grep -c '?xml version=' ~/Downloads/test_e_renal_kidney.xml  
1426

Is there any way I can parse this XML file to get the title and abstract? It is a very large file (7.9 GB).

Thank you for your time

ADD REPLY • link 4.9 years ago by luffy ▴ 130
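
One possible way around the repeated XML declarations, offered as a minimal sketch rather than a tested fix for this particular file: treat the download as several XML documents concatenated together, cut the stream at every line that begins with `<?xml`, and parse each piece on its own. The function names `split_documents` and `titles_and_abstracts` are hypothetical. The sketch assumes each declaration sits at the start of a line (as the xmllint output above suggests), that each piece is small enough to hold in memory, and that the records follow the usual PubMed layout with `ArticleTitle` next to `Abstract/AbstractText`.

```python
import xml.etree.ElementTree as ET


def split_documents(lines):
    """Yield each concatenated XML document as one bytes object.

    Assumption: a new document starts at every line beginning with '<?xml'.
    """
    chunk = []
    for line in lines:
        if line.lstrip().startswith(b"<?xml") and chunk:
            yield b"".join(chunk)
            chunk = []
        chunk.append(line)
    if chunk:
        yield b"".join(chunk)


def titles_and_abstracts(lines):
    """Yield (title, abstract) for every PubmedArticle found in the stream."""
    for doc in split_documents(lines):
        if not doc.strip():
            continue  # skip whitespace-only pieces
        root = ET.fromstring(doc)
        for article in root.iter("PubmedArticle"):
            title = article.find("MedlineCitation/Article/ArticleTitle")
            parts = article.findall(
                "MedlineCitation/Article/Abstract/AbstractText"
            )
            yield (
                # itertext() keeps text nested in inline markup such as <i>
                "".join(title.itertext()).strip() if title is not None else "",
                " ".join("".join(p.itertext()).strip() for p in parts),
            )
```

To use it, open the file in binary mode and iterate, for example `with open(path, "rb") as fh: for title, abstract in titles_and_abstracts(fh): print(title, abstract, sep="\t")`. Only one piece is parsed at a time, so memory use depends on the largest single piece, not on the full 7.9 GB.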