Question: How to extract specific columns from Drugbank xml file
vasilislenis (United Kingdom) wrote, 19 months ago:

Hello everyone,

I would like to generate a tab-separated file from the DrugBank XML that contains the values of the following tags:

<drugbank-id> <name> <gene-name> <action>

I tried the xmlstarlet tool, following Lyco's instructions from this thread:

How To Convert Xml Into A Decent Parseable Format?

but I get no output. xmlstarlet returns nothing and reports no error, so I suspect the DrugBank XML structure is more complicated than the one in his example. I also tried changing the namespace to the one DrugBank uses, but nothing changed.

I have also tried the CSV files from DrugBank's external links. They are fine for the drug name, the drug ID, and the protein name, but they don't include the "action" information.

So, any help would be greatly appreciated...

Thank you very much in advance, Vasilis.

Tags: drugbank, xml
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 19 months ago:

Samuel Lampa wrote something for this in March.

I wrote my own version using a streaming XSLT tool.
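For readers who prefer not to install an XSLT tool, the same streaming idea can be sketched with Python's standard library. This is a minimal sketch under stated assumptions, not the approach from either linked post: the element paths and the `http://www.drugbank.ca` namespace are taken from the stylesheet posted in this thread, the `primary="true"` attribute on `<drugbank-id>` is an assumption about the DrugBank schema, and `iter_rows`, `write_tsv`, and the sample document are hypothetical names and data made up for illustration. It tracks element depth so that only top-level `<drug>` elements are processed, in case `<drug>` also appears nested inside a record.

```python
import csv
import io
import sys
import xml.etree.ElementTree as ET

# Namespace as declared in the stylesheet from this thread.
NS = "{http://www.drugbank.ca}"

# Tiny hand-written sample for illustration only; not real DrugBank output.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<drugbank xmlns="http://www.drugbank.ca">
  <drug>
    <drugbank-id primary="true">DB00001</drugbank-id>
    <name>Lepirudin</name>
    <pathways><pathway><drugs><drug><name>nested</name></drug></drugs></pathway></pathways>
    <targets>
      <target>
        <actions><action>inhibitor</action></actions>
        <polypeptide><gene-name>F2</gene-name></polypeptide>
      </target>
    </targets>
  </drug>
</drugbank>"""


def iter_rows(source):
    """Yield (drugbank_id, name, gene_name, action), one tuple per target action."""
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # Skip everything except <drug> elements that are direct children of the root.
        if depth != 1 or elem.tag != NS + "drug":
            continue
        ids = elem.findall(NS + "drugbank-id")
        # Assumption: the main identifier carries primary="true"; else use the first one.
        chosen = [e for e in ids if e.get("primary") == "true"] or ids
        drug_id = (chosen[0].text or "").strip() if chosen else ""
        name = (elem.findtext(NS + "name", default="") or "").strip()
        for target in elem.findall(f"{NS}targets/{NS}target"):
            gene = (target.findtext(f"{NS}polypeptide/{NS}gene-name", default="") or "").strip()
            actions = [(a.text or "").strip()
                       for a in target.findall(f"{NS}actions/{NS}action")] or [""]
            for action in actions:
                yield (drug_id, name, gene, action)
        elem.clear()  # drop the finished record's children to limit memory use


def write_tsv(source, out):
    """Write a header line plus one tab-separated line per row to `out`."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["drugbank-id", "name", "gene-name", "action"])
    writer.writerows(iter_rows(source))


if __name__ == "__main__":
    write_tsv(io.BytesIO(SAMPLE), sys.stdout)
```

To run it on a real download, pass the path of the XML file to `write_tsv` in place of the `io.BytesIO(SAMPLE)` object. Emitting one line per target action keeps each gene paired with its own action, which comma-joined lists do not guarantee.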

vasilislenis replied, 19 months ago:

Many thanks for your help, Pierre! I found Samuel's approach a little more complicated, since it requires installing the Go language, so I followed your approach and tweaked the XSLT template from your example a little. I would really appreciate it if you could take a look at it and tell me what you think, because I am not very familiar with XML.

Thank you very much in advance, Vasilis.


<xsl:stylesheet xmlns:d="http://www.drugbank.ca" xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="text"/>

<xsl:template match="d:drugbank">
<xsl:apply-templates select="d:drug"/>
</xsl:template>

<xsl:template match="d:drug">
<xsl:value-of select="d:name/text()"/>
<xsl:text>&#9;</xsl:text>

<xsl:for-each select="d:targets/d:target/d:polypeptide/d:gene-name">
         <xsl:value-of select=" concat(./text(),',')"/>
</xsl:for-each>
<xsl:text>&#9;</xsl:text>
<xsl:for-each select="d:targets/d:target/d:actions/d:action">
        <xsl:value-of select=" concat(./text(),',')"/>
</xsl:for-each>
<xsl:text>
</xsl:text>
</xsl:template>

</xsl:stylesheet>
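
The `d:` prefix in this template matters: the stylesheet binds it to `http://www.drugbank.ca`, which DrugBank declares as the default namespace of its elements. A path without the prefix matches nothing and raises no error, which would be consistent with the empty xmlstarlet output described in the question, although that cause is not confirmed in this thread. The following is a minimal sketch of the effect using Python's standard library; the one-drug sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping; the URI is the one declared in the stylesheet above.
NS = {"d": "http://www.drugbank.ca"}

# Hand-written sample for illustration only; not real DrugBank output.
SAMPLE = """<drugbank xmlns="http://www.drugbank.ca">
  <drug><name>Lepirudin</name></drug>
</drugbank>"""

root = ET.fromstring(SAMPLE)

# Unqualified path: the elements are in a namespace, so nothing matches.
unqualified = root.findall("drug")

# Qualified path: the d: prefix is resolved through the NS mapping.
qualified = root.findall("d:drug", NS)

print(len(unqualified), len(qualified))  # prints: 0 1
```

The same rule applies to every step of a path, so `d:targets/d:target/d:actions/d:action` needs the prefix on each element name.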
mohfcis wrote, 10 months ago:

Hi, you can use the dbparser package (https://github.com/Dainanahan/dbparser). It is designed to parse the DrugBank database and return R data frames.
