Split a blastp xml output
8.4 years ago

Hello!

I have an XML output file from BLAST that I want to analyze with Blast2GO. However, early in the analysis, the following error message always comes up:

Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit

It was suggested that I divide the XML file into smaller files. How could I do this?

xml blast • 2.7k views

Did you try to increase the JVM memory with -Xmx? http://stackoverflow.com/questions/14763079


Hi Pierre,

I used -Xmx60000m; I don't think I can increase it any further =(


Yes! \o/

Thank you Pierre, I will test the tool.


Hi Pierre,

The tool works for splitting. However, all the files appear to be the same: every one starts by describing the hits for the same gene. Is this expected? Can it be fixed?


Look at your question: you never mentioned 'gene'. The tool only splits the XML.

8.4 years ago

I quickly wrote a tool to split an XML file: https://github.com/lindenb/jvarkit/wiki/Biostar165777

$  java -jar dist-1.139/biostar165777.jar -o out__SPLIT__.xml -T Hit -N 5 ~/blastn.xml

$ ls -la ~/blastn.xml out*.xml
-rw-rw-r-- 1 lindenb lindenb 422606 nov.  14 12:47 /home/lindenb/blastn.xml
-rw-rw-r-- 1 lindenb lindenb  86319 nov.  14 16:17 out001.xml
-rw-rw-r-- 1 lindenb lindenb  83570 nov.  14 16:17 out002.xml
-rw-rw-r-- 1 lindenb lindenb  85096 nov.  14 16:17 out003.xml
-rw-rw-r-- 1 lindenb lindenb  88297 nov.  14 16:17 out004.xml
-rw-rw-r-- 1 lindenb lindenb  87123 nov.  14 16:17 out005.xml

$  grep -cF "<Hit>" ~/blastn.xml out*.xml
/home/lindenb/blastn.xml:100
out001.xml:20
out002.xml:20
out003.xml:20
out004.xml:20
out005.xml:20
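
The follow-up comment above suggests that splitting on Hit can leave every file starting with the same query. If the goal is instead to keep each query together with all of its hits, one option is to split on the `<Iteration>` element, which holds one query in the standard BLAST XML layout. Below is a minimal Python sketch of that idea, not the tool's own implementation. The function name `split_blast_xml`, the output pattern, and the `per_file` parameter are made up for illustration. It assumes the header fields come before `<BlastOutput_iterations>`, it does not reproduce the original DOCTYPE line, and whether Blast2GO accepts the resulting files is untested.

```python
import xml.etree.ElementTree as ET


def split_blast_xml(src, out_pattern, per_file):
    """Write files holding at most `per_file` <Iteration> elements each.

    `out_pattern` is a format string such as "out{:03d}.xml".
    Returns the list of paths written.
    """
    header = []   # children of <BlastOutput> other than the iterations
    chunk = []    # <Iteration> elements waiting to be written
    outputs = []

    def flush():
        if not chunk:
            return
        root = ET.Element("BlastOutput")
        root.extend(header)
        iterations = ET.SubElement(root, "BlastOutput_iterations")
        iterations.extend(chunk)
        path = out_pattern.format(len(outputs) + 1)
        ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
        outputs.append(path)
        chunk.clear()

    stack = []  # currently open elements, so we always know the parent
    for event, elem in ET.iterparse(src, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()
        parent = stack[-1] if stack else None
        if elem.tag == "Iteration":
            chunk.append(elem)
            parent.remove(elem)  # drop it from the source tree to save memory
            if len(chunk) == per_file:
                flush()
        elif (parent is not None and parent.tag == "BlastOutput"
              and elem.tag != "BlastOutput_iterations"):
            header.append(elem)
    flush()  # write the last, possibly smaller, chunk
    return outputs
```

For example, `split_blast_xml("blastp.xml", "out{:03d}.xml", 500)` would write files of up to 500 queries each. Because the input is read with `iterparse` and each query is detached once it has been read, memory use should stay close to the size of one chunk rather than the whole file.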