Question: GeMoMa AnnotationFinalizer: java.lang.NumberFormatException: For input string: "7180000819398"
13 months ago by
Rohith B S0
India
Rohith B S0 wrote:

Issue with GeMoMa AnnotationFinalizer

I was running GeMoMa to predict genes/proteins and annotate my repeat-masked plant genome assembly. The last step, the AnnotationFinalizer module, throws the following error:

Error:

starting AnnotationFinalizer
java.lang.NumberFormatException: For input string: "7180000819398"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:583)
        at java.lang.Integer.parseInt(Integer.java:615)
        at projects.gemoma.AnnotationFinalizer$SequenceIDComparator.extractInt(AnnotationFinalizer.java:410)
        at projects.gemoma.AnnotationFinalizer$SequenceIDComparator.compare(AnnotationFinalizer.java:400)
        at projects.gemoma.AnnotationFinalizer$SequenceIDComparator.compare(AnnotationFinalizer.java:1)
        at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
        at java.util.TimSort.sort(TimSort.java:234)
        at java.util.Arrays.sort(Arrays.java:1438)
        at projects.gemoma.AnnotationFinalizer.run(AnnotationFinalizer.java:488)
        at projects.gemoma.GeMoMaPipeline$JAnnotationFinalizer.doJob(GeMoMaPipeline.java:1466)
        at projects.gemoma.GeMoMaPipeline$FlaggedRunnable.run(GeMoMaPipeline.java:917)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Resolution Attempts

  1. I checked whether the first column of filtered_prediction.gff contains a bare integer. It does not: the sequence ID there is jcf7180000819398.

  2. I tried converting the GFF to GTF with gffread so that I could extract the sequences with get_sequence_from_gtf.pl from GeneMark.

  3. I tried getAnnoFasta.pl from Augustus. It partially works, but the annotations do not appear in the FASTA output and no protein sequences are produced.

  4. I experimented with rtracklayer (failed) and bedtools (produced a FASTA file, but again unreliable).

Please help with some leads.
Thank you in advance.

modified 13 months ago • written 13 months ago by Rohith B S

Welcome to Biostars and thank you for the contribution! Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

written 13 months ago by RamRS

I don't know this tool, but the problem is that it's trying to parse that value as a 32-bit integer, which has a maximum value of 2147483647. The value 7180000819398 is larger than that, so the parse fails.
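
To illustrate the point, here is a minimal standalone sketch (the class and method names are made up for this example and have nothing to do with GeMoMa's code). It shows that the digits from the stack trace are rejected by `Integer.parseInt` but accepted by `Long.parseLong`:

```java
// Minimal demo: the numeric part of the sequence ID "jcf7180000819398"
// does not fit in a 32-bit int, but it does fit in a 64-bit long.
class ParseOverflowDemo {
    // Returns true if s can be parsed as a 32-bit int.
    static boolean fitsInInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String digits = "7180000819398"; // value taken from the stack trace
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("fits in int: " + fitsInInt(digits));
        System.out.println("as long:     " + Long.parseLong(digits));
    }
}
```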

written 13 months ago by tpoterba

Thank you. I will do the same from next time.

written 13 months ago by Rohith B S
13 months ago by
Rohith B S0
India
Rohith B S0 wrote:

I found out from the developers that the tool sorts the input by the numeric part of the scaffold/contig names, and while doing so it parses that part as an integer. That is what caused the error. They mentioned that this will be fixed in the next release.
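
For anyone curious what an overflow-safe version of such a sort could look like, here is a rough sketch. It is not GeMoMa's actual code or the developers' fix; the class, the comparator, and the rule of sorting IDs without digits first are all assumptions made for illustration. It compares the first run of digits in each ID as a `BigInteger` instead of an `int`:

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparator, NOT GeMoMa's implementation: orders sequence IDs
// by the first run of digits they contain, using BigInteger so that long
// numeric parts cannot overflow.
class SequenceIdSortSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Extracts the first digit run of an ID. IDs without any digits map to -1
    // (an arbitrary choice for this sketch) so that they sort first.
    static BigInteger extractNumber(String id) {
        Matcher m = DIGITS.matcher(id);
        return m.find() ? new BigInteger(m.group()) : BigInteger.ONE.negate();
    }

    // Compares by numeric part first, then by the full ID to break ties.
    static final Comparator<String> BY_NUMBER = (a, b) -> {
        int c = extractNumber(a).compareTo(extractNumber(b));
        return c != 0 ? c : a.compareTo(b);
    };

    public static void main(String[] args) {
        String[] ids = { "jcf7180000819398", "jcf12", "jcf7180000819397" };
        Arrays.sort(ids, BY_NUMBER);
        System.out.println(Arrays.toString(ids));
    }
}
```

A `long` would also be enough for the ID in this thread; `BigInteger` simply avoids having to pick an upper limit.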

We need to use a version of the tool newer than 1.6.0.

tpoterba, thank you for your help.

written 13 months ago by Rohith B S