Question: Fetching Description And Accession Number From A GenBank Format DNA Sequence File Using BioJava
J.Ashley wrote (6.4 years ago):

Hello everyone

I am trying to fetch the accession number and description from a GenBank-formatted DNA sequence file. However, I keep receiving this error:

A Exception Has Occurred During Parsing. 
Please submit the details that follow to biojava-l@biojava.org or post a bug report to http://bugzilla.open-bio.org/ 

Format_object=org.biojavax.bio.seq.io.GenbankFormat
Accession=null
Id=null
Comments=Bad section
Parse_block=
Stack trace follows ....


    at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:603)
    at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:278)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
    ... 5 more
Caused by: java.lang.NullPointerException
    at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:570)
    ... 7 more
org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
    at org.biojavax.bio.seq.io.RichStreamReader.nextSequence(RichStreamReader.java:92)
    at org.biojavax.bio.seq.io.RichStreamWriter.writeStream(RichStreamWriter.java:66)
    at org.biojavax.bio.seq.RichSequence$IOTools.writeFasta(RichSequence.java:1558)
    at org.biojavax.bio.seq.RichSequence$IOTools.writeFasta(RichSequence.java:1581)
    at hmwktest.main(hmwktest.java:40)
Caused by: org.biojava.bio.seq.io.ParseException:

Here is the code:

import org.biojava.bio.*; 
import org.biojava.bio.seq.io.*;
import org.biojava.bio.seq.*;
import org.biojavax.Namespace;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.BioEntry;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequence.IOTools;
import org.biojavax.bio.seq.RichSequenceIterator;

import java.io.*; 
import java.util.*; 
import javax.swing.JFileChooser;

public class test {
    private static JFileChooser ourChooser = new JFileChooser(".");

    /** Open a file through a FileChooser */
    public static BufferedReader openFile() {
        int retval = ourChooser.showOpenDialog(null);
        BufferedReader br = null;
        if (retval == JFileChooser.APPROVE_OPTION) {
            File file = ourChooser.getSelectedFile();
            try {
                br = new BufferedReader(new FileReader(file));
            } catch (FileNotFoundException e) {
                System.out.println("trouble reading " + file.getName());
                e.printStackTrace();
            }
        }
        return br;
    }

    public static void main(String[] args) throws BioException, IOException {
        BufferedReader br = openFile();
        RichSequenceIterator it = IOTools.readFastaDNA(br, null);
        int count = 0;
        Namespace ns = RichObjectFactory.getDefaultNamespace();
        while (it.hasNext()) {
            count++;
            RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(br, ns);
            RichSequence.IOTools.writeFasta(System.out, seqs.accession, seqs.description, seqs, ns);
        }
    }
}

Comment by arnstrm (6.4 years ago):

Why not use stand-alone BLAST? blastdbcmd can do this really quickly if you have the database downloaded from NCBI.
Hamish wrote (6.4 years ago):

From your code, you appear to be attempting to read FASTA format entries and GenBank format entries from the same file (both readFastaDNA and readGenbankDNA are given the same BufferedReader), which may explain your error.
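As a rough illustration of why sharing one reader between several parsers is fragile, here is a minimal plain-Java sketch (no BioJava involved). The class name SharedReaderDemo and the drain method are hypothetical stand-ins for a parser, and the input string is made up. It only shows that a BufferedReader is a one-pass stream: once something has read it to the end, anything else handed the same reader sees no data, and readLine() returns null. Whether this is exactly what triggers the NullPointerException in the trace above is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class SharedReaderDemo {
    /** Hypothetical stand-in for a parser: reads every remaining line and returns the count. */
    static int drain(BufferedReader br) throws IOException {
        int lines = 0;
        while (br.readLine() != null) {
            lines++;
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Made-up three-line input standing in for a sequence file.
        BufferedReader br = new BufferedReader(
                new StringReader("LOCUS ...\nDEFINITION ...\n//\n"));

        // The first consumer reads everything that is available.
        System.out.println("first pass:  " + drain(br) + " lines");

        // A second consumer on the SAME reader starts where the first one stopped.
        System.out.println("second pass: " + drain(br) + " lines");

        // At end of stream readLine() returns null; code that does not check
        // for this will fail when it dereferences the result.
        System.out.println("next line:   " + br.readLine());
    }
}
```

The practical consequence is to create one iterator per reader, with the format that matches the file, and to advance that iterator inside the loop.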

Assuming the input data is in the GenBank format, the following code will read the file, parse the GenBank entries into objects, and output the primary accession and description from each entry:

public static void main(String[] args) throws BioException, IOException {
    BufferedReader br = openFile();
    RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(br, null);
    while (seqs.hasNext()) {
        RichSequence seq = seqs.nextRichSequence();
        System.out.println(seq.getAccession());
        System.out.println(seq.getDescription());
    }
}

This has been tested with BioJava 1.7.1, but other versions of legacy BioJava should work as well.
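If you want to cross-check the values independently of BioJava, the same two fields can also be pulled straight from the flat file. The following is only a sketch in plain Java, not BioJava's parser: it assumes the usual GenBank flat-file layout (keywords such as DEFINITION and ACCESSION start in the first column, continuation lines are indented, the first token on the ACCESSION line is the primary accession, and each entry ends with a line starting with //). The class name GenbankHeaderSketch and the sample record are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class GenbankHeaderSketch {
    /** Returns one {accession, description} pair per entry found in the reader. */
    static List<String[]> readHeaders(BufferedReader br) throws IOException {
        List<String[]> entries = new ArrayList<>();
        String accession = null;
        StringBuilder definition = new StringBuilder();
        boolean inDefinition = false;
        String line;
        while ((line = br.readLine()) != null) {
            if (line.startsWith("//")) {
                // End of entry: store what was collected and reset the state.
                if (accession != null || definition.length() > 0) {
                    entries.add(new String[] {accession, definition.toString()});
                }
                accession = null;
                definition.setLength(0);
                inDefinition = false;
            } else if (line.startsWith("DEFINITION")) {
                definition.append(line.substring("DEFINITION".length()).trim());
                inDefinition = true;
            } else if (inDefinition && line.startsWith(" ")) {
                // Indented line directly after DEFINITION: a continuation line.
                definition.append(' ').append(line.trim());
            } else {
                inDefinition = false;
                if (accession == null && line.startsWith("ACCESSION")) {
                    // Keep only the first token, assumed to be the primary accession.
                    accession = line.substring("ACCESSION".length()).trim().split("\\s+")[0];
                }
            }
        }
        // Keep a final entry that is missing its // terminator.
        if (accession != null || definition.length() > 0) {
            entries.add(new String[] {accession, definition.toString()});
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Made-up record for illustration only; not a real GenBank entry.
        String sample =
                "LOCUS       TEST0001          12 bp    DNA     linear   UNA 01-JAN-2000\n"
              + "DEFINITION  Hypothetical test sequence,\n"
              + "            second line of the definition.\n"
              + "ACCESSION   X00001 X00002\n"
              + "ORIGIN\n"
              + "        1 acgtacgtac gt\n"
              + "//\n";
        for (String[] e : readHeaders(new BufferedReader(new StringReader(sample)))) {
            System.out.println(e[0]);
            System.out.println(e[1]);
        }
    }
}
```

This ignores everything else in the record (features, sequence, secondary accessions), so it is useful for a quick sanity check rather than as a replacement for the BioJava code above.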
