Question: Error When Reading TIGR Format XML With BioPerl
Daniel (Cardiff University) wrote, 5.0 years ago:

I am trying to read in the latest Arabidopsis genome so that I can search it for DNA motifs. It comes in TIGR XML format, but there is a note in the FTP site's README.txt saying:

1) We do not have anticodon data available in all cases.
2) We have added element <BAC>.
3) We have changed the definition of non-coding RNAs to include exons and splice variants
Thus, validation against the TIGR DTD can fail.
Please use the "tairxml.dtd" file for validation.

According to the BioPerl SeqIO documentation for tigrxml, the format should be supported, but I'm getting this error:

 ./motif_hit ch2.xml

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: [1]Unknown or Invalid process directive:<?xml version="1.0" standalone="yes"?>
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:472
STACK: Bio::SeqIO::tigr::throw /usr/share/perl5/Bio/SeqIO/tigr.pm:1352
STACK: Bio::SeqIO::tigr::_process /usr/share/perl5/Bio/SeqIO/tigr.pm:205
STACK: Bio::SeqIO::tigr::_initialize /usr/share/perl5/Bio/SeqIO/tigr.pm:97
STACK: Bio::SeqIO::new /usr/share/perl5/Bio/SeqIO.pm:358
STACK: Bio::SeqIO::new /usr/share/perl5/Bio/SeqIO.pm:397
STACK: ./motif_hit:24
-----------------------------------------------------------

Is the tairxml.dtd causing the error and if so, how? I don't understand how to validate as it says.

My tester code:

#!/usr/bin/perl
use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;
use Bio::Tools::IUPAC_v2;

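my $usage = "USAGE:\t./motif_hit INFILE.xml <motif>";

# The first argument is the XML file to read
my $infile = $ARGV[0] or die $usage . "\n";

# This is the call that throws the exception shown above
my $in = Bio::SeqIO->new(-file => $infile, -format => 'tigr');

# Print the ID of each sequence, one per line
while ( my $seq = $in->next_seq() ){
        print $seq->display_id, "\n";
}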

Try running xmllint ch2.xml and see whether it reports an error.

written 5.0 years ago by Pierre Lindenbaum

I just got: "ch2.xml:1232431: error: xmlSAX2Characters: huge text node: out of memory". I think that means it's reading it right.

written 5.0 years ago by Daniel

No, that does not mean that the file is correct. SAX parsing operates on streams and will stop at the first error, before it has finished reading the file.

written 5.0 years ago by Istvan Albert

So does that mean the download is corrupted? Sorry for the basic questions. EDIT: It appears the whole genome sequence is inside the final XML element, which I assume is why it gives the "huge text node: out of memory" message. Is this not accommodated, though?

written 5.0 years ago by Daniel (modified 5.0 years ago)

That makes sense; that certainly qualifies as a huge node.

written 5.0 years ago by Istvan Albert
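
One way to check the "whole genome in one element" idea without handing the file to an XML parser is to stream it and measure the longest run of text between tags. This is only a rough sketch in core Perl: the name longest_text_run is made up for illustration, and it assumes the file has no comments or CDATA sections containing '<' or '>'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: length of the longest run of text between two tags.
sub longest_text_run {
    my ($fh) = @_;
    local $/ = '<';                  # read one "tag>text" record at a time
    my $max = 0;
    while (my $chunk = <$fh>) {
        chomp $chunk;                # drop the trailing '<' separator
        $chunk =~ s/\A[^>]*>//s;     # drop the remainder of the preceding tag
        my $len = length $chunk;
        $max = $len if $len > $max;
    }
    return $max;
}

# Only runs when a file name is given, e.g.: perl longest_text.pl ch2.xml
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    printf "Longest text run: %d characters\n", longest_text_run($fh);
    close $fh;
}
```

If the number comes out in the tens of millions, that would fit the xmllint message: libxml2 limits the size of a single text node by default, and xmllint has a --huge option that lifts such limits.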
Istvan Albert (University Park, USA) wrote, 5.0 years ago:

There may be two different things going on. The BioPerl error seems to complain about the processing directive (the XML declaration) that is usually the first line of an XML file.
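
To see exactly what the parser is rejecting, it can help to look at the first line of the file and check whether it is an XML declaration. The sketch below uses core Perl only; the helper names are made up for illustration. Whether removing the declaration is enough for Bio::SeqIO::tigr to accept a TAIR file is an assumption that has not been tested here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the line opens with an XML declaration.
sub has_xml_declaration {
    my ($line) = @_;
    return (defined $line && $line =~ /\A<\?xml\s/) ? 1 : 0;
}

# Hypothetical helper: return the text with a leading XML declaration removed.
sub strip_xml_declaration {
    my ($line) = @_;
    $line =~ s/\A<\?xml\s.*?\?>\s*//s;
    return $line;
}

# Only runs when a file name is given, e.g.: perl first_line.pl ch2.xml
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $first = <$fh>;
    close $fh;
    my $msg = has_xml_declaration($first)
        ? "First line is an XML declaration: $first"
        : "No XML declaration on the first line\n";
    print $msg;
}
```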

The xmllint program also raises an error, though that is a different error altogether.

Long story short, I think your file is not in the format that the program expects, and it is also not complete.

If the file is not overly large, you can try opening it in a browser; that should give you nicely formatted output and may even indicate the exact location of the error.


From opening it in a browser and tailing the file, it looks complete in that all the tags close. I think the format is just not what is expected, but as the link in my original post shows, tigr is listed as an accepted format. I think the 'Please use the "tairxml.dtd" file for validation.' step is the issue, but I don't know where or how to validate. Thanks for the help.

written 5.0 years ago by Daniel
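
On the "where/how to validate" part: validation against a DTD is normally a separate step done with an XML tool, not something the SeqIO call does. A minimal sketch, assuming xmllint is installed and that tairxml.dtd has been downloaded next to ch2.xml (the sub name and script name are made up for illustration). It uses xmllint's --dtdvalid option to validate against an external DTD, --noout to suppress echoing the document, and --huge to lift the parser's built-in size limits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the xmllint argument list for DTD validation.
sub xmllint_dtd_command {
    my ($xml, $dtd) = @_;
    return (
        'xmllint',
        '--noout',            # report errors only, do not print the document
        '--huge',             # lift libxml2's built-in size limits
        '--dtdvalid', $dtd,   # validate against this external DTD
        $xml,
    );
}

# Only runs with two arguments, e.g.: perl validate.pl ch2.xml tairxml.dtd
if (@ARGV == 2) {
    my @cmd = xmllint_dtd_command(@ARGV);
    print "Running: @cmd\n";
    system(@cmd);
    die "Could not run xmllint: $!\n" if $? == -1;
    my $exit = $? >> 8;
    my $msg  = $exit == 0
        ? "Validation passed\n"
        : "xmllint exited with status $exit\n";
    print $msg;
}
```

As far as I can tell, a successful validation only shows that the file matches TAIR's DTD; it does not by itself change how BioPerl reads the file.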