Question: How To Fix A Tophat "GList Error" That Is Likely Caused By Incorrect Sequence Naming
7.5 years ago by
Chiang Mai, Thailand
Jirapong20 wrote:

I'm trying to map a sample against Xenopus laevis (latest version, 6.0). They provide a GFF3 file and a FASTA file, which look like the following:


> 27051543
> 27051545


##gff-version 3
Scaffold100041    JGI_gene    gene    2092    20066    .    +    .    ID=XeXenL6RMv10000001m.g;Name=XeXenL6RMv10000001m.g
Scaffold100041    JGI_gene    mRNA    2092    20066    .    +    .    ID=PAC:27060736;Name=XeXenL6RMv10000001m;pacid=27060736;longest=1;Parent=XeXenL6RMv10000001m.g
Scaffold100041    JGI_gene    five_prime_UTR    2092    2223    .    +    .    ID=PAC:27060736.five_prime_UTR.1;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    five_prime_UTR    2490    2505    .    +    .    ID=PAC:27060736.five_prime_UTR.2;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    CDS    2506    2585    .    +    0    ID=PAC:27060736.CDS.1;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    CDS    4114    4216    .    +    1    ID=PAC:27060736.CDS.2;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    CDS    4370    4449    .    +    0    ID=PAC:27060736.CDS.3;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    CDS    6233    6422    .    +    1    ID=PAC:27060736.CDS.4;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    CDS    7542    7700    .    +    0    ID=PAC:27060736.CDS.5;Parent=PAC:27060736;pacid=27060736

So the GFF3 uses PAC:XXXXXXX as the ID, whereas the FASTA does not. During the TopHat2 mapping process, this step is run:

/bin/map2gtf --sam-header ./tophat_out/tmp/Scaffold10.nucleotide_genome.bwt.samheader.sam /tmp/Simbiot_HSS/index/Scaffold10.nucleotide.gff - ./tophat_out/tmp/left_kept_reads.m2g.bam

The error is:

[samopen] SAM header is present: 43025 sequences.
GList error (GList.hh:981):Invalid list index: 27078510
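
One way to test the naming hypothesis before rerunning TopHat is to list the sequence IDs that the annotation refers to but the FASTA does not define. The following is only a minimal sketch: the function names are hypothetical, it assumes the GFF columns are tab-separated, and it treats the first word after ">" as the FASTA sequence name. It does not prove that a mismatch is the cause of the GList error.

```python
def fasta_names(lines):
    """Return the set of sequence names (first word after '>') in FASTA text."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gff_seqids(lines):
    """Return the set of column-1 sequence IDs in GFF text, skipping comments."""
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split("\t")[0].strip())
    return ids


def missing_from_fasta(gff_lines, fasta_lines):
    """Sequence IDs the annotation refers to that the FASTA does not define."""
    return sorted(gff_seqids(gff_lines) - fasta_names(fasta_lines))


if __name__ == "__main__":
    # Toy data modeled on the excerpts in the post (tab-separated GFF columns).
    fasta = ["> 27051543", "ACGT", "> 27051545", "GGCC"]
    gff = [
        "##gff-version 3",
        "Scaffold100041\tJGI_gene\tgene\t2092\t20066\t.\t+\t.\tID=XeXenL6RMv10000001m.g",
    ]
    print(missing_from_fasta(gff, fasta))
```

On real files, pass open file handles instead of the toy lists. A non-empty result means the two files do not agree on sequence names.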

When I tried to convert the GFF3 to GTF, I got the following error:

Can't locate object method "display_text" via package "Bio::Annotation::SimpleValue" at /usr/local/share/perl5/Bio/SeqFeature/ line 703, <GEN0> line 2.

The conversion code looks like this:

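#!/usr/bin/perl

use lib '/local/ensembl/bioperl-live';

use strict;
use warnings;
use Bio::FeatureIO;

# Read the GFF3 annotation.
my $in  = Bio::FeatureIO->new(-file   => "/tmp/Simbiot_HSS/index/Scaffold10.nucleotide.gff3",
                              -format => 'GFF');

# Write the same features out as GTF.
my $out = Bio::FeatureIO->new(-file   => ">/tmp/Simbiot_HSS/index/test.gtf",
                              -format => 'GTF');

# The loop body was cut off in the post; write_feature is the usual Bio::FeatureIO call.
while ( my $feature = $in->next_feature() ) {
    $out->write_feature($feature);
}
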
Am I missing something?

fasta gff tophat2 • 2.3k views
7.4 years ago by
Istvan Albert ♦♦ 85k
University Park, USA
Istvan Albert ♦♦ 85k wrote:

First and foremost, please note that a GTF file is not the same as a GFF file, so that is one possible problem.

Then, if all you need is to convert a file from GFF to GTF while removing the PAC prefix, you should post a question on just that, without even mentioning TopHat.
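
For illustration, a conversion of that kind could be sketched as follows. This is a rough sketch under stated assumptions, not a complete GFF3-to-GTF converter: the function names are hypothetical, columns are assumed to be tab-separated, each mRNA line is assumed to appear before its child features (as in the excerpt), each feature is assumed to have a single Parent, and feature types such as five_prime_UTR are passed through unchanged. Whether TopHat accepts the result has not been tested here.

```python
def parse_attrs(field):
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def strip_prefix(identifier, prefix="PAC:"):
    """Drop a leading prefix such as 'PAC:' if it is present."""
    return identifier[len(prefix):] if identifier.startswith(prefix) else identifier


def gff3_to_gtf(lines, prefix="PAC:"):
    """Convert child features (CDS, UTR, exon, ...) to GTF-style lines."""
    gene_of = {}  # transcript ID -> gene ID, filled from mRNA lines
    out = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attrs = parse_attrs(cols[8])
        if cols[2] == "mRNA":
            # Remember which gene this transcript belongs to; emit nothing.
            gene_of[strip_prefix(attrs.get("ID", ""), prefix)] = attrs.get("Parent", "")
            continue
        if cols[2] == "gene" or "Parent" not in attrs:
            continue
        transcript = strip_prefix(attrs["Parent"], prefix)
        gene = gene_of.get(transcript, transcript)
        cols[8] = 'gene_id "%s"; transcript_id "%s";' % (gene, transcript)
        out.append("\t".join(cols))
    return out
```

Calling gff3_to_gtf on the lines of the GFF3 file and writing the returned lines to a new file gives a candidate GTF to try. The gene and mRNA lines themselves are dropped because GTF carries that information in the gene_id and transcript_id attributes.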

To avoid delays, it would be best to first confirm that the process works when given a GTF file with the correct names. To do that, create a copy containing only the first few lines of the GFF, edit them manually into a GTF file with the correct names, and run the pipeline on this data.
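
Cutting out the small test copy can be sketched like this. The helper name gff_head and the default of ten features are assumptions made for illustration; the manual editing into GTF still has to be done afterwards.

```python
def gff_head(lines, n_features=10):
    """Keep '#' lines and the first n_features feature lines, then stop."""
    kept = []
    seen = 0
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # header or comment line
        elif line.strip():
            if seen == n_features:
                break  # enough features collected
            kept.append(line)
            seen += 1
    return kept
```

For example, reading the annotation with open(...) and passing the file handle to gff_head returns a short list of lines that can be written to a new file and edited by hand.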


@Istvan Thank you very much. TopHat itself picks up the GFF file: I only provide the prefix path, like "/tmp/Simbiot_HSS/index/Scaffold10.nucleotide", and it automatically picks up the GFF (which may be version 2 or 3). I also tried to convert that GFF3 to GTF but got an error; see above.

modified 7.4 years ago • written 7.4 years ago by Jirapong20