Question

How To Fix A Tophat "GList Error" That Is Likely Caused By Incorrect Sequence Naming

11.2 years ago

Jirapong ▴ 30

I'm trying to map a sample against Xenopus laevis from Xenbase.org (latest version, 6.0). They provide GFF3 and FASTA files, which look like the following:

FASTA

> 27051543
ATGGCGGATGTGAAGGTCTCGTTCCAGTGCCCAGGCCGGATGTACAGCCCCGCGTGGGTGGCACCTGAGGCGCTGCAGAA
ACGCCCAGAGGATATTAACCGTCGCTCTGCTGACATGTGGAGTTTTGCCGTTCTGCTTTGGGAGCTGGTGACCCGCGAGG
TTCCATTTGCCGACCTCTCAAACATGGAGATTGGCATGAAGGTTTCCCTTGAAGGCCTCCGTCCCACCATCCCCCCCGGG
ATCTCGCCCCATATCTGCAAGTTGATGAAGATTTGTATGAACGAAGACCCTGCCAAGCGACCCAAGTTTGATATGATCGC
CCCCATCCTGGAGAAGATGCAGGAGAAATAA
> 27051545
TTTGGACTGTGCGTGAATTTAAAGAAAGCAGACAAATTCTTCCCGCGTTGCTATAACCTGGCGGATAAAACAGGGAGAAT
GTTATTCACTGATGACTTCATGAAAACTGCAGCGTATAGTATCATAAAATGGGTTGTAACAAGAAACAGTACGCCTATTA
AAGCAGAAGCCAATGTAATTTTAATGGCTTTTATGGTCTGCAAAATGTTCATGATTCCCTCAGTAAATAAGGACATAGAC

GFF3

##gff-version 3
Scaffold100041    JGI_gene    gene    2092    20066    .    +    .    ID=XeXenL6RMv10000001m.g;Name=XeXenL6RMv10000001m.g
Scaffold100041    JGI_gene    mRNA    2092    20066    .    +    .    ID=PAC:27060736;Name=XeXenL6RMv10000001m;pacid=27060736;longest=1;Parent=XeXenL6RMv10000001m.g
Scaffold100041    JGI_gene    five_prime_UTR    2092    2223    .    +    .    ID=PAC:27060736.five_prime_UTR.1;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    five_prime_UTR    2490    2505    .    +    .    ID=PAC:27060736.five_prime_UTR.2;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    CDS    2506    2585    .    +    0    ID=PAC:27060736.CDS.1;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    CDS    4114    4216    .    +    1    ID=PAC:27060736.CDS.2;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    CDS    4370    4449    .    +    0    ID=PAC:27060736.CDS.3;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    CDS    6233    6422    .    +    1    ID=PAC:27060736.CDS.4;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    CDS    7542    7700    .    +    0    ID=PAC:27060736.CDS.5;Parent=PAC:27060736;pacid=27060736

So the GFF3 uses PAC:XXXXXXX as the ID; however, the FASTA doesn't. During the TopHat2 mapping process, this command is run:

/bin/map2gtf --sam-header ./tophat_out/tmp/Scaffold10.nucleotide_genome.bwt.samheader.sam /tmp/Simbiot_HSS/index/Scaffold10.nucleotide.gff - ./tophat_out/tmp/left_kept_reads.m2g.bam

The error is:

[samopen] SAM header is present: 43025 sequences.
GList error (GList.hh:981):Invalid list index: 27078510

When I tried to convert the GFF3 to GTF, I got the following error:

Can't locate object method "display_text" via package "Bio::Annotation::SimpleValue" at /usr/local/share/perl5/Bio/SeqFeature/Annotated.pm line 703, <GEN0> line 2.

The convert code looks like this

#!/usr/bin/perl

use lib '/local/ensembl/bioperl-live';

use strict;
use warnings;
use Bio::FeatureIO;

# Read features from the GFF3 file.
my $in  = Bio::FeatureIO->new(-file   => "/tmp/Simbiot_HSS/index/Scaffold10.nucleotide.gff3",
                              -format => 'GFF');

# Write each feature back out in GTF format.
my $out = Bio::FeatureIO->new(-file   => ">/tmp/Simbiot_HSS/index/test.gtf",
                              -format => 'GTF');

while ( my $feature = $in->next_feature() ) {
    $out->write_feature($feature);
}

exit(0);

Am I missing something?

gff fasta tophat2 • 3.0k views

score 0 · Answer 1 · 2013-05-15

11.2 years ago

Istvan Albert 101k

First and foremost, please note that a GTF file is not the same as a GFF file, so that is one possible problem.

Then, if all you need is to transform a file from GFF to GTF while removing the PAC prefix, you should post a question on just that, without even mentioning TopHat.
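As a rough illustration of that transformation, here is a minimal Python sketch. It is not a complete converter and has not been tested with TopHat. It assumes the GFF3 is tab-delimited, that each mRNA line appears before its child features (as in the excerpt above), and that the gene ID can be taken from the mRNA's Parent attribute. The function names (`strip_pac`, `parse_attributes`, `gff3_to_gtf`) are made up for this example.

```python
def strip_pac(value):
    """Drop a leading 'PAC:' prefix if present."""
    prefix = "PAC:"
    return value[len(prefix):] if value.startswith(prefix) else value


def parse_attributes(column9):
    """Parse GFF3 'key=value;key=value' attributes into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def gff3_to_gtf(lines):
    """Yield GTF-style lines for features whose Parent is a known mRNA."""
    gene_of = {}  # mRNA ID -> gene ID, filled in as mRNA lines are seen
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a nine-column feature line
        attrs = parse_attributes(cols[8])
        if cols[2] == "mRNA":
            if "ID" in attrs:
                gene_of[attrs["ID"]] = attrs.get("Parent", "")
            continue
        parent = attrs.get("Parent", "")
        if not parent or parent not in gene_of:
            continue  # gene lines and orphan features are skipped
        # GTF attributes are 'key "value";' pairs rather than key=value.
        cols[8] = 'gene_id "{}"; transcript_id "{}";'.format(
            gene_of[parent], strip_pac(parent))
        yield "\t".join(cols)
```

To use it, pass an open file handle for the GFF3 to `gff3_to_gtf` and write each yielded line to the output file. Note that the excerpt only shows CDS and UTR features; whether the downstream tool also expects `exon` records is something to check separately.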

To avoid delays, it would be best to first make sure that the process works when given a GTF file with the correct names. To do that, create a copy containing only the first few lines of the GFF and edit them manually into a GTF file with the correct names. Then run the pipeline on this data.
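Before running the pipeline on the small test file, it may help to confirm that the names actually match. The following is a minimal sketch, not a TopHat feature: it lists sequence names used in the first column of the annotation that have no matching FASTA header. It assumes a tab-delimited annotation and takes the first whitespace-delimited token after `>` as the FASTA name; the function names are invented for this example.

```python
def fasta_names(lines):
    """Collect sequence names: the first token after '>' on header lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def annotation_seqids(lines):
    """Collect the first column of every feature line in a GFF/GTF file."""
    seqids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        seqids.add(line.split("\t", 1)[0])
    return seqids


def missing_from_fasta(fasta_lines, annotation_lines):
    """Return annotation sequence names that have no FASTA header."""
    return sorted(annotation_seqids(annotation_lines) - fasta_names(fasta_lines))
```

Both arguments can be open file handles. An empty result means every sequence name in the annotation also appears in the FASTA; anything listed is a name the two files disagree on.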

@Istvan Thank you very much. TopHat picks up the GFF file by itself. I only provide the prefix path, like "/tmp/Simbiot_HSS/index/Scaffold10.nucleotide", and it automatically picks up the GFF (maybe version 2 or 3). I also tried to convert that GFF3 to GTF, but got an error; see above.

ADD REPLY • link 11.2 years ago by Jirapong ▴ 30