How To Fix A Tophat "GList Error" That Is Likely Caused By Incorrect Sequence Naming
11.0 years ago
Jirapong ▴ 20

I'm trying to map a sample against Xenopus laevis from Xenbase.org (latest version, 6.0). They provide a GFF3 file and a FASTA file, which look like the following:

FASTA

> 27051543
ATGGCGGATGTGAAGGTCTCGTTCCAGTGCCCAGGCCGGATGTACAGCCCCGCGTGGGTGGCACCTGAGGCGCTGCAGAA
ACGCCCAGAGGATATTAACCGTCGCTCTGCTGACATGTGGAGTTTTGCCGTTCTGCTTTGGGAGCTGGTGACCCGCGAGG
TTCCATTTGCCGACCTCTCAAACATGGAGATTGGCATGAAGGTTTCCCTTGAAGGCCTCCGTCCCACCATCCCCCCCGGG
ATCTCGCCCCATATCTGCAAGTTGATGAAGATTTGTATGAACGAAGACCCTGCCAAGCGACCCAAGTTTGATATGATCGC
CCCCATCCTGGAGAAGATGCAGGAGAAATAA
> 27051545
TTTGGACTGTGCGTGAATTTAAAGAAAGCAGACAAATTCTTCCCGCGTTGCTATAACCTGGCGGATAAAACAGGGAGAAT
GTTATTCACTGATGACTTCATGAAAACTGCAGCGTATAGTATCATAAAATGGGTTGTAACAAGAAACAGTACGCCTATTA
AAGCAGAAGCCAATGTAATTTTAATGGCTTTTATGGTCTGCAAAATGTTCATGATTCCCTCAGTAAATAAGGACATAGAC

GFF3

##gff-version 3
Scaffold100041    JGI_gene    gene    2092    20066    .    +    .    ID=XeXenL6RMv10000001m.g;Name=XeXenL6RMv10000001m.g
Scaffold100041    JGI_gene    mRNA    2092    20066    .    +    .    ID=PAC:27060736;Name=XeXenL6RMv10000001m;pacid=27060736;longest=1;Parent=XeXenL6RMv10000001m.g
Scaffold100041    JGI_gene    five_prime_UTR    2092    2223    .    +    .    ID=PAC:27060736.five_prime_UTR.1;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    five_prime_UTR    2490    2505    .    +    .    ID=PAC:27060736.five_prime_UTR.2;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    CDS    2506    2585    .    +    0    ID=PAC:27060736.CDS.1;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    CDS    4114    4216    .    +    1    ID=PAC:27060736.CDS.2;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    CDS    4370    4449    .    +    0    ID=PAC:27060736.CDS.3;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    CDS    6233    6422    .    +    1    ID=PAC:27060736.CDS.4;Parent=PAC:27060736;pacid=27060736
Scaffold100041    JGI_gene    CDS    7542    7700    .    +    0    ID=PAC:27060736.CDS.5;Parent=PAC:27060736;pacid=27060736

So the GFF3 uses PAC:XXXXXXX as the ID, but the FASTA doesn't. During the Tophat2 mapping process, this command is run:

/bin/map2gtf --sam-header ./tophat_out/tmp/Scaffold10.nucleotide_genome.bwt.samheader.sam /tmp/Simbiot_HSS/index/Scaffold10.nucleotide.gff - ./tophat_out/tmp/left_kept_reads.m2g.bam

The error is:

[samopen] SAM header is present: 43025 sequences.
GList error (GList.hh:981):Invalid list index: 27078510
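
A quick way to check the suspected naming mismatch is to compare the sequence names in the FASTA headers with the names the GFF3 refers to. The following is a minimal sketch in Python, not part of Tophat. The function names `fasta_names` and `gff_names` are made up for illustration, and it assumes a tab-separated GFF3 whose attributes include `pacid=` as in the excerpt above.

```python
def fasta_names(lines):
    """Return the set of sequence names found in FASTA header lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            # Headers in the excerpt look like "> 27051543", so strip the
            # space after '>' and keep the first whitespace-delimited token.
            fields = line[1:].strip().split()
            if fields:
                names.add(fields[0])
    return names


def gff_names(lines):
    """Return (seqids, pacids) collected from GFF3 feature lines."""
    seqids, pacids = set(), set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives such as ##gff-version
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        seqids.add(cols[0])
        for pair in cols[8].split(";"):
            key, _, value = pair.partition("=")
            if key == "pacid":
                pacids.add(value)
    return seqids, pacids


# Small in-memory example built from the lines shown above.
fasta = ["> 27051543", "ATGGCGGATG", "> 27051545", "TTTGGACTGT"]
gff = [
    "##gff-version 3",
    "\t".join(["Scaffold100041", "JGI_gene", "mRNA", "2092", "20066", ".", "+", ".",
               "ID=PAC:27060736;Name=XeXenL6RMv10000001m;pacid=27060736;"
               "longest=1;Parent=XeXenL6RMv10000001m.g"]),
]

names = fasta_names(fasta)
seqids, pacids = gff_names(gff)
print("GFF seqids missing from FASTA:", sorted(seqids - names))
print("GFF pacids present in FASTA:", sorted(pacids & names))
```

If the first list is not empty, the first column of the GFF3 names sequences that the FASTA does not contain, which would be consistent with a naming problem.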

When I tried to convert the GFF3 to GTF, I got the following error:

Can't locate object method "display_text" via package "Bio::Annotation::SimpleValue" at /usr/local/share/perl5/Bio/SeqFeature/Annotated.pm line 703, <GEN0> line 2.

The conversion code looks like this:

#!/usr/bin/perl

use lib '/local/ensembl/bioperl-live';

use strict;
use warnings;
use Bio::FeatureIO;

# Read features from the GFF3 file.
my $in  = Bio::FeatureIO->new(-file => "/tmp/Simbiot_HSS/index/Scaffold10.nucleotide.gff3" , -format => 'GFF');

# Write the same features out as GTF.
my $out = Bio::FeatureIO->new(-file    => ">/tmp/Simbiot_HSS/index/test.gtf" ,
                              -format  => 'GTF');

# Copy each feature from the input to the output.
while ( my $feature = $in->next_feature() ) {
    $out->write_feature($feature);
}

exit(0);

Am I missing something?

gff fasta tophat2 • 2.9k views
11.0 years ago

First and foremost, please note that a GTF file is not the same as a GFF file, so that is one possible problem.

Then, if all you need is to transform a file from GFF to GTF while removing the PAC prefix, you should post a question on just that, without even mentioning Tophat.

To avoid delays, it would be best to first make sure that the process works when given a GTF file with the correct names. To do that, create a copy containing only the first few lines of the GFF and edit them manually into a GTF file with the correct names, then run the pipeline on this data.
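
For a small test file, the manual edit could also be scripted. The sketch below is only an illustration under stated assumptions: the helper names (`strip_pac`, `parse_attrs`, `gff3_to_gtf`) are made up, the input is assumed to be tab-separated GFF3 in which each mRNA line comes before its child features, and the choice to rename `mRNA` to `transcript` while leaving other feature types unchanged is a guess. Whether Tophat accepts the result has not been tested.

```python
def strip_pac(value):
    """Remove a leading 'PAC:' prefix, if present."""
    return value[len("PAC:"):] if value.startswith("PAC:") else value


def parse_attrs(text):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in text.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def gff3_to_gtf(lines):
    """Convert GFF3 feature lines into GTF-style lines without the PAC prefix."""
    gene_of = {}  # transcript ID -> gene ID, filled in from mRNA lines
    out = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attrs = parse_attrs(cols[8])
        if cols[2] == "gene":
            continue  # gene lines are dropped; gene_id goes on every other line
        if cols[2] == "mRNA":
            transcript = strip_pac(attrs["ID"])
            gene_of[transcript] = attrs.get("Parent", transcript)
            cols[2] = "transcript"
        else:
            transcript = strip_pac(attrs.get("Parent", ""))
            if transcript not in gene_of:
                continue  # child feature whose mRNA has not been seen
        cols[8] = 'gene_id "%s"; transcript_id "%s";' % (gene_of[transcript], transcript)
        out.append("\t".join(cols))
    return out


# Example using two of the lines from the question, rebuilt with tabs.
example = [
    "##gff-version 3",
    "\t".join(["Scaffold100041", "JGI_gene", "mRNA", "2092", "20066", ".", "+", ".",
               "ID=PAC:27060736;Name=XeXenL6RMv10000001m;pacid=27060736;"
               "longest=1;Parent=XeXenL6RMv10000001m.g"]),
    "\t".join(["Scaffold100041", "JGI_gene", "CDS", "2506", "2585", ".", "+", "0",
               "ID=PAC:27060736.CDS.1;Parent=PAC:27060736;pacid=27060736"]),
]
for gtf_line in gff3_to_gtf(example):
    print(gtf_line)
```

Running the pipeline on such a small converted file would show whether the names are the problem before the full annotation is converted.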


@Istvan Thank you very much. Tophat itself picks up the GFF file. I only provide the path prefix, like "/tmp/Simbiot_HSS/index/Scaffold10.nucleotide", and it automatically picks up the GFF (maybe version 2 or 3). I also tried to convert that GFF3 to GTF but got an error; see above.
