Question: GTF files for cufflinks
0
gravatar for Adrian Pelin
4.6 years ago by
Adrian Pelin2.2k
Canada
Adrian Pelin2.2k wrote:

Hello,

I need to take JGI annotation in a GFF format and somehow feed it to cufflinks. This is the format the annotation is in:

scaffold_1      JGI     exon    1066    1206    .       -       .       name "gm1.1_g"; transcriptId 113
scaffold_1      JGI     CDS     1066    1206    .       -       0       name "gm1.1_g"; proteinId 1; exonNumber 3
scaffold_1      JGI     stop_codon      1066    1068    .       -       0       name "gm1.1_g"
scaffold_1      JGI     exon    1304    1600    .       -       .       name "gm1.1_g"; transcriptId 113
scaffold_1      JGI     CDS     1304    1600    .       -       0       name "gm1.1_g"; proteinId 1; exonNumber 2
scaffold_1      JGI     exon    1675    2220    .       -       .       name "gm1.1_g"; transcriptId 113
scaffold_1      JGI     CDS     1675    2220    .       -       0       name "gm1.1_g"; proteinId 1; exonNumber 1
scaffold_1      JGI     start_codon     2218    2220    .       -       0       name "gm1.1_g"
scaffold_10     JGI     exon    11      345     .       +       .       name "gm1.12_g"; transcriptId 124
scaffold_10     JGI     CDS     11      345     .       +       0       name "gm1.12_g"; proteinId 12; exonNumber 1
scaffold_10     JGI     start_codon     11      13      .       +       0       name "gm1.12_g"
scaffold_10     JGI     exon    470     512     .       +       .       name "gm1.12_g"; transcriptId 124
scaffold_10     JGI     CDS     470     512     .       +       1       name "gm1.12_g"; proteinId 12; exonNumber 2
scaffold_10     JGI     stop_codon      510     512     .       +       0       name "gm1.12_g"

 

When I feed it to cufflinks, I get Seg. Fault. No idea why, no error message. I tried changing it to GTF format, and this is what I got:

scaffold_1    JGI    transcript    1066    1206    .    -    .    gene_id "gm1.1_g";
scaffold_1    JGI    transcript    1304    1600    .    -    .    gene_id "gm1.1_g";
scaffold_1    JGI    transcript    1675    2220    .    -    .    gene_id "gm1.1_g";
scaffold_1    JGI    transcript    769    891    .    -    .    gene_id "fgenesh1_pg.1_";
scaffold_1    JGI    transcript    942    1031    .    -    .    gene_id "fgenesh1_pg.1_";
scaffold_1    JGI    transcript    1304    1469    .    -    .    gene_id "fgenesh1_pg.1_";
scaffold_1    JGI    transcript    1802    2220    .    -    .    gene_id "fgenesh1_pg.1_";
scaffold_1    JGI    transcript    1244    1603    .    -    .    gene_id "fgenesh1_kg.1_";
scaffold_1    JGI    transcript    1675    1950    .    -    .    gene_id "fgenesh1_kg.1_";
scaffold_1    JGI    transcript    1588    2220    .    -    .    gene_id "fgenesh1_kg.1_";
scaffold_1    JGI    transcript    1588    2220    .    -    .    gene_id "fgenesh1_kg.1_";
scaffold_2    JGI    transcript    1099    2094    .    -    .    gene_id "gm1.2_g";
scaffold_2    JGI    transcript    4803    5465    .    +    .    gene_id "gm1.3_g";
scaffold_2    JGI    transcript    1099    2094    .    -    .    gene_id "fgenesh1_pg.2_";
scaffold_2    JGI    transcript    3422    3525    .    -    .    gene_id "fgenesh1_pg.2_";
scaffold_2    JGI    transcript    3594    3675    .    -    .    gene_id "fgenesh1_pg.2_";
scaffold_2    JGI    transcript    4142    4196    .    -    .    gene_id "fgenesh1_pg.2_";
scaffold_2    JGI    transcript    4788    4878    .    -    .    gene_id "fgenesh1_pg.2_";
scaffold_2    JGI    transcript    5241    5274    .    -    .    gene_id "fgenesh1_pg.2_";
scaffold_2    JGI    transcript    1099    2094    .    -    .    gene_id "CE84695_65";
scaffold_2    JGI    transcript    1443    1631    .    -    .    gene_id "fgenesh1_kg.2_";
scaffold_2    JGI    transcript    1644    2094    .    -    .    gene_id "fgenesh1_kg.2_";
scaffold_2    JGI    transcript    1443    1631    .    -    .    gene_id "fgenesh1_kg.2_";
scaffold_2    JGI    transcript    1644    2094    .    -    .    gene_id "fgenesh1_kg.2_";

And cufflinks doesn't crash anymore with the above file, the problem is that it processed 0 loci:

[13:44:59] Loading reference annotation.
[13:45:00] Inspecting reads and determining fragment length distribution.
> Processed 0 loci.                            [*************************] 100%
> Map Properties:
>    Normalized Map Mass: 17691349.00
>    Raw Map Mass: 17691349.00
>    Fragment Length Distribution: Truncated Gaussian (default)
>                  Default Mean: 200
>               Default Std Dev: 80
[13:45:29] Estimating transcript abundances.
> Processed 0 loci.                            [*************************] 100%

Any idea how I can tweak this so that it works with cufflinks?

Thanks, Adrian

jgi cufflinks annotation gtf • 1.4k views
ADD COMMENTlink modified 4.6 years ago by David Fredman980 • written 4.6 years ago by Adrian Pelin2.2k

If you can make a small reproducible example of the segfault, then it'd be good to submit a bug report so that simply gets fixed.

ADD REPLYlink written 4.6 years ago by Devon Ryan88k
2
gravatar for David Fredman
4.6 years ago by
David Fredman980
University of Bergen, Norway
David Fredman980 wrote:

As you've discovered JGI format is similar to, but not quite GTF (hooray for yet another "standard" format in bioinformatics!). The closest format that cufflinks parses correctly is GTF 2.2, as described here: http://mblab.wustl.edu/GTF22.html. Your second file looks quite close to that, but not quite.

I had this issue a long time ago, and solved it by generating a bed file from a database dump, then bed->genePred->gtf using the UCSC Kent toolkit... not ideal.

Edit: I had a quick look and it looks like Jason Stajich has done more than his fair share of gff conversion - I suggest you try his gtf2gff3_3level.pl script, looks quite solid https://github.com/hyphaltip/genome-scripts/blob/master/data_format/gtf2gff3_3level.pl

Old: I also wrote this Perl script which I recovered from my trove that you could try (but not able to testdrive it this minute, so no guarantees).

HTH

ADD COMMENTlink modified 4.6 years ago • written 4.6 years ago by David Fredman980

Relooking at the example lines, you're absolutely right. Given how trivially JGI could make their files standards conforming I'm amazed this hasn't been done.

ADD REPLYlink modified 4.6 years ago • written 4.6 years ago by Devon Ryan88k

Thanks! The perl script you provided works great as-is. Ran cufflinks and it correctly identified loci. I will look further into the results, but this looks very promising. Thank you again for your help.

ADD REPLYlink written 4.6 years ago by Adrian Pelin2.2k

great, you're welcome!

ADD REPLYlink written 4.6 years ago by David Fredman980
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1264 users visited in the last hour