Question: Cuffmerge GFF Error: duplicate/invalid 'transcript'
2.1 years ago by
mgoldste0 wrote:

Hi everyone, I am trying to run cuffmerge with the catfish genome and its GFF file, but I keep getting the same error. I have tried converting the GFF to GTF, but this error message keeps popping up:

GFF Error: duplicate/invalid 'transcript' feature ID=id47527
[FAILED]
Error: could not execute gtf_to_sam

Can anyone point me to a solution?

2.1 years ago by
Kevin Blighe55k wrote:

This error has been reported in multiple places. In one GitHub issue, one of the developers has provided assistance: Cuffmerge GFF Error: duplicate/invalid 'transcript' #77.

In addition, I suggest that you switch to HISAT2 and StringTie, which are the successors to TopHat2 and Cufflinks. This in itself may solve the issue.

Finally, why not explore your GFF by using grep to extract the features with ID id47527, and then try to understand why the error may have occurred? Become your own investigator.
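If grep is not to hand, or you want to match the ID exactly rather than as a substring, roughly the same lookup can be sketched in Python. This is only a minimal sketch, assuming a tab-delimited GFF3 file with 9 columns; the function names `parse_attributes` and `find_feature_lines` are made up for illustration and are not part of Cufflinks or any other tool.

```python
def parse_attributes(column9):
    """Turn a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_feature_lines(gff_lines, feature_id):
    """Return the GFF lines whose ID or Parent attribute equals feature_id.

    Assumes tab-delimited GFF3 with 9 columns; comment lines, blank lines
    and lines with a different column count are skipped.
    """
    hits = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        attrs = parse_attributes(fields[8])
        # Parent may hold several comma-separated IDs in GFF3.
        parents = attrs.get("Parent", "").split(",")
        if attrs.get("ID") == feature_id or feature_id in parents:
            hits.append(line.rstrip("\n"))
    return hits
```

Any iterable of lines works as input, including an open file handle, so you could pass your opened GFF file together with "id47527". Unlike a plain substring search, this will not also return a feature such as id475270.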



Curiosity got the better of me, so here are the entries for this ID:

NC_030417.1 Gnomon  C_gene_segment  22233544    22237573    .   -   .   ID=id47527;Parent=gene2052;Dbxref=GeneID:108277590
NC_030417.1 Gnomon  exon    22237278    22237573    .   -   .   ID=id47528;Parent=id47527;Dbxref=GeneID:108277590
NC_030417.1 Gnomon  exon    22236835    22237158    .   -   .   ID=id47529;Parent=id47527;Dbxref=GeneID:108277590
NC_030417.1 Gnomon  exon    22235738    22236055    .   -   .   ID=id47530;Parent=id47527;Dbxref=GeneID:108277590
NC_030417.1 Gnomon  exon    22234411    22234698    .   -   .   ID=id47531;Parent=id47527;Dbxref=GeneID:108277590
NC_030417.1 Gnomon  exon    22233544    22234110    .   -   .   ID=id47532;Parent=id47527;Dbxref=GeneID:108277590

It does not look like a typical entry: the exons' parent feature is typed C_gene_segment rather than mRNA or transcript. There's also a note written into the GFF for this transcript:

Note=The sequence of the model RefSeq transcript was modified relative to this genomic sequence to represent the inferred CDS: added 477 bases not found in genome assembly;exception=annotated by transcript or proteomic data;

I'm imagining that it's an antibody gene, part of the constant (C) region.
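To see whether id47527 is a one-off or whether the GFF contains many exon parents of unusual type, one could tally the feature types of everything that has exon children. Again, this is just a rough sketch assuming tab-delimited GFF3; `exon_parent_types` is a hypothetical helper name, and I am not claiming this is the check that cuffmerge itself performs.

```python
from collections import Counter


def exon_parent_types(gff_lines):
    """Count the feature types (column 3) of all features that have exon children.

    Assumes tab-delimited GFF3 with 9 columns. If an ID occurs more than
    once, the type of its last occurrence is used. Parents that are never
    defined in the input are counted as "unknown".
    """
    feature_type = {}     # feature ID -> feature type
    exon_parents = set()  # IDs referenced as Parent by exon lines
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        if "ID" in attrs:
            feature_type[attrs["ID"]] = fields[2]
        if fields[2] == "exon":
            exon_parents.update(attrs.get("Parent", "").split(","))
    exon_parents.discard("")
    return Counter(feature_type.get(parent, "unknown") for parent in exon_parents)
```

In the result, any type other than the usual mRNA or transcript (for example C_gene_segment, as above) points to entries worth inspecting by hand.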
