15 months ago by
Guy's Hospital, London
This error has been reported in multiple places. On the GitHub thread, one of the developers has provided assistance: Cuffmerge GFF Error: duplicate/invalid 'transcript' #77.
I also suggest switching to HISAT2 and StringTie, which are the successors to TopHat2 and Cufflinks. This in itself may solve the issue.
Finally, why not explore your GFF by using grep to extract the features with ID id47527, and then try to understand why the error may have occurred? Become your own investigator.
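If grep is not to hand, the same lookup can be sketched in Python. This is a minimal sketch, not a definitive parser: the function names parse_attributes and features_for_id are my own, and it assumes the first eight GFF columns contain no spaces, so that splitting on whitespace works whether the file is tab- or space-separated. The two sample lines are copied from the GFF in question.

```python
def parse_attributes(column9):
    """Turn a GFF attribute column such as 'ID=a;Parent=b' into a dict."""
    attrs = {}
    for item in column9.strip().strip(";").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def features_for_id(lines, feature_id):
    """Return the GFF lines whose ID or Parent attribute equals feature_id."""
    hits = []
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Assumption: columns 1-8 hold no spaces, so 8 splits isolate column 9.
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9:
            continue
        attrs = parse_attributes(fields[8])
        # Parent may be a comma-separated list of IDs.
        parents = attrs.get("Parent", "").split(",")
        if attrs.get("ID") == feature_id or feature_id in parents:
            hits.append(line.rstrip("\n"))
    return hits


# Two lines taken from the GFF discussed in this thread.
sample = [
    "NC_030417.1 Gnomon C_gene_segment 22233544 22237573 . - . ID=id47527;Parent=gene2052;Dbxref=GeneID:108277590",
    "NC_030417.1 Gnomon exon 22237278 22237573 . - . ID=id47528;Parent=id47527;Dbxref=GeneID:108277590",
]
for hit in features_for_id(sample, "id47527"):
    print(hit)
```

To run it on a real file, pass an open file handle instead of the sample list. Unlike a plain grep for id47527, matching on whole attribute values avoids also pulling in longer IDs that merely begin with the same characters.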
Curiosity got the better of me, so here are the entries for id47527:
NC_030417.1 Gnomon C_gene_segment 22233544 22237573 . - . ID=id47527;Parent=gene2052;Dbxref=GeneID:108277590
NC_030417.1 Gnomon exon 22237278 22237573 . - . ID=id47528;Parent=id47527;Dbxref=GeneID:108277590
NC_030417.1 Gnomon exon 22236835 22237158 . - . ID=id47529;Parent=id47527;Dbxref=GeneID:108277590
NC_030417.1 Gnomon exon 22235738 22236055 . - . ID=id47530;Parent=id47527;Dbxref=GeneID:108277590
NC_030417.1 Gnomon exon 22234411 22234698 . - . ID=id47531;Parent=id47527;Dbxref=GeneID:108277590
NC_030417.1 Gnomon exon 22233544 22234110 . - . ID=id47532;Parent=id47527;Dbxref=GeneID:108277590
It does not look like a typical entry: the parent of the exons is a C_gene_segment feature rather than an mRNA or transcript. There's also a note written into the GFF for this transcript:
Note=The sequence of the model RefSeq transcript was modified relative
to this genomic sequence to represent the inferred CDS: added 477
bases not found in genome assembly;exception=annotated by transcript
or proteomic data;
My guess is that it's an antibody gene segment, encoding part of the constant (C) region.
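To check whether id47527 is a one-off, one option is to tally which feature types the exons in the GFF name as their Parent, so that anything other than the usual mRNA or transcript stands out. The sketch below is only an illustration: exon_parent_types is a hypothetical helper name, and it again assumes that the first eight columns contain no spaces.

```python
from collections import Counter


def exon_parent_types(lines):
    """Count the feature type (column 3) of each feature that an exon names as Parent."""
    id_to_type = {}
    exon_parents = []
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Assumption: columns 1-8 hold no spaces, so 8 splits isolate column 9.
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9:
            continue
        attrs = dict(
            item.split("=", 1)
            for item in fields[8].strip().strip(";").split(";")
            if "=" in item
        )
        if "ID" in attrs:
            id_to_type[attrs["ID"]] = fields[2]
        if fields[2] == "exon" and "Parent" in attrs:
            exon_parents.extend(attrs["Parent"].split(","))
    # Parents that never appear as an ID in the input are reported as "unknown".
    return Counter(id_to_type.get(parent, "unknown") for parent in exon_parents)


# Three lines taken from the GFF discussed in this thread.
sample = [
    "NC_030417.1 Gnomon C_gene_segment 22233544 22237573 . - . ID=id47527;Parent=gene2052;Dbxref=GeneID:108277590",
    "NC_030417.1 Gnomon exon 22237278 22237573 . - . ID=id47528;Parent=id47527;Dbxref=GeneID:108277590",
    "NC_030417.1 Gnomon exon 22236835 22237158 . - . ID=id47529;Parent=id47527;Dbxref=GeneID:108277590",
]
print(exon_parent_types(sample))
```

Run over the whole annotation, the counts show how many exons hang off parents such as C_gene_segment, which gives a list of entries worth inspecting by hand.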