Question: Cuffmerge GFF Error: duplicate/invalid 'transcript'
Asked by mgoldste, 15 months ago:

Hi everyone, I am trying to run cuffmerge on the catfish genome from https://www.ncbi.nlm.nih.gov/genome/198, using the GFF file ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/660/625/GCF_001660625.1_IpCoco_1.2/GCF_001660625.1_IpCoco_1.2_genomic.gff.gz, but I keep getting the same error. I have tried converting the GFF to GTF, but this error message still appears:

GFF Error: duplicate/invalid 'transcript' feature ID=id47527 [FAILED] Error: could not execute gtf_to_sam

Can anyone point me to a solution?

rna-seq genome • 730 views
Answer by Kevin Blighe (Guy's Hospital, London), 15 months ago:

This error has been reported in multiple places. In the GitHub issue "Cuffmerge GFF Error: duplicate/invalid 'transcript' #77", one of the developers has provided assistance.

I would also suggest switching to HISAT2 and StringTie, which are the successors to TopHat2 and Cufflinks. This in itself may solve the issue.

Finally, why not explore your GFF by using grep to extract the features with ID id47527, and then try to work out why the error occurred? Become your own investigator.
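If grep is awkward on the compressed file, the same lookup can be sketched in Python. This is only a minimal sketch, assuming a tab-separated GFF3 with nine columns and `ID=` / `Parent=` attributes in the last column; the function names `parse_attributes` and `find_feature` are made up for illustration and are not part of Cufflinks or any other tool.

```python
import gzip


def parse_attributes(field):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_feature(lines, feature_id):
    """Yield GFF3 lines whose ID is feature_id or whose Parent list contains it."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        parents = attrs.get("Parent", "").split(",")
        if attrs.get("ID") == feature_id or feature_id in parents:
            yield line.rstrip("\n")


# Example usage (file name taken from the question; adjust the path as needed):
# with gzip.open("GCF_001660625.1_IpCoco_1.2_genomic.gff.gz", "rt") as handle:
#     for hit in find_feature(handle, "id47527"):
#         print(hit)
```

Matching on the parsed attribute values rather than on a raw substring avoids also picking up longer IDs that merely begin with id47527.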

Kevin

-------------------------

Curiosity got the better of me, so here are the entries for ID id47527:

NC_030417.1 Gnomon  C_gene_segment  22233544    22237573    .   -   .   ID=id47527;Parent=gene2052;Dbxref=GeneID:108277590
NC_030417.1 Gnomon  exon    22237278    22237573    .   -   .   ID=id47528;Parent=id47527;Dbxref=GeneID:108277590
NC_030417.1 Gnomon  exon    22236835    22237158    .   -   .   ID=id47529;Parent=id47527;Dbxref=GeneID:108277590
NC_030417.1 Gnomon  exon    22235738    22236055    .   -   .   ID=id47530;Parent=id47527;Dbxref=GeneID:108277590
NC_030417.1 Gnomon  exon    22234411    22234698    .   -   .   ID=id47531;Parent=id47527;Dbxref=GeneID:108277590
NC_030417.1 Gnomon  exon    22233544    22234110    .   -   .   ID=id47532;Parent=id47527;Dbxref=GeneID:108277590

It does not look like a typical entry. There's also a note written into the GFF for this transcript:

Note=The sequence of the model RefSeq transcript was modified relative to this genomic sequence to represent the inferred CDS: added 477 bases not found in genome assembly;exception=annotated by transcript or proteomic data;

I'm imagining that it's an antibody gene, part of the constant (C) region.
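To see how many other exon parents in the file have an unusual feature type like C_gene_segment, a tally along these lines might help. It is a rough sketch, assuming tab-separated GFF3 with `ID=` and `Parent=` attributes; `exon_parent_types` is a hypothetical helper name, and the tally only describes the file; it does not by itself fix the cuffmerge error.

```python
from collections import Counter


def exon_parent_types(lines):
    """Count the feature types (column 3) of features that have exon children."""
    type_by_id = {}       # feature ID -> feature type of its first occurrence
    exon_parents = set()  # IDs named in the Parent attribute of exon lines
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        if "ID" in attrs:
            type_by_id.setdefault(attrs["ID"], cols[2])
        if cols[2] == "exon":
            exon_parents.update(attrs.get("Parent", "").split(","))
    exon_parents.discard("")  # exon lines without a Parent attribute
    # Parents that never appear as an ID are reported as "<missing>".
    return Counter(type_by_id.get(pid, "<missing>") for pid in exon_parents)
```

Calling it on the opened GFF (for example via `gzip.open(path, "rt")`) returns a Counter keyed by feature type, so any types other than the usual transcript-like ones stand out.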
