I am getting the following error using cuffmerge (2.2.1):
[Mon Apr 18 07:07:41 2016] Beginning transcriptome assembly merge
-------------------------------------------
[Mon Apr 18 07:07:41 2016] Preparing output location cuffmerge/
[Mon Apr 18 07:07:57 2016] Converting GTF files to SAM
[07:07:57] Loading reference annotation.
GFF Error: duplicate/invalid 'transcript' feature ID=id102945
[FAILED]
Error: could not execute gtf_to_sam
The reference GFF came from NCBI. Here's what I get if I grep for "ID=id102945":
NW_015493306.1 Gnomon C_gene_segment 47653 72466 . - . ID=id102945;Parent=gene4402;Dbxref=GeneID:107506276;gbkey=C_region;gene=LOC107506276;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 2 samples with support for all annotated introns
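For anyone who wants to go beyond grep and check whether an ID really is repeated in a GFF3 file, here is a minimal Python sketch. The function names `count_gff3_ids` and `duplicated_ids` are my own placeholders, not part of cuffmerge or any library, and the code assumes a tab-delimited GFF3 file with the attributes in column 9. Keep in mind that the GFF3 spec lets a single feature split across several lines (CDS, for example) reuse one ID, so a repeated ID is not automatically an error.

```python
from collections import Counter


def count_gff3_ids(lines):
    """Count how often each ID= attribute value occurs in GFF3 lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments and directives (all start with '#').
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column feature line
        # Column 9 holds semicolon-separated key=value attributes.
        for attr in fields[8].split(";"):
            key, sep, value = attr.strip().partition("=")
            if sep and key == "ID":
                counts[value] += 1
    return counts


def duplicated_ids(lines):
    """Return only the IDs that appear on more than one line."""
    return {fid: n for fid, n in count_gff3_ids(lines).items() if n > 1}
```

Passing an open file handle for the reference GFF to `duplicated_ids` should return a dict mapping each repeated ID to the number of lines that carry it.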
Any suggestions? I can't seem to find anything. I didn't have any issues when running tophat or cufflinks using the same GFF.
For what it is worth, I am unable to validate the file using the GFF validator at genometools.org (with the Seq Ontology option selected):
Validation unsuccessful! GenomeTools error: the child feature with type 'V_gene_segment' on line 17186 in file "/var/www/servers/genometools.org/htdocs/cgi-bin/gff3/_9407.gff3.gz" is not part-of parent feature with type 'gene' given on line 17185 (according to type checker 'OBO file /home/satta/genometools_for_web/gtdata/obo_files/so.obo')
EDIT: Still no luck; the "filtered" GTF causes tophat to throw errors. I found a different GTF available for the same genome (NCBI has several). However, I am still getting an error about the same ID:
[Fri Apr 22 07:49:38 2016] Converting GTF files to SAM
[07:49:38] Loading reference annotation.
GFF Error: duplicate/invalid 'transcript' feature ID=id102945
[FAILED]
Error: could not execute gtf_to_sam
The entry in question (new GFF):
NW_015493310.1 Gnomon exon 2165474 2165578 . + . ID=id102945;Parent=rna8992;Dbxref=GeneID:107506309,Genbank:XM_016136996.1;gbkey=mRNA;gene=REV3L;product=REV3 like%2C DNA directed polymerase zeta catalytic subunit%2C transcript variant X3;transcript_id=XM_016136996.1
What I am having trouble understanding is that this error happens whether I supply cuffmerge with a reference sequence, a reference GFF, both, or neither. So I'm assuming it is a problem with the output of cufflinks.
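One way to test that assumption would be to check which of the input files actually contain the offending ID. A rough sketch, assuming the cufflinks GTFs are listed one path per line in a text file (the format cuffmerge takes as input); `read_path_list` and `files_containing` are made-up helper names, and the match is a plain substring search, so "id102945" would also hit a longer ID such as "id1029451".

```python
def read_path_list(list_file):
    """Read one file path per line, ignoring blank lines."""
    with open(list_file) as handle:
        return [line.strip() for line in handle if line.strip()]


def files_containing(paths, needle):
    """Return the paths whose contents include the substring `needle`."""
    hits = []
    for path in paths:
        with open(path) as handle:
            # Stop reading a file as soon as one line matches.
            if any(needle in line for line in handle):
                hits.append(path)
    return hits
```

If `files_containing(read_path_list(...), "ID=id102945")` comes back empty for the cufflinks GTFs, the ID is presumably coming from somewhere else, such as the reference annotation.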