I have reached the StringTie section of the Nextflow RNA-seq pipeline and I keep getting this error message:
Command error:
Running StringTie 2.2.1. Command line:
stringtie sample1.sorted.bam --fr -G reference_genomic.gtf -o sample1.transcripts.gtf -A sample1.gene.abundance.txt -C sample1.coverage.gtf -b sample1.ballgown -p 4 -v -e
Loading reference annotation (guides)..
Error: no valid ID found for GFF record
I am not too familiar with GFF files, so any help would be really appreciated. I have tried using AGAT (as suggested in other posts). If sorting is the issue, I cannot use gff3sort as I do not have root access.
I'm not convinced sorting is the issue anyway, as Nextflow generated the GFF from my GTF file - does anyone have any suggestions?
Thanks in advance. Here is a sample of my gff.
ABKE04000044.1 Genbank gene 11 11446 . + . ID=nbis-gene-28068;gbkey=Gene;gene_biotype=protein_coding;gene_id=PRIPAC_91469;locus_tag=PRIPAC_91469;partial=true
ABKE04000044.1 Genbank mRNA 11 11446 . + . ID=gnl|WGS:ABKE|PRIPAC_mrna91469;Parent=nbis-gene-28068;gbkey=Gene;gene_biotype=protein_coding;gene_id=PRIPAC_91469;locus_tag=PRIPAC_91469;partial=true;transcript_id=""
ABKE04000044.1 Genbank exon 11 59 . + . ID=nbis-exon-322458;Parent=gnl|WGS:ABKE|PRIPAC_mrna91469;exon_number=1;gene_id=PRIPAC_91469;locus_tag=PRIPAC_91469;orig_protein_id=gnl|WGS:ABKE|PRIPAC_91469;orig_transcript_id=gnl|WGS:ABKE|PRIPAC_mrna91469;partial=true;product=hypothetical protein;transcript_biotype=mRNA;transcript_id=gnl|WGS:ABKE|PRIPAC_mrna91469
ABKE04000044.1 Genbank exon 133 252 . + . ID=nbis-exon-322459;Parent=gnl|WGS:ABKE|PRIPAC_mrna91469;exon_number=2;gene_id=PRIPAC_91469;locus_tag=PRIPAC_91469;orig_protein_id=gnl|WGS:ABKE|PRIPAC_91469;orig_transcript_id=gnl|WGS:ABKE|PRIPAC_mrna91469;partial=true;product=hypothetical protein;transcript_biotype=mRNA;transcript_id=gnl|WGS:ABKE|PRIPAC_mrna91469
UPDATE
We have found the .gff file generated from the .gtf was inaccurate. We are troubleshooting with this new information - if it works, I will post the solution.
You could try converting the file to GTF and running StringTie again. I wonder whether StringTie is confused because you use the gtf extension while the content is gxf (ID/Parent relationships from GFF + gene_id/transcript_id relationships from GTF). In GTF the gene_id attribute should come first in the 9th column, which is not the case here, so that might be the issue.
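If it helps narrow things down, here is a minimal Python sketch for scanning the annotation for the two things visible in the sample: records that mix GFF-style ID/Parent with GTF-style gene_id/transcript_id, and records with an empty transcript_id (the mRNA line in the sample has transcript_id=""). I have not confirmed that either of these is what StringTie is rejecting, so treat it as a diagnostic aid only. The function names (parse_attributes, check_record, scan) are made up for this example, and the attribute parser is deliberately simple: it does not handle semicolons or "=" inside quoted values.

```python
def parse_attributes(col9):
    """Parse a 9th column written as GFF3 (key=value) or GTF (key "value")."""
    attrs = {}
    for field in col9.strip().strip(";").split(";"):
        field = field.strip()
        if not field:
            continue
        if "=" in field:
            key, value = field.split("=", 1)      # GFF3 style
        else:
            key, _, value = field.partition(" ")  # GTF style
        attrs[key.strip()] = value.strip().strip('"')
    return attrs


def check_record(line):
    """Return a list of potential problems found in one annotation line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        # Fall back to whitespace splitting, e.g. for lines pasted from a forum.
        cols = line.split(None, 8)
    if len(cols) < 9:
        return ["fewer than 9 columns"]
    attrs = parse_attributes(cols[8])
    problems = []
    if attrs.get("transcript_id") == "":
        problems.append("empty transcript_id")
    if "ID" in attrs and "gene_id" in attrs:
        problems.append("mixes GFF ID/Parent with GTF gene_id/transcript_id")
    return problems


def scan(lines):
    """Map 1-based line numbers to the problems found on that line."""
    report = {}
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        problems = check_record(line)
        if problems:
            report[number] = problems
    return report


if __name__ == "__main__":
    # Shortened version of the mRNA line from the question.
    sample = [
        'ABKE04000044.1 Genbank mRNA 11 11446 . + . '
        'ID=gnl|WGS:ABKE|PRIPAC_mrna91469;Parent=nbis-gene-28068;'
        'gene_id=PRIPAC_91469;transcript_id=""',
    ]
    for number, problems in scan(sample).items():
        print(number, "; ".join(problems))
```

To run it on the real annotation, pass an open file handle to scan() instead of the sample list, for example `scan(open("reference_genomic.gtf"))`.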