ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
2.2 years ago

Hi,

I am trying to use featureCounts to analyse my RNA-seq data from Apis mellifera. My code and the error are as follows.

/softwares/subread-2.0.0-source/bin/featureCounts \
  -T 16 \
  -p \
  -s 1 \
  -a /home/axel/arumoyc/alignment/GCF_003254395.2_Amel_HAv3.1_genomic.gtf \
  -t exon \
  -g gene_id \
  -o /home/axel/arumoyc/counts_all/all/count24.txt \
  /home/axel/arumoyc/bamfiles_test/bamfiles/bamfile24/map24Aligned.sorted.out.bam \
  2> /home/axel/arumoyc/counts_all/all/count24.screen-output.log

ERROR:

failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id'
An example of attributes included in your GTF annotation is 'gene_id ""; transcript_id "unknown_transcript_1"; anticodon "(pos:31..33)"; gbkey "tRNA"; product "tRNA-Glu"; exon_number "1";'
The program has to terminate.

The .gtf file was downloaded from NCBI and was not modified. Please help me with this error. Thanks in advance.

featureCounts • 6.2k views

Have you looked at the contents of your GTF file? With the -g option, you are telling featureCounts which attribute to look for in the 9th column. Most likely your GTF file is either missing the attribute you specified or uses a different name for it.
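One way to check this, sketched under assumptions: the awk script below tallies, for each feature type in column 3, how many rows lack a non-empty gene_id value. The three-line sample file and the variable names are made up for illustration; point `gtf` at your own annotation instead.

```shell
# Hypothetical three-line sample standing in for a real annotation file.
gtf=$(mktemp)
printf 'chr1\tGnomon\texon\t1\t100\t.\t+\t.\tgene_id "LOC1"; transcript_id "XR_1"; exon_number "1";\n' > "$gtf"
printf 'chr1\ttRNAscan-SE\texon\t200\t272\t.\t+\t.\tgene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA";\n' >> "$gtf"
printf 'chr1\tGnomon\tCDS\t10\t90\t.\t+\t0\tgene_id "LOC1"; transcript_id "XR_1";\n' >> "$gtf"

# Columns printed: feature type, total rows, rows whose gene_id is missing or empty.
# Comment lines (starting with #) are skipped.
report=$(awk -F'\t' '
  /^#/ { next }
  {
    total[$3]++
    if ($9 !~ /gene_id "[^"]+"/) bad[$3]++
  }
  END { for (t in total) printf "%s\t%d\t%d\n", t, total[t], bad[t] + 0 }
' "$gtf" | sort)

echo "$report"
rm -f "$gtf"
```

A non-zero third column for the feature type passed to -t (exon in the command above) would point to empty or missing gene_id values on those rows.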


I see that "gene_id" is present. I also converted the .gtf file into an Excel table, and the 9th column does indeed contain "gene_id". Each column of the .gtf file has 335791 entries.

#gtf-version 2.2
#!genome-build Amel_HAv3.1
#!genome-build-accession NCBI_Assembly:GCF_003254395.2
#!annotation-source NCBI Apis mellifera Annotation Release 104
NC_037638.1     Gnomon  gene    9273    12174   .       -       .       gene_id "LOC551580"; db_xref "BEEBASE:GB42195"; db_xref "GeneID:551580"; gbkey "Gene"; gene "LOC551580"; gene_biotype "protein_coding";
NC_037638.1     Gnomon  exon    11812   12174   .       -       .       gene_id "LOC551580"; transcript_id "XR_001705491.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 57 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X3"; exon_number "1";
NC_037638.1     Gnomon  exon    11054   11121   .       -       .       gene_id "LOC551580"; transcript_id "XR_001705491.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 57 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X3"; exon_number "2";
NC_037638.1     Gnomon  exon    10913   10994   .       -       .       gene_id "LOC551580"; transcript_id "XR_001705491.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 57 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X3"; exon_number "3";
NC_037638.1     Gnomon  exon    9779    9827    .       -       .       gene_id "LOC551580"; transcript_id "XR_001705491.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 57 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X3"; exon_number "4";
NC_037638.1     Gnomon  exon    9274    9546    .       -       .       gene_id "LOC551580"; transcript_id "XR_001705491.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 57 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X3"; exon_number "5";
NC_037638.1     Gnomon  exon    11579   12174   .       -       .       gene_id "LOC551580"; transcript_id "XR_001705490.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 65 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X2"; exon_number "1";
NC_037638.1     Gnomon  exon    11054   11121   .       -       .       gene_id "LOC551580"; transcript_id "XR_001705490.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 65 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X2"; exon_number "2";
NC_037638.1     Gnomon  exon    10913   10994   .       -       .       gene_id "LOC551580"; transcript_id "XR_001705490.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 65 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X2"; exon_number "3";

I seem to be having the same issue with a .gtf file downloaded from NCBI: GCF_003339765.1_Mmul_10_genomic.gtf

$ head GCF_003339765.1_Mmul_10_genomic.gtf 

#gtf-version 2.2
#!genome-build Mmul_10
#!genome-build-accession NCBI_Assembly:GCF_003339765.1
#!annotation-source NCBI Macaca mulatta Annotation Release 103
NC_041754.1     Gnomon  gene    8796    27366   .       -       .       gene_id "PGBD2"; db_xref "GeneID:114678393"; gbkey "Gene"; gene "PGBD2"; gene_biotype "protein_coding";
NC_041754.1     Gnomon  exon    26570   27366   .       -       .       gene_id "PGBD2"; transcript_id "XM_028848769.1"; db_xref "GeneID:114678393"; gbkey "mRNA"; gene "PGBD2"; model_evidence "Supporting evidence includes similarity to: 36 ESTs, 2 Proteins, 5 long SRA reads, and 89% coverage of the annotated genomic feature by RNAseq alignments, including 69 samples with support for all annotated introns"; product "piggyBac transposable element derived 2"; exon_number "1";
NC_041754.1     Gnomon  exon    13491   13554   .       -       .       gene_id "PGBD2"; transcript_id "XM_028848769.1"; db_xref "GeneID:114678393"; gbkey "mRNA"; gene "PGBD2"; model_evidence "Supporting evidence includes similarity to: 36 ESTs, 2 Proteins, 5 long SRA reads, and 89% coverage of the annotated genomic feature by RNAseq alignments, including 69 samples with support for all annotated introns"; product "piggyBac transposable element derived 2"; exon_number "2";
NC_041754.1     Gnomon  exon    8796    10763   .       -       .       gene_id "PGBD2"; transcript_id "XM_028848769.1"; db_xref "GeneID:114678393"; gbkey "mRNA"; gene "PGBD2"; model_evidence "Supporting evidence includes similarity to: 36 ESTs, 2 Proteins, 5 long SRA reads, and 89% coverage of the annotated genomic feature by RNAseq alignments, including 69 samples with support for all annotated introns"; product "piggyBac transposable element derived 2"; exon_number "3";
NC_041754.1     Gnomon  CDS     13491   13507   .       -       0       gene_id "PGBD2"; transcript_id "XM_028848769.1"; db_xref "GeneID:114678393"; gbkey "CDS"; gene "PGBD2"; product "piggyBac transposable element-derived protein 2"; protein_id "XP_028704602.1"; exon_number "2";
NC_041754.1     Gnomon  CDS     9005    10763   .       -       1       gene_id "PGBD2"; transcript_id "XM_028848769.1"; db_xref "GeneID:114678393"; gbkey "CDS"; gene "PGBD2"; product "piggyBac transposable element-derived protein 2"; protein_id "XP_028704602.1"; exon_number "3";

I am aware from this guidance (https://mblab.wustl.edu/GTF2.html) and this Biostars question (GFF3 to GTF conversion - 9th column) that gene_id and transcript_id must be at the start of the 9th column. I tried to correct my .gtf file based on the recommendations in the Biostars post, but this hasn't solved the issue.

I have another annotation downloaded from Ensembl and it runs fine; however, it's not the annotation I want to use.

Further clarification on this would be greatly appreciated, and thank you for your time reading this.


I was having the same error message. I tried to manipulate my gtf file in many ways. I read someone suggesting to go to an older version of subread and that worked for me. We happened to have subread/1.5.1. Good luck.


You can try AGAT; it might fix your problem.

21 months ago
Chris S. ▴ 320

featureCounts does not allow empty values in the gene_id field, so you need to remove or update them. See this answer from the developer https://groups.google.com/g/subread/c/xs7mw38Bc6g.

grep 'gene_id ""' GCF_003254395.2_Amel_HAv3.1_genomic.gtf 
grep -v 'gene_id ""' GCF_003254395.2_Amel_HAv3.1_genomic.gtf > Amel_HAv3.1_FIXED.gtf
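For readers less familiar with grep: the first command prints every line containing gene_id "", and the second uses -v to write every line that does not contain it to a new file. The sketch below shows the same idea with a before/after count as a sanity check. It runs on a made-up two-line sample, and the variable names are only for illustration.

```shell
# Hypothetical two-line sample; substitute the path to your own annotation.
src=$(mktemp); dst=$(mktemp)
printf 'chr1\tGnomon\texon\t1\t100\t.\t+\t.\tgene_id "LOC1"; transcript_id "XR_1";\n' > "$src"
printf 'chr1\ttRNAscan-SE\texon\t200\t272\t.\t+\t.\tgene_id ""; transcript_id "unknown_transcript_1";\n' >> "$src"

# -c prints the number of matching lines instead of the lines themselves.
# grep exits non-zero when nothing matches, hence the "|| true".
before=$(grep -c 'gene_id ""' "$src" || true)

# -v inverts the match: keep every line that does NOT contain gene_id "".
grep -v 'gene_id ""' "$src" > "$dst"

# Re-check the filtered file; 0 means no empty gene_id values remain.
after=$(grep -c 'gene_id ""' "$dst" || true)
kept=$(wc -l < "$dst" | tr -d ' ')

echo "removed=$before remaining=$after kept=$kept"
rm -f "$src" "$dst"
```

Note that this drops the affected records entirely, so those features will not be counted. If you need them, the alternative is to fill in a gene_id value instead of deleting the lines.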

Thanks, Chris S. It seemed to work for me. Since I am not well versed in grep, could you please explain to me (us) how you fixed the GTF file? And where could we learn more about handling this type of issue? Thanks a lot.

11 months ago
rependo ▴ 40

To anyone finding this thread after getting a related "gene_id" error in featureCounts or subread-align, I had this same issue but it was not resolved by clearing empty values following "gene_id" field in column nine (my gtf did not have empty values).

I was, however, able to resolve the issue and get subread to function (subread-align) by removing "exon_number" fields from column 9 that were present in addition to "transcript_id" and "gene_id". So if you're like me and still trying to get your .gtf to play nicely with subread, I'd suggest looking for anything in column 9 that isn't transcript_id or gene_id, then removing it.
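A minimal sketch of that kind of clean-up, assuming attributes in column 9 are separated by "; " and that no gene_id or transcript_id value itself contains a semicolon. The sample file and variable names are hypothetical, and whether the extra attributes are really what upsets subread is only this one report, not something confirmed here.

```shell
# Hypothetical sample: one header line and one exon record with extra attributes.
src=$(mktemp); dst=$(mktemp)
printf '#gtf-version 2.2\n' > "$src"
printf 'chr1\tGnomon\texon\t1\t100\t.\t+\t.\tgene_id "LOC1"; transcript_id "XR_1"; gbkey "misc_RNA"; exon_number "1";\n' >> "$src"

# Keep only gene_id and transcript_id in column 9; pass comment lines through unchanged.
awk 'BEGIN { FS = OFS = "\t" }
  /^#/ { print; next }
  {
    n = split($9, attr, "; *")
    kept = ""
    for (i = 1; i <= n; i++)
      if (attr[i] ~ /^(gene_id|transcript_id) "/)
        kept = kept attr[i] "; "
    sub(/ $/, "", kept)   # drop the trailing space after the final ";"
    $9 = kept
    print
  }' "$src" > "$dst"

header=$(head -n 1 "$dst")
col9=$(awk -F'\t' '!/^#/ { print $9 }' "$dst")
echo "$col9"
rm -f "$src" "$dst"
```

Keeping the original file and writing to a new one makes it easy to diff the two and confirm that only column 9 changed.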

11 months ago
onkar • 0

The best way for me was converting the file into SAF format (http://bioinf.wehi.edu.au/featureCounts/):

GeneID  Chr     Start   End     Strand
497097  chr1    3204563 3207049 -
497097  chr1    3411783 3411982 +
497097  chr1    3660633 3661579 -
...

After two days of struggling with this, I found that it was happening because the 9th column didn't have the word "GeneID" in the gene ID section. It is very particular about the word and the format.

I converted the file into SAF format and hurray!! It ran perfectly. I kept only gene features in the conversion; you can adapt it to your requirements.

grep 'gene' annot.gff | cut -d ';' -f1 | tr -d ' ' | sed 's/ID=//g' | awk -v OFS='\t' '{print $9,$1,$4,$5,$7}' > annot.gff.SAF

featureCounts -T 20 -F SAF -a annot.gff.SAF -o FeatureCounts.out 1.bam 2.bam
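If you are starting from a GTF rather than a GFF, a similar conversion can be sketched as below. This is an illustration, not the command used above: it assumes exon records carry a non-empty gene_id, writes the header row shown in the SAF example, and runs on a made-up three-line sample.

```shell
# Hypothetical sample: one gene record and two exon records.
gtf=$(mktemp); saf=$(mktemp)
printf 'chr1\tGnomon\tgene\t1\t300\t.\t+\t.\tgene_id "LOC1"; gbkey "Gene";\n' > "$gtf"
printf 'chr1\tGnomon\texon\t1\t100\t.\t+\t.\tgene_id "LOC1"; transcript_id "XR_1"; exon_number "1";\n' >> "$gtf"
printf 'chr1\tGnomon\texon\t200\t300\t.\t+\t.\tgene_id "LOC1"; transcript_id "XR_1"; exon_number "2";\n' >> "$gtf"

{
  # Header row in the column order used by the SAF example.
  printf 'GeneID\tChr\tStart\tEnd\tStrand\n'
  # One SAF row per exon; records without a non-empty gene_id are skipped.
  awk -F'\t' -v OFS='\t' '
    $3 == "exon" && match($9, /gene_id "[^"]+"/) {
      # Skip the 9 characters of the gene_id prefix and the closing quote.
      id = substr($9, RSTART + 9, RLENGTH - 10)
      print id, $1, $4, $5, $7
    }' "$gtf"
} > "$saf"

lines=$(wc -l < "$saf" | tr -d ' ')
first_row=$(sed -n '2p' "$saf")
cat "$saf"
rm -f "$gtf" "$saf"
```

Using exon rows means reads are summarised over exons per gene, whereas the gene-only conversion above counts over whole gene spans; pick whichever matches your analysis.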