Question: ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
6 months ago, chatterjee.arumoy wrote:

Hi,

I am trying to use featureCounts to analyse my RNA-seq data from Apis mellifera. My command and the resulting error are as follows.

/softwares/subread-2.0.0-source/bin/featureCounts \
  -T 16 \
  -p \
  -s 1 \
  -a /home/axel/arumoyc/alignment/GCF_003254395.2_Amel_HAv3.1_genomic.gtf \
  -t exon \
  -g gene_id \
  -o /home/axel/arumoyc/counts_all/all/count24.txt \
  /home/axel/arumoyc/bamfiles_test/bamfiles/bamfile24/map24Aligned.sorted.out.bam \
  2> /home/axel/arumoyc/counts_all/all/count24.screen-output.log

ERROR:

failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is 'gene_id' An example of attributes included in your GTF annotation is 'gene_id ""; transcript_id "unknown_transcript_1"; anticodon "(pos:31..33)"; gbkey "tRNA"; product "tRNA-Glu"; exon_number "1";' The program has to terminate.

The .gtf file was downloaded from NCBI and has not been modified. Please help me with this error. Thanks in advance.

featurecounts
Question written 6 months ago by chatterjee.arumoy, modified 5 weeks ago by genomax

Have you looked at the contents of your GTF file? With the -g option, you are telling featureCounts which attribute in the 9th column to use as the gene identifier. Most likely your GTF file is either missing the identifier you provided or uses a different name for it.
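
One way to follow this suggestion is to check the 9th column directly from the shell. The sketch below is an illustration, not part of the original reply: it builds a tiny stand-in GTF (made-up records in a temporary file) and counts the exon records that lack a non-empty gene_id value. To try it on a real annotation, point gtf at that file instead of the temporary one.

```shell
# Stand-in GTF with made-up records (assumption: a real file would replace this).
gtf=$(mktemp)
printf 'chr1\tGnomon\texon\t1\t100\t.\t+\t.\tgene_id "LOC1"; transcript_id "T1"; exon_number "1";\n' > "$gtf"
printf 'chr1\tGnomon\texon\t200\t300\t.\t+\t.\tgene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA";\n' >> "$gtf"
printf 'chr1\tGnomon\tCDS\t1\t90\t.\t+\t0\tgene_id "LOC1"; transcript_id "T1";\n' >> "$gtf"

# Select exon records (column 3) whose attributes (column 9) have no
# gene_id with at least one character between the quotes, then count them.
bad_exons=$(awk -F'\t' '$3 == "exon" && $9 !~ /gene_id "[^"]+"/' "$gtf" | wc -l | tr -d ' ')
echo "exon records without a usable gene_id: $bad_exons"

rm -f "$gtf"
```

A non-zero count means some exon records either have no gene_id attribute or have it set to an empty string.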

Reply written 6 months ago by brianj.park

I can see that "gene_id" is present. I also converted the .gtf file into an Excel table, and the 9th column does indeed contain "gene_id". Each column of the .gtf file has 335791 entries.

#gtf-version 2.2
#!genome-build Amel_HAv3.1
#!genome-build-accession NCBI_Assembly:GCF_003254395.2
#!annotation-source NCBI Apis mellifera Annotation Release 104
NC_037638.1     Gnomon  gene    9273    12174   .       -       .       gene_id "LOC551580"; db_xref "BEEBASE:GB42195"; db_xref "GeneID:551580"; gbkey "Gene"; gene "LOC551580"; gene_biotype "protein_coding";
NC_037638.1     Gnomon  exon    11812   12174   .       -       .       gene_id "LOC551580"; transcript_id "XR_001705491.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 57 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X3"; exon_number "1";
NC_037638.1     Gnomon  exon    11054   11121   .       -       .       gene_id "LOC551580"; transcript_id "XR_001705491.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 57 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X3"; exon_number "2";
NC_037638.1     Gnomon  exon    10913   10994   .       -       .       gene_id "LOC551580"; transcript_id "XR_001705491.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 57 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X3"; exon_number "3";
NC_037638.1     Gnomon  exon    9779    9827    .       -       .       gene_id "LOC551580"; transcript_id "XR_001705491.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 57 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X3"; exon_number "4";
NC_037638.1     Gnomon  exon    9274    9546    .       -       .       gene_id "LOC551580"; transcript_id "XR_001705491.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 57 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X3"; exon_number "5";
NC_037638.1     Gnomon  exon    11579   12174   .       -       .       gene_id "LOC551580"; transcript_id "XR_001705490.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 65 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X2"; exon_number "1";
NC_037638.1     Gnomon  exon    11054   11121   .       -       .       gene_id "LOC551580"; transcript_id "XR_001705490.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 65 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X2"; exon_number "2";
NC_037638.1     Gnomon  exon    10913   10994   .       -       .       gene_id "LOC551580"; transcript_id "XR_001705490.2"; db_xref "GeneID:551580"; gbkey "misc_RNA"; gene "LOC551580"; model_evidence "Supporting evidence includes similarity to: 100% coverage of the annotated genomic feature by RNAseq alignments, including 65 samples with support for all annotated introns"; product "ubiquitin-related modifier 1, transcript variant X2"; exon_number "3";
Reply written 6 months ago by chatterjee.arumoy, modified 5 weeks ago by genomax

I seem to be having the same issue with a .gtf file downloaded from NCBI: GCF_003339765.1_Mmul_10_genomic.gtf

$ head GCF_003339765.1_Mmul_10_genomic.gtf 

#gtf-version 2.2
#!genome-build Mmul_10
#!genome-build-accession NCBI_Assembly:GCF_003339765.1
#!annotation-source NCBI Macaca mulatta Annotation Release 103
NC_041754.1     Gnomon  gene    8796    27366   .       -       .       gene_id "PGBD2"; db_xref "GeneID:114678393"; gbkey "Gene"; gene "PGBD2"; gene_biotype "protein_coding";
NC_041754.1     Gnomon  exon    26570   27366   .       -       .       gene_id "PGBD2"; transcript_id "XM_028848769.1"; db_xref "GeneID:114678393"; gbkey "mRNA"; gene "PGBD2"; model_evidence "Supporting evidence includes similarity to: 36 ESTs, 2 Proteins, 5 long SRA reads, and 89% coverage of the annotated genomic feature by RNAseq alignments, including 69 samples with support for all annotated introns"; product "piggyBac transposable element derived 2"; exon_number "1";
NC_041754.1     Gnomon  exon    13491   13554   .       -       .       gene_id "PGBD2"; transcript_id "XM_028848769.1"; db_xref "GeneID:114678393"; gbkey "mRNA"; gene "PGBD2"; model_evidence "Supporting evidence includes similarity to: 36 ESTs, 2 Proteins, 5 long SRA reads, and 89% coverage of the annotated genomic feature by RNAseq alignments, including 69 samples with support for all annotated introns"; product "piggyBac transposable element derived 2"; exon_number "2";
NC_041754.1     Gnomon  exon    8796    10763   .       -       .       gene_id "PGBD2"; transcript_id "XM_028848769.1"; db_xref "GeneID:114678393"; gbkey "mRNA"; gene "PGBD2"; model_evidence "Supporting evidence includes similarity to: 36 ESTs, 2 Proteins, 5 long SRA reads, and 89% coverage of the annotated genomic feature by RNAseq alignments, including 69 samples with support for all annotated introns"; product "piggyBac transposable element derived 2"; exon_number "3";
NC_041754.1     Gnomon  CDS     13491   13507   .       -       0       gene_id "PGBD2"; transcript_id "XM_028848769.1"; db_xref "GeneID:114678393"; gbkey "CDS"; gene "PGBD2"; product "piggyBac transposable element-derived protein 2"; protein_id "XP_028704602.1"; exon_number "2";
NC_041754.1     Gnomon  CDS     9005    10763   .       -       1       gene_id "PGBD2"; transcript_id "XM_028848769.1"; db_xref "GeneID:114678393"; gbkey "CDS"; gene "PGBD2"; product "piggyBac transposable element-derived protein 2"; protein_id "XP_028704602.1"; exon_number "3";

I am aware from this guidance (https://mblab.wustl.edu/GTF2.html) and from a Biostars question (GFF3 to GTF conversion - 9th column) that gene_id and transcript_id must come first in the 9th column. I tried to correct my .gtf file following the recommendations in that Biostars post, but this hasn't solved the issue.
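
The ordering requirement described here can be checked mechanically. The following is only a sketch on made-up records in a temporary file: it counts the non-comment records whose 9th column does not begin with gene_id. Note that it tests ordering only, so a record with an empty value such as gene_id "" would still pass this check.

```shell
# Stand-in GTF with made-up records (assumption: a real file would replace this).
gtf=$(mktemp)
printf '#gtf-version 2.2\n' > "$gtf"
printf 'chr1\tGnomon\texon\t1\t100\t.\t+\t.\tgene_id "LOC1"; transcript_id "T1";\n' >> "$gtf"
printf 'chr1\tGnomon\texon\t200\t300\t.\t+\t.\ttranscript_id "T1"; gene_id "LOC1";\n' >> "$gtf"

# Skip header lines starting with '#', then keep records whose
# attribute column does not start with the gene_id key.
misordered=$(awk -F'\t' '!/^#/ && $9 !~ /^gene_id "/' "$gtf" | wc -l | tr -d ' ')
echo "records where gene_id is not the first attribute: $misordered"

rm -f "$gtf"
```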

I have another annotation, downloaded from Ensembl, that runs fine; however, it's not the annotation I want to use.

Further clarification on this would be greatly appreciated. Thank you for your time reading this.

Reply written 5 months ago by hannepainter, modified 5 weeks ago by genomax

I was getting the same error message. I tried to manipulate my GTF file in many ways. I then read a suggestion to go back to an older version of Subread, and that worked for me. We happened to have subread/1.5.1. Good luck.

Reply written 4 months ago by amaladurai

You can try AGAT; it might fix your problem.

Reply written 4 months ago by Juke34
5 weeks ago, Chris S. (United States) wrote:

featureCounts does not allow empty values in the gene_id field, so you need to remove or update them. See this answer from the developer https://groups.google.com/g/subread/c/xs7mw38Bc6g.

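# List the records whose gene_id value is empty
grep 'gene_id ""' GCF_003254395.2_Amel_HAv3.1_genomic.gtf
# Write a copy of the annotation without those records
grep -v 'gene_id ""' GCF_003254395.2_Amel_HAv3.1_genomic.gtf > Amel_HAv3.1_FIXED.gtf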
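
To see what these two commands do, it can help to run them on a small example and count lines before and after. The sketch below uses made-up records in temporary files (the names src and dst are placeholders, not part of the answer above): grep -c counts the lines containing an empty gene_id, and grep -v writes every line that does not contain one, so header lines beginning with # are kept.

```shell
# Stand-in GTF with made-up records; src and dst are hypothetical temp files.
src=$(mktemp)
dst=$(mktemp)
printf '#gtf-version 2.2\n' > "$src"
printf 'chr1\tGnomon\texon\t1\t100\t.\t+\t.\tgene_id "LOC1"; transcript_id "T1";\n' >> "$src"
printf 'chr1\tGnomon\texon\t200\t300\t.\t+\t.\tgene_id ""; transcript_id "unknown_transcript_1";\n' >> "$src"

# -c prints the number of matching lines instead of the lines themselves.
n_empty=$(grep -c 'gene_id ""' "$src")

# -v inverts the match: keep only lines WITHOUT an empty gene_id.
grep -v 'gene_id ""' "$src" > "$dst"

# Verify the filtered copy: no empty gene_id left, all other lines kept.
# grep exits non-zero when nothing matches, hence the "|| true".
n_left=$(grep -c 'gene_id ""' "$dst" || true)
n_kept=$(wc -l < "$dst" | tr -d ' ')
echo "removed: $n_empty, still empty: $n_left, lines kept: $n_kept"

rm -f "$src" "$dst"
```

Filtering this way discards the affected records entirely, so any reads overlapping them will not be counted for those features.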
Answer written 5 weeks ago by Chris S.

Thanks, Chris S. It seems to work for me. Since I am not well versed in grep, could you kindly tell me (us) how these commands fix the GTF file, and where we could learn more about handling this type of issue? Thanks a lot.

Reply written 4 weeks ago by chatterjee.arumoy