htseq-count error Feature does not contain a 'gene_id' attribute
1
1
Entering edit mode
6.4 years ago

I'm trying to do read count using htseq-count

htseq-count --mode=union --stranded=no --idattr=gene_id -f sam ./filteredfirstZBND1X_gff.sam ./filteredsecZBND1X_gff.sam /home/madzays/data/finc h_data/ZF_transcriptome/taeGut2_gff/taeGut2.gff > ZBND1X_gff_counts.sam

but get the error

Error occured when processing GFF file (line 11 of file  /home/madzays/data/finch_data/ZF_transcriptome/taeGut2_gff/taeGut2.gff):
Feature id1 does not contain a 'gene_id' attribute   [Exception type:
ValueError, raised in count.py:77]

this is what my gff file (from NCBI) looks like

##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
#!genome-build Taeniopygia_guttata-3.2.4
#!genome-build-accession NCBI_Assembly:GCF_000151805.1
##sequence-region NC_011462.1 1 118548696
##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=59729
NC_011462.1     RefSeq  region  1       118548696       .       +       .       ID=id0;Dbxref=taxon:59729;Name=1;chromosome=1;gbkey=Src;genome=chromosome;isolate=Black17;mol_type=genomic DNA;sex=male
NC_011462.1     Gnomon  gene    17853   28371   .       +       .       ID=gene0;Dbxref=GeneID:100227449;Name=CLIC6;gbkey=Gene;gene=CLIC6;gene_biotype=protein_coding;partial=true;start_range=.,17853
NC_011462.1     Gnomon  mRNA    17853   28371   .       +       .       ID=rna0;Parent=gene0;Dbxref=GeneID:100227449,Genbank:XM_002186596.2;Name=XM_002186596.2;gbkey=mRNA;gene=CLIC6;partial=true;product=chloride intracellular channel 6;start_range=.,17853;transcript_id=XM_002186596.2
RNA-Seq htseq gff read count • 10k views
ADD COMMENT
0
Entering edit mode
  1. Try to convert gff to gtf and feed it to htseq-count. try searching for gffread util.
  2. Shift to featurecounts.
ADD REPLY
2
Entering edit mode
6.4 years ago

Yes. There is no gene_id attribute in your gff. The error is pretty self-explanatory in that sense.

Try with --idattr=gene. Based on the few lines you posted that attribute is present.

ADD COMMENT
0
Entering edit mode

K it's doing its thing now. thanks!

ADD REPLY
0
Entering edit mode

Oh wait it run into a line without that attribute. Do you know if NCBI gffs have any "must have" attribute I could use? thanks

ADD REPLY
1
Entering edit mode

Nope. You can print out the line which caused the problem using e.g.
sed -n 26p taeGut2.gff for line 26, then try to figure what that line does have in common with the rest.

ADD REPLY
0
Entering edit mode

Yep it runs for "gbkey". However it outputs

-bash-4.2$ head -50 ZBND1X_gff_counts_reverse.sam
C_region        0       0
V_segment       0       0
exon    0       0
mRNA    0       0
misc_RNA        0       0
ncRNA   0       0
precursor_RNA   0       0
rRNA    0       0
tRNA    0       0
__no_feature    21536704        15508244
__ambiguous     0       0
__too_low_aQual 0       0
__not_aligned   6233570 4564356
__alignment_not_unique  2340806 1668760

I think it has to do with the gtf file having these RefSeq ids instead of chromosome locations (e.g. chr1) Any ideas how could I sub them?

ADD REPLY
0
Entering edit mode

Perhaps ensembl has a gff for your organism?

ADD REPLY

Login before adding your answer.

Traffic: 2362 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6