Hi All,
I am doing a differential gene expression analysis with samples from a non-model plant species, Nicotiana benthamiana.
I am now at the read-counting step, trying out htseq-count on HISAT2-generated BAM files. The recommended options are --type=exon and --idattr=gene_id, i.e. count reads over exons and summarize them per gene.
However, my GFF file does not contain gene_id as an attribute.
My question: Is it OK to use Parent as --idattr in place of the (non-existent) gene_id as follows?
htseq-count -f bam -s yes -t exon -i Parent example.bam Niben101_annotation.allfeatures.gff > example_htseq_counts_file.txt
A few lines from the GFF file:
Niben101Ctg00116 maker gene 2 501 . - . ID=Niben101Ctg00116g00002;Name=Niben101Ctg00116g00002;PredictionNote=maker-snap;AltID=maker-Niben101Ctg00116-snap-gene-0.1
Niben101Ctg00116 maker mRNA 2 501 . - . ID=Niben101Ctg00116g00002.1;Parent=Niben101Ctg00116g00002;Name=Niben101Ctg00116g00002.1;_AED=0.00;_eAED=0.00;_QI=0|1|0.33|1|0|0|3|0|110;AltID=maker-Niben101Ctg00116-snap-gene-0.1-mRNA-1
Niben101Ctg00116 maker exon 2 25 . - . ID=Niben101Ctg00116g00002.1:exon:003;Parent=Niben101Ctg00116g00002.1;AltID=maker-Niben101Ctg00116-snap-gene-0.1-mRNA-1:exon:4517
Niben101Ctg00116 maker exon 120 314 . - . ID=Niben101Ctg00116g00002.1:exon:002;Parent=Niben101Ctg00116g00002.1;AltID=maker-Niben101Ctg00116-snap-gene-0.1-mRNA-1:exon:4516
Niben101Ctg00116 maker exon 391 501 . - . ID=Niben101Ctg00116g00002.1:exon:001;Parent=Niben101Ctg00116g00002.1;AltID=maker-Niben101Ctg00116-snap-gene-0.1-mRNA-1:exon:4515
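To see what Parent actually points to on each feature type, here is a minimal sketch in plain Python (standard library only). The helper names parse_attributes and parents_by_type are my own, the sample lines are shortened copies of the ones above, and it assumes the real file is tab-separated with nine columns:

```python
# Shortened copies of the GFF lines above; a real GFF3 file is tab-separated.
SAMPLE = [
    "Niben101Ctg00116\tmaker\tgene\t2\t501\t.\t-\t.\t"
    "ID=Niben101Ctg00116g00002;Name=Niben101Ctg00116g00002",
    "Niben101Ctg00116\tmaker\tmRNA\t2\t501\t.\t-\t.\t"
    "ID=Niben101Ctg00116g00002.1;Parent=Niben101Ctg00116g00002",
    "Niben101Ctg00116\tmaker\texon\t2\t25\t.\t-\t.\t"
    "ID=Niben101Ctg00116g00002.1:exon:003;Parent=Niben101Ctg00116g00002.1",
    "Niben101Ctg00116\tmaker\texon\t120\t314\t.\t-\t.\t"
    "ID=Niben101Ctg00116g00002.1:exon:002;Parent=Niben101Ctg00116g00002.1",
]


def parse_attributes(field):
    """Turn a GFF3 attribute column ('key=value;key=value') into a dict."""
    return dict(item.split("=", 1) for item in field.split(";") if "=" in item)


def parents_by_type(lines):
    """Collect the set of Parent values seen for each feature type."""
    result = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a regular feature line
        attrs = parse_attributes(cols[8])
        if "Parent" in attrs:
            result.setdefault(cols[2], set()).add(attrs["Parent"])
    return result


parents = parents_by_type(SAMPLE)
for feature_type, values in sorted(parents.items()):
    print(feature_type, sorted(values))
```

On these lines the Parent of each exon is the mRNA ID (Niben101Ctg00116g00002.1), while the gene ID (Niben101Ctg00116g00002) only appears as the Parent of the mRNA. So, as far as I can tell, -t exon -i Parent would group exons by transcript rather than by gene.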
Thank you all in advance,
Thanks for your reply! Greatly appreciated!
A bit confused though: I want to summarize counts at the gene level, not at the exon level. In other words, I don't want to see the counts for each exon separately but rather the counts for each gene. I hope that makes sense.
Apart from that: the GFF file is indeed different/irregular; however, htseq-count seems to parse it properly, and counts are being generated as expected.
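For the gene-level summary, one option I am considering is to add a gene_id attribute to every exon line by following exon -> mRNA -> gene through the Parent attributes, and then run htseq-count with -i gene_id on the rewritten file. A rough sketch in plain Python follows; the function names are my own, and it assumes tab-separated columns, a single Parent per exon, and that all transcripts are typed mRNA (other transcript types would be left unchanged). I have not verified it on the full annotation.

```python
# Shortened copies of the GFF lines from the question (tab-separated).
SAMPLE = [
    "Niben101Ctg00116\tmaker\tgene\t2\t501\t.\t-\t.\t"
    "ID=Niben101Ctg00116g00002;Name=Niben101Ctg00116g00002",
    "Niben101Ctg00116\tmaker\tmRNA\t2\t501\t.\t-\t.\t"
    "ID=Niben101Ctg00116g00002.1;Parent=Niben101Ctg00116g00002",
    "Niben101Ctg00116\tmaker\texon\t2\t25\t.\t-\t.\t"
    "ID=Niben101Ctg00116g00002.1:exon:003;Parent=Niben101Ctg00116g00002.1",
]


def parse_attributes(field):
    """Turn a GFF3 attribute column ('key=value;key=value') into a dict."""
    return dict(item.split("=", 1) for item in field.split(";") if "=" in item)


def add_gene_id(lines):
    """Return the GFF lines with ';gene_id=<gene ID>' appended to exon lines."""
    records = [line.rstrip("\n") for line in lines]

    # First pass: map each mRNA ID to the ID of its parent gene.
    transcript_to_gene = {}
    for rec in records:
        cols = rec.split("\t")
        if len(cols) == 9 and cols[2] == "mRNA":
            attrs = parse_attributes(cols[8])
            if "ID" in attrs and "Parent" in attrs:
                transcript_to_gene[attrs["ID"]] = attrs["Parent"]

    # Second pass: look up the gene for each exon via its parent transcript.
    out = []
    for rec in records:
        cols = rec.split("\t")
        if len(cols) == 9 and cols[2] == "exon":
            parent = parse_attributes(cols[8]).get("Parent")
            gene = transcript_to_gene.get(parent)
            if gene is not None:
                cols[8] = cols[8].rstrip(";") + ";gene_id=" + gene
                rec = "\t".join(cols)
        out.append(rec)  # non-exon and unresolved lines pass through unchanged
    return out


annotated = add_gene_id(SAMPLE)
print(annotated[2])
```

The rewritten lines could be saved to a new GFF file and passed to htseq-count with -t exon -i gene_id, so that reads are summarized per gene rather than per transcript.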