3.2 years ago
meisam.radfar
I want to use htseq-count to count reads for my RNA-seq project.
I am using the wheat GFF3 file from the Ensembl Plants website, but when I run this command:
htseq-count -q -i gene_id -f bam X.bam Triticum_aestivum.IWGSC.49.gff3
I get this error:
Feature transcript:ENSRNA050013875-T1 does not contain a ‘gene_id’ attribute
[exception type: ValueError, raised in features.py.329]
I also tried using ID=gene, but that does not work either.
In my GFF3 file I can see:
ID=gene:ENSRNA050013875
gene_id=ENSRNA050013875
ID=transcript:ENSRNA050013875-T1
parent=gene:ENSRNA050013875
transcript_id=ENSRNA050013875-T1
parent=transcript:ENSRNA050013875-T1
Name:ENSRNA050013875-E1
exon_id=ENSRNA050013875-E1
Only Name and exon_id work, but I don't need them.
What if you use 'ID' as the value for the -i option of htseq-count?
Alternatively, you could consider using featureCounts instead of htseq-count; it is more lenient about those gene-id issues (and also faster).
Feature transcript:ENSRNA050013875-T1 does not contain a ‘ID’ attribute
[exception type: ValueError, raised in features.py.329]
OK, then your GFF file is likely malformed.
Try running one of the tools from the AGAT package on it to fix it.
AGAT - Another Gff Analysis Toolkit
Thank you, but I think I have a different problem.
My GFF3 file has gene_id=ENSRNA050013875, but the error says that transcript:ENSRNA050013875-T1 does not contain gene_id.
I think the problem is the -T1 suffix.
I can see gene_id=ENSRNA050013875, but I can't see gene_id=ENSRNA050013875-T1.
my gff3 file:
my x.sam file:
Try re-running your htseq-count command, but use the -t parameter to indicate which type of feature (third column in the GFF) you want to count. This is usually something like 'exon', 'gene', 'mRNA' ...; in your case perhaps 'ncRNA_gene'. Check the htseq-count manual for details.
Yes, this is the right way: use both -t and -i:
htseq-count -q -t gene -i ID -f bam x.sorted.bam x.gff3 > result.txt
This is needed because GFF3 and GTF files differ in which attributes each feature line carries.
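For anyone hitting the same error, one way to see why the -t/-i pair matters is to list which attribute keys each feature type actually carries. The sketch below is only an illustration: the three records are hypothetical, with placeholder coordinates and feature types, and the attribute names are modelled on those quoted in the question (assuming the standard capitalised Parent key). The helper name attribute_keys_by_type is made up for this example. As far as I know, htseq-count counts 'exon' features unless -t says otherwise, and in this layout the exon line has neither ID nor gene_id, which would be consistent with both errors above.

```python
from collections import defaultdict

# Hypothetical sample records: attribute names follow the question,
# but the sequence name, source, types and coordinates are placeholders.
SAMPLE_GFF3 = (
    "##gff-version 3\n"
    "chrX\tdemo\tgene\t1\t100\t.\t+\t.\t"
    "ID=gene:ENSRNA050013875;gene_id=ENSRNA050013875\n"
    "chrX\tdemo\ttranscript\t1\t100\t.\t+\t.\t"
    "ID=transcript:ENSRNA050013875-T1;Parent=gene:ENSRNA050013875;"
    "transcript_id=ENSRNA050013875-T1\n"
    "chrX\tdemo\texon\t1\t100\t.\t+\t.\t"
    "Parent=transcript:ENSRNA050013875-T1;Name=ENSRNA050013875-E1;"
    "exon_id=ENSRNA050013875-E1\n"
)


def attribute_keys_by_type(gff3_text):
    """Map each feature type (column 3) to the sorted attribute keys seen in column 9."""
    keys = defaultdict(set)
    for line in gff3_text.splitlines():
        # Skip blank lines, directives and comments.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        feature_type, attributes = cols[2], cols[8]
        for field in attributes.split(";"):
            if "=" in field:
                keys[feature_type].add(field.split("=", 1)[0].strip())
    return {ftype: sorted(found) for ftype, found in keys.items()}


if __name__ == "__main__":
    for ftype, found in attribute_keys_by_type(SAMPLE_GFF3).items():
        print(ftype, found)
```

Whatever value is passed to -i has to appear in this list for the feature type passed to -t; here 'gene' carries ID and gene_id, while 'exon' carries only Parent, Name and exon_id.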
Did you process this GFF file in some way?
From what you posted, that does not look like a GFF file (the formatting, that is). Did you open it in Windows/DOS or similar?
No, I open and view the GFF3 file with "gedit Triticum.ae...49.gff3" in an Ubuntu terminal.
Can you post the exact full lines from your GFF3 file of the ENSRNA050013875 gene/transcript ?
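If it helps, here is a minimal sketch for pulling those exact lines out of the file without opening it in an editor. The function name lines_mentioning is made up for this example, and it assumes the file is plain text or gzip-compressed. Line endings are deliberately left untranslated, so printing each hit with repr() shows tabs as \t and any Windows-style carriage return as \r.

```python
import gzip


def lines_mentioning(path, identifier):
    """Return the feature lines of a GFF3 file that contain the given identifier."""
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if str(path).endswith(".gz") else open
    # newline="" keeps the original line endings instead of normalising them.
    with opener(path, "rt", newline="") as handle:
        return [
            line.rstrip("\n")
            for line in handle
            if identifier in line and not line.startswith("#")
        ]


# Example use (the file name is the one from the question):
#   for hit in lines_mentioning("Triticum_aestivum.IWGSC.49.gff3", "ENSRNA050013875"):
#       print(repr(hit))
```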