HTSeq error processing GFF file
6 weeks ago

Hello,

I am trying to run HTSeq, but it reports a problem in my GFF/GTF file. How can I fix this?

enrique@L:~prueba_guess$ htseq-count -f bam SRR214880.bam -s no -i ID -r pos -t exon GCF_902167145.1_Zm-B73-REFERENCE-NAM-5.0_genomic.gff > SRR214880.txt
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
800000 GFF lines processed.
900000 GFF lines processed.
Error processing GFF file (line 997531 of file GCF_902167145.1_Zm-B73-REFERENCE-NAM-5.0_genomic.gff):
  Strand must be'+', '-', or '.'.
  [Exception type: ValueError, raised in _HTSeq.pyx:72]

My GFF file:

That error message is very descriptive. Could you also add line 997531?

Yes,

I guess you can try replacing the "?" with a ".". Also, could you copy and paste the text instead of posting screenshots?

enrique@L:~/bioinformatica/index/prueba$ head -n 997531 GCF_902167145.1_Zm-B73-REFERENCE-NAM-5.0_genomic.gff | tail -n 1


NC_007982.1 RefSeq mRNA 50490 267232 . ? . ID=rna-ZeamMp186;Parent=gene-ZeamMp186;Dbxref=GeneID:4055939;gbkey=mRNA;gene=nad1;locus_tag=ZeamMp186

Thanks so much, how can I change that character?


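It looks like "?" is a valid character for the strand column in GFF3 (it marks features whose strand is relevant but unknown), but htseq-count only accepts "+", "-", or ".". There are only 5 "?" entries in that GFF file, so you can just change them in a text editor. If you fancy, you can use something like this: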

awk 'BEGIN {FS=OFS="\t"} {if ($7=="?") $7="."; print}' my.gff > mynew.gff


Only do this if strandedness doesn't matter for your analysis (you are already running htseq-count with -s no). Otherwise, take care when doing this or find a more appropriate way to handle those features. You can also open an issue in the HTSeq GitHub repository, or patch the feature in yourself and create a pull request.
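
Before and after editing, it may help to check how many features actually carry a "?" strand and to confirm the file is still tab-delimited. The following is only a rough sketch: the function names count_strand and fix_unknown_strand are made up for illustration, and it assumes a standard 9-column, tab-separated GFF where lines starting with "#" are comments or directives.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count non-comment features whose strand column
# (field 7) equals the given symbol. $1 = symbol, $2 = GFF path.
count_strand() {
    awk -F'\t' -v s="$1" '!/^#/ && $7 == s { n++ } END { print n + 0 }' "$2"
}

# Hypothetical helper: write the GFF to stdout with every "?" strand
# replaced by ".". Setting OFS to a tab keeps the modified lines
# tab-delimited when awk rebuilds the record. $1 = GFF path.
fix_unknown_strand() {
    awk 'BEGIN { FS = OFS = "\t" } !/^#/ && $7 == "?" { $7 = "." } { print }' "$1"
}

# Example usage (file names as in the command above):
#   count_strand '?' my.gff              # how many features have "?"
#   fix_unknown_strand my.gff > mynew.gff
#   count_strand '?' mynew.gff           # should now print 0
```

If the first count is not what you expect, look at those features with awk -F'\t' '$7=="?"' my.gff before overwriting anything.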