6 weeks ago
ibq.enriquepola • 0
I am trying to run HTSeq but it tells me that I have a problem in my GFF and GTF file, how can I fix this?
enrique@L:~prueba_guess$ htseq-count -f bam SRR214880.bam -s no -i ID -r pos -t exon GCF_902167145.1_Zm-B73-REFERENCE-NAM-5.0_genomic.gff > SRR214880.txt
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
800000 GFF lines processed.
900000 GFF lines processed.
Error processing GFF file (line 997531 of file GCF_902167145.1_Zm-B73-REFERENCE-NAM-5.0_genomic.gff): Strand must be'+', '-', or '.'.
[Exception type: ValueError, raised in _HTSeq.pyx:72]
My GFF file:
That error message is very descriptive. Could you also add the line 997531?
I guess you can try replacing the "?" with a ".".
Also, could you copy and paste the text instead of posting screenshots?
NC_007982.1 RefSeq mRNA 50490 267232 . ? . ID=rna-ZeamMp186;Parent=gene-ZeamMp186;Dbxref=GeneID:4055939;gbkey=mRNA;gene=nad1;locus_tag=ZeamMp186
Thanks so much, how can I change that character?
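It looks like "?" is a valid character for the strand column in GFF3 (it marks a feature whose strand is relevant but unknown), but, as the error shows, your HTSeq only accepts "+", "-", or ".". There are only 5 "?" in that GFF file, so you can just change them in a text editor. If you fancy, you can use something like this: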
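A minimal sketch of that replacement in Python, assuming the file is tab-separated as GFF requires and that the strand is the 7th column. The function name `fix_strand` and the script name used below are made up for illustration and are not part of HTSeq. Only the strand field is touched, so a "?" inside the attributes column is left alone.

```python
import sys

STRAND_COLUMN = 6  # zero-based index of the strand column (7th field) in GFF/GTF


def fix_strand(line: str) -> str:
    """Return the line with an unknown strand ("?") replaced by "."."""
    # Comment and directive lines (e.g. "##gff-version 3") pass through untouched.
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) > STRAND_COLUMN and fields[STRAND_COLUMN] == "?":
        fields[STRAND_COLUMN] = "."
        newline = "\n" if line.endswith("\n") else ""
        return "\t".join(fields) + newline
    return line


def main(in_path: str, out_path: str) -> None:
    # Stream line by line so a large annotation file is never held in memory.
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(fix_strand(line))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python fix_strand.py input.gff output.gff", file=sys.stderr)
```

Saved as, say, `fix_strand.py`, it would be run as `python fix_strand.py GCF_902167145.1_Zm-B73-REFERENCE-NAM-5.0_genomic.gff fixed.gff`, and then htseq-count can be pointed at `fixed.gff`. It writes a new file rather than editing in place, so the original annotation stays intact.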
You should do so only if strandedness doesn't matter for your analysis. Otherwise, take care while doing this or find a more appropriate way to handle it. You can also open an issue in the HTSeq GitHub repository, or patch the feature in and create a pull request.