Dear Biostars community,
Do you have any ideas on how to "trick" htseq-count into treating a BED file of genomic coordinates as a GTF file of gene annotations, so that I can get counts per genomic interval instead of counts per actual gene?
I have tried editing the BED file to give it 9 columns (like a standard GTF), but it is possible that my feature label (in column 3) and gene_id label (in column 9) are incorrect; I just added these labels arbitrarily. With this "fake" GTF file, I use a BAM file of paired-end RNA-seq reads aligned with STAR.
This is what my "fake" GTF file looks like:
chr1 . exon 0 10000 . - 0 gene_id "1";
chr1 . exon 10000 20000 . - 0 gene_id "2";
chr1 . exon 20000 30000 . - 0 gene_id "3";
htseq-count prints this error and aborts:
Error occured when processing GFF file (line 1 of file *.gtf): start too small [Exception type: IndexError, raised in _HTSeq.pyx:376]
I assume that the 0-based start is the problem here. How should I edit these coordinates so that each interval still spans the same region? Should the start be 1-based? (Until now I thought that the 0-based start and 1-based end of the BED format were compatible with htseq-count.)
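To make the question concrete, here is a minimal Python sketch of the conversion I have in mind. It assumes that GTF coordinates are 1-based with an inclusive end, so a BED interval keeps its end value and only its start is shifted by +1. The function name `bed_to_gtf_lines`, the "exon" feature label, and the numeric gene_id values are my own arbitrary choices, not anything required by htseq-count as far as I know.

```python
def bed_to_gtf_lines(bed_lines, feature="exon"):
    """Convert BED intervals to GTF-style lines (hypothetical helper).

    Assumption: BED is 0-based, half-open [start, end); GTF is 1-based,
    inclusive [start, end]. The same region is therefore (start + 1, end).
    """
    gtf_lines = []
    interval_number = 0
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Use the BED strand (column 6) if present, otherwise "." (unknown).
        strand = fields[5] if len(fields) > 5 else "."
        interval_number += 1
        # Arbitrary identifier: one gene_id per interval.
        attributes = 'gene_id "{}";'.format(interval_number)
        gtf_lines.append("\t".join([
            chrom, ".", feature,
            str(start + 1),  # shift the 0-based start to 1-based
            str(end),        # the end stays the same
            ".", strand, ".", attributes,
        ]))
    return gtf_lines


if __name__ == "__main__":
    bed = ["chr1\t0\t10000", "chr1\t10000\t20000", "chr1\t20000\t30000"]
    for gtf_line in bed_to_gtf_lines(bed):
        print(gtf_line)
```

With this, my first interval would become 1 to 10000 and the second 10001 to 20000. Since these intervals have no real strand, I assume I would also have to run htseq-count with --stranded=no, but I am not sure about that either.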
Thank you so much for your help!