Hi everyone,

I am running a differential expression analysis between two groups. I used the NCBI RefSeq GTF and ran HISAT2 and StringTie without the -e option. When I looked at the StringTie output GTFs, both the individual files and the merged one, some transcripts have the transcript ID "unknown_transcript_1". I then checked the original NCBI GTF, and some records there have a blank gene_id and "unknown_transcript_1" as the transcript_id. They are mostly on the mitochondrial genome and on scaffolds. As a result, when I ran htseq-count, the first row of the output has an empty gene_id with read counts. Should I exclude that row from the htseq-count output before doing the differential expression analysis? I did the same analysis with the Ensembl GTF and did not run into this issue.
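
To make the problem easier to reproduce, here is a minimal sketch of how records with a blank gene_id can be listed from a GTF. It assumes the standard nine tab-separated GTF columns with `gene_id "..."` and `transcript_id "..."` in the attribute column. The function name `find_blank_gene_ids` and the two example records are made up for illustration and are not taken from the actual RefSeq file.

```python
import re

# Patterns for the gene_id and transcript_id attributes in GTF column 9.
GENE_ID_RE = re.compile(r'gene_id "([^"]*)"')
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]*)"')


def find_blank_gene_ids(lines):
    """Return (seqname, feature, transcript_id) for records with an empty or missing gene_id.

    Hypothetical helper for illustration; `lines` is any iterable of GTF lines.
    """
    hits = []
    for line in lines:
        # Skip comment/header lines and empty lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        gene = GENE_ID_RE.search(fields[8])
        if gene is None or gene.group(1).strip() == "":
            tx = TRANSCRIPT_ID_RE.search(fields[8])
            hits.append((fields[0], fields[2], tx.group(1) if tx else ""))
    return hits


# Made-up example records (assumption: real records have the same layout).
example = [
    'chrM\tRefSeq\texon\t1\t100\t.\t+\t.\tgene_id ""; transcript_id "unknown_transcript_1";\n',
    'chr1\tRefSeq\texon\t200\t300\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1";\n',
]
blank = find_blank_gene_ids(example)
for seqname, feature, transcript_id in blank:
    print(seqname, feature, transcript_id)
```

On a real annotation, the same function can be given an open file handle instead of the example list, which shows which sequences and transcript IDs are affected.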

I would really appreciate any suggestions or recommendations.

Thanks in advance.


@patelbhaumikn please contact RefSeq to report this issue with GTF files.

