Question

htseq-count reports count values for deleted genes

0

Entering edit mode

7 months ago

kmyers2 ▴ 80

I am using htseq-count on BAM files from a bacterial species. We are comparing WT strains as well as two strains with genes knocked out. The knockouts have been verified with whole genome sequencing, but do retain the first 15 and last 18 bp of the ORF. I used paired-end Illumina data to align. When using htseq-count, it records counts for these deleted genes. One gene has counts of ~95 (compared to ~6000 for WT) while the other gene has counts of ~28 (compared to ~160 for WT).

Viewing the alignments with BigWig files I see the majority of the gene is lacking all coverage, but the ends of the genes seem have reads that overlap slightly with the gene. I believe they are overlapping with the small amount of the ORF remaining and down/upstream sequence still present in the deletion strains. Previously, when using single-end data, htseq-count did not record any reads for the same deletion strains.

Because of the sequencing analysis and the RNA-seq alignment analysis, I know these genes are not present in the strains, and so the count results are misleading. We can manually adjust the count data to 0 but I am curious to know why we are seeing these count values for deleted genes.

Does anyone have a suggestion as to why we are seeing misleading gene counts with htseq-count for these two deleted genes using paired-end RNA-seq data? It seems misleading and any help on how to fix it would be appreciated.

Command used: htseq-count -f bam -s yes <bamfile> <gff>

HTSeq - version 2.0.3; Python - version 3.10.12; operating system - CentOS 7; PE reads aligned with BWA version 0.7.17-r1188

rna-seq htseq • 572 views

ADD COMMENT • link 7 months ago by kmyers2 ▴ 80

0

Entering edit mode

the ends of the genes seem have reads that overlap slightly with the gene. I believe they are overlapping with the small amount of the ORF remaining and down/upstream sequence still present in the deletion strains.

Isn't that why?

ADD REPLY • link 7 months ago by swbarnes2 14k

0

Entering edit mode

Yeah, I think that's the case. I was just wondering if this was something other people had seen or if there is a setting or something I need to do to address it.

ADD REPLY • link 7 months ago by kmyers2 ▴ 80

1

Entering edit mode

You have confirmed that the part you removed is not present in any reads?

Way ht-seq counts is shown in the illustration here: https://htseq.readthedocs.io/en/release_0.11.1/count.html

ADD REPLY • link 7 months ago by GenoMax 141k

0

Entering edit mode

Thanks for the soft-clipping suggestion. Yes, it shows that all but 10 bases at the 5' end and 10 bases at the 3' end of the ORF are missing in the alignment.

ADD REPLY • link 7 months ago by kmyers2 ▴ 80