htseq-count reports count values for deleted genes
0
0
Entering edit mode
7 months ago
kmyers2 ▴ 80

I am using htseq-count on BAM files from a bacterial species. We are comparing WT strains as well as two strains with genes knocked out. The knockouts have been verified with whole genome sequencing, but do retain the first 15 and last 18 bp of the ORF. I used paired-end Illumina data to align. When using htseq-count, it records counts for these deleted genes. One gene has counts of ~95 (compared to ~6000 for WT) while the other gene has counts of ~28 (compared to ~160 for WT).

Viewing the alignments with BigWig files I see the majority of the gene is lacking all coverage, but the ends of the genes seem have reads that overlap slightly with the gene. I believe they are overlapping with the small amount of the ORF remaining and down/upstream sequence still present in the deletion strains. Previously, when using single-end data, htseq-count did not record any reads for the same deletion strains.

Because of the sequencing analysis and the RNA-seq alignment analysis, I know these genes are not present in the strains, and so the count results are misleading. We can manually adjust the count data to 0 but I am curious to know why we are seeing these count values for deleted genes.

Does anyone have a suggestion as to why we are seeing misleading gene counts with htseq-count for these two deleted genes using paired-end RNA-seq data? It seems misleading and any help on how to fix it would be appreciated.

Command used: htseq-count -f bam -s yes <bamfile> <gff>

HTSeq - version 2.0.3; Python - version 3.10.12; operating system - CentOS 7; PE reads aligned with BWA version 0.7.17-r1188

rna-seq htseq • 570 views
ADD COMMENT
0
Entering edit mode

the ends of the genes seem have reads that overlap slightly with the gene. I believe they are overlapping with the small amount of the ORF remaining and down/upstream sequence still present in the deletion strains.

Isn't that why?

ADD REPLY
0
Entering edit mode

Yeah, I think that's the case. I was just wondering if this was something other people had seen or if there is a setting or something I need to do to address it.

ADD REPLY
1
Entering edit mode

You have confirmed that the part you removed is not present in any reads?

Way ht-seq counts is shown in the illustration here: https://htseq.readthedocs.io/en/release_0.11.1/count.html

ADD REPLY
0
Entering edit mode

Thanks for the soft-clipping suggestion. Yes, it shows that all but 10 bases at the 5' end and 10 bases at the 3' end of the ORF are missing in the alignment.

ADD REPLY

Login before adding your answer.

Traffic: 1535 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6