Question: Low HTseq Feature Counts in Subset of RNA-seq Samples
gravatar for kmyers2
4 weeks ago by
University of Wisconsin-Madison
kmyers220 wrote:

I am analyzing a set of 30 RNA-seq samples isolated from a non-model bacterial species. 24 of these samples are new experiments and 6 are re-analysis of a previously student's samples. All were isolated using the same growth conditions, isolation methods, etc. The 24 were sequenced at a different facility than the 6 re-analysis, but both libraries were constructed using strand-specific TruSeq kits (albeit at different times and in different facilities). My pipeline first trims the reads using Trimmomatic, aligns the reads to the genome using bowtie2, and uses htseq-count to determine feature hits. The Trimmomatic results and bowtie2 results look good and as expected: about 80% of reads survived trimming (average surviving read count of 21 million reads), about 75% of reads aligning exactly 1 time (average aligned exactly 1 time read count of 16 million). So the trimming and alignment steps seem to be behaving exactly as I expect.

However, when I use htseq-count I am getting very confusing results. The 6 re-analysis samples show results I expect with around 80% of aligned reads overlapping features from the GFF (average feature hit read count of 15 million reads). However, the 24 newly sequenced samples show extremely low feature hit counts (average of 25-30%, 6 million read average). I'm confused because I've analyzed samples from this bacterial species before with no issues and from both facilities with no issue. And all of these were analyzed at the same time, suggesting my pipeline is not at fault.

Here is the command I run for htseq-count:

htseq-count -t CDS -i Parent -s reverse input > output

I've also used samtools to pull out reads that aligned to the reverse strand only using the "-f 16" flag from multiple samples showing low or high feature hits in the htseq-count results. They all show about 50% of the reads in the .sam file have flag 16, suggesting that is not a reason for the low feature count. I'm just not sure why I would see such a low feature count in htseq-count given the good alignment results from bowtie2.

Does anyone have any ideas? I'm not sure what else to look at and before I contact the sequencing center and see if they can re-make the libraries and re-run the samples.


ADD COMMENTlink written 4 weeks ago by kmyers220
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1358 users visited in the last hour