I have got some problem when analyzing my RNAseq data and I would like to seek for a help. Here is the brief description of my pipeline:
Obtained fasta file of 150 PE reads from Novaseq platform followed by non-stranded library prep I conducted fastQC and trimmomatic for all the files Since our sample is bacterial RNA, we don’t really need pair end for each sample, so I did only use R1 for all the downstream pipeline. I conduct Bowtie2 for mapping to my reference genome and I got >90 % reads mapped (my sample is mixed sample with some host tissue contamination) I plan to finish the reads count using HTseq:
htseq-count -f bam -s no -t gene -i ID <sample.bam> <annotation.gff> > output.txt
Here's what I got from output.txt:
no_feature 14756459 ambiguous 25200 __too_low_aQual 1902927 __not_aligned 3938933 __alignment_not_unique 0
most reads are “no_feature“ , I scan through each gene, the reads counts are all "0". I check my bam file using IGV, I can see the reads were in deeded mapped to the genome.
What could possibly go wrong? I've tried using default, -s <yes/no/reverse> but non of these improved. Why cannot my alignment file (bam and sam) recognize the corresponding features in gff file? (I've made sure the types matches (e.g. "ID" instead "gene_id").
below is the what my gff file looked like:
Well, the obvious...what is the name of the chromosome in your bam?
The other possible issue could be if FeatureCounts is looking for exons in your file, and nothing is marked as exonic
The htseq-count FAQ has a section about using GFF instead of GTF as a reference. In short, you need to adjust the