I am trying to annotate my RNA-seq files (paired-end) with featureCounts. However, I keep having a quite low annotation rate, ~35-36% for all the files.
My command line is the following:
featureCounts -T 6 -p -s 2 -a annotation_file.gtf -o output_file.txt input_files.bam
I am using the parameter
-s 2 since my paired-end files seem to be reversely stranded (I have assessed that by running
infer_experiment.py from the RSeQC package.)
Alignment has been performed with
hisat2 and .bam files sorted by coordinates before performing annotation.
I have checked rRNA contamination in .bam files using
split_bam.py from RSeQC package and the .bed file for hg38 rRNAs provided by the package. None of my files show rRNA contamination.
In order to try solving the problem, I first checked that the chromosome names were consistent between my .bam files and the provided .gtf file. They are the same.
Then, I tried using .gtf files from different sources (Gencode, Ensembl, UCSC,..) but the results did not change.
Lastly, I tried running
featureCounts using also the
-O parameter, which includes multi-overlapping reads, just to check if most of the reads were not annotated due to multi-overlapping reasons. The results did not change even in this case.
I would really appreciate any suggestion for trying to figure out what causes poor annotation of my files. I have checked as much as I could, accordingly to my experience with RNA-seq data, but I could not find a solution.
Thanks in advance to everybody who will try helping me! :)