Hi
We performed RNA-seq of 60 sorted human T cells and aligned the reads to the reference genome (GRCh38) using HISAT2. About 80% of reads mapped to the reference genome. However, when we counted the mapped reads with featureCounts from the Subread package, only 20-25% of them were assigned to the features provided (feature type exon, attribute gene_id) in the reference GTF file (Homo_sapiens.GRCh38.87.gtf). I do not know what is happening here. I have two questions:
- First, what could be the reasons for this observation?
- Is it possible to identify which regions of the reference genome the remaining reads mapped to?
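One way to see where the remaining reads fall is to classify each alignment by its overlap with annotated exons and gene bodies: exonic (overlaps an exon), intronic (inside a gene but not touching any exon) or intergenic (outside all genes). The sketch below does this in plain Python on hypothetical, hard-coded intervals; in a real analysis the intervals would be parsed from the GTF and the read coordinates taken from the BAM, which is not shown. Tools such as RSeQC's read_distribution.py report a similar breakdown directly.

```python
from collections import Counter

# Hypothetical annotation on a single chromosome, 1-based inclusive coordinates.
# In practice these would come from the GTF ("exon" and "gene" feature types).
EXONS = [(100, 200), (400, 500), (1000, 1100)]
GENES = [(100, 500), (1000, 1100)]


def overlaps(intervals, start, end):
    """Return True if [start, end] overlaps any interval in the list (linear scan)."""
    return any(s <= end and start <= e for s, e in intervals)


def classify(start, end):
    """Label an aligned read as exonic, intronic or intergenic."""
    if overlaps(EXONS, start, end):
        return "exonic"
    if overlaps(GENES, start, end):
        return "intronic"
    return "intergenic"


# Hypothetical read alignments given as (start, end).
reads = [(150, 249), (250, 349), (600, 699), (1050, 1149), (2000, 2099)]
counts = Counter(classify(s, e) for s, e in reads)
print(dict(counts))  # {'exonic': 2, 'intronic': 1, 'intergenic': 2}
```

A high intronic share would suggest many reads from unspliced (pre-mRNA) transcripts or genomic DNA, while a high intergenic share points to unannotated transcription or contamination; the linear scan is only for readability and would need an interval index for real data.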
I would appreciate your help.
Another recommendation: check whether you used the correct strand setting in featureCounts. Try the different strand options -s0, -s1 and -s2 in featureCounts and see what happens.
When I changed the strand option from -s2 to -s0 in featureCounts, the number of assigned reads doubled, i.e. from 10-12% to 20-25%. That is still a very low fraction of exonic reads.
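These numbers can double as a rough strandedness check. With an unstranded library, -s1 and -s2 should each assign about half of what -s0 assigns; with a stranded library, one of them should come close to -s0 and the other close to zero. The helper below is a hypothetical sketch of that heuristic: the 0.8 cut-off is an arbitrary assumption, not featureCounts behaviour, and RSeQC's infer_experiment.py does this check more rigorously.

```python
def infer_strandedness(assigned_s1, assigned_s2, threshold=0.8):
    """Guess library strandedness from featureCounts assigned-read counts.

    assigned_s1, assigned_s2: reads assigned when run with -s1 and -s2.
    threshold: hypothetical cut-off for calling a library stranded.
    """
    total = assigned_s1 + assigned_s2
    if total == 0:
        raise ValueError("no assigned reads")
    frac_s1 = assigned_s1 / total
    if frac_s1 >= threshold:
        return "stranded (-s1)"
    if frac_s1 <= 1 - threshold:
        return "reversely stranded (-s2)"
    return "unstranded (-s0)"


# Hypothetical counts, not taken from the post: -s1 and -s2 assign similar numbers.
print(infer_strandedness(1_150_000, 1_100_000))  # unstranded (-s0)
```

With the figures reported here, -s2 assigning roughly half of what -s0 assigns is consistent with an unstranded library, so -s0 is the appropriate setting; a -s1 run would confirm it.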
That sounds suspiciously low.
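To see why so few reads are assigned, look at the .summary file that featureCounts writes next to the counts table: it breaks the unassigned reads down by reason (for example Unassigned_NoFeatures, Unassigned_MultiMapping, Unassigned_Ambiguity). The sketch below parses that tab-separated file and reports each non-zero category as a percentage of all rows; the example text is made up, and the exact set of status rows differs between Subread versions.

```python
import csv
import io


def summarize(summary_text):
    """Return {status: percent} for the first sample in a featureCounts .summary file."""
    reader = csv.reader(io.StringIO(summary_text), delimiter="\t")
    next(reader)  # header row: "Status" followed by one column per BAM file
    counts = {row[0]: int(row[1]) for row in reader if row}
    total = sum(counts.values())
    return {status: 100 * n / total for status, n in counts.items() if n > 0}


# Made-up example; real files contain more Unassigned_* rows.
example = (
    "Status\tsample1.bam\n"
    "Assigned\t2200000\n"
    "Unassigned_MultiMapping\t3000000\n"
    "Unassigned_NoFeatures\t4500000\n"
    "Unassigned_Ambiguity\t300000\n"
)
for status, pct in sorted(summarize(example).items(), key=lambda kv: -kv[1]):
    print(f"{status:28s}{pct:6.1f}%")
```

For a real run, pass the contents of the .summary file instead of `example`. A large Unassigned_NoFeatures share means the reads do not overlap any annotated exon (intronic or intergenic reads), while a large Unassigned_MultiMapping share means reads that featureCounts skips by default because they align to more than one location.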