Hello! :)
I aligned my reads from small RNA sequencing with Bowtie2 against the human genome (hg38), using the options --end-to-end --very-sensitive,
and almost all of my reads aligned (about 90%).
Then I used featureCounts
to summarize the reads with the GFF3 file from miRBase (which I converted to GTF with gffread):
featureCounts -T 5 -t exon -g transcript_id -a hsa.gtf -o counts.txt file.sam
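Because the command counts only features of type `exon` (`-t exon`) and groups them by `transcript_id` (`-g transcript_id`), it may be worth checking which feature types and attributes the gffread-converted GTF actually contains. The sketch below is a minimal, hypothetical helper (the function names are illustrative, not part of any tool) that tallies column-3 feature types and counts records of a given type that lack a given attribute. The example lines in the `__main__` block are made up; on real data you would pass `open("hsa.gtf")` instead.

```python
import re
from collections import Counter


def feature_type_counts(gtf_lines):
    """Count column-3 feature types, skipping comments, blank and malformed lines."""
    counts = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a 9-column GTF record
        counts[fields[2]] += 1
    return counts


def records_missing_attribute(gtf_lines, feature_type, attr):
    """Count records of `feature_type` whose attribute column lacks `attr`."""
    # Match `attr "` at the start of column 9 or right after a ';' separator.
    pattern = re.compile(r'(?:^|;)\s*' + re.escape(attr) + r'\s+"')
    missing = 0
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        if fields[2] == feature_type and not pattern.search(fields[8]):
            missing += 1
    return missing


if __name__ == "__main__":
    # Hypothetical GTF lines for illustration only.
    example = [
        'chr1\tsrc\ttranscript\t100\t180\t.\t+\t.\ttranscript_id "T1"; gene_id "G1";\n',
        'chr1\tsrc\texon\t100\t180\t.\t+\t.\ttranscript_id "T1"; gene_id "G1";\n',
        'chr1\tsrc\texon\t120\t141\t.\t+\t.\ttranscript_id "T2"; gene_id "G1";\n',
    ]
    print(feature_type_counts(example))
    print(records_missing_attribute(example, "exon", "transcript_id"))
```

If two `exon` records with different `transcript_id` values cover the same genomic interval, a read falling there matches both meta-features.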
This is the output summary file (1):
Assigned 37200
Unassigned_Unmapped 2398401
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 0
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_NoFeatures 43583950
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 38946846
The number of assigned reads is very low. I then ran featureCounts with the complete GTF from GENCODE and obtained this result (2):
Assigned 33503995
Unassigned_Unmapped 2398401
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 0
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_NoFeatures 26792865
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 22271136
Lastly, I took the GENCODE GTF and removed the rows that contained "miRNA". The result (3):
Assigned 31544916
Unassigned_Unmapped 2398401
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 0
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_NoFeatures 47393496
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 3629584
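A plain substring match removes every line that mentions "miRNA" anywhere in the record. If the intent is to drop only miRNA genes and their child features, one option is to filter on the `gene_type` attribute that GENCODE GTF records carry. The sketch below assumes that attribute layout; `drop_gene_type` is a hypothetical helper name, and the example lines are made up for illustration.

```python
import re

# Assumes GENCODE-style attributes, e.g. gene_type "miRNA";
GENE_TYPE_RE = re.compile(r'(?:^|;)\s*gene_type\s+"([^"]*)"')


def drop_gene_type(gtf_lines, excluded="miRNA"):
    """Yield GTF lines unless their gene_type equals `excluded`; comment lines are kept."""
    for line in gtf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            match = GENE_TYPE_RE.search(fields[8])
            if match and match.group(1) == excluded:
                continue  # exact gene_type match: skip this record
        yield line


if __name__ == "__main__":
    # Hypothetical records; on real data iterate over open("gencode.gtf").
    example = [
        "##description: example\n",
        'chr1\tHAVANA\tgene\t1\t80\t.\t+\t.\tgene_id "G1"; gene_type "miRNA"; gene_name "MIR_X";\n',
        'chr1\tHAVANA\tgene\t1\t900\t.\t+\t.\tgene_id "G2"; gene_type "protein_coding";\n',
    ]
    for kept in drop_gene_type(example):
        print(kept, end="")
```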
As you can see, Unassigned_Ambiguity is much higher in case 2 than in case 3 (22,271,136 vs 3,629,584), and the number of assigned reads is higher in case 3 than in case 1.
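In all three runs the status counts add up to the same number of read records (84,966,397), so the summaries can be compared as fractions of that total. The sketch below is a minimal, hypothetical parser for a single-sample featureCounts `.summary` file (featureCounts writes it next to the `-o` output, here `counts.txt.summary`). It tolerates the `Status` header line and uses the non-zero rows from the three summaries above.

```python
def parse_summary(text):
    """Parse a single-sample featureCounts summary into {status: count}."""
    counts = {}
    for line in text.strip().splitlines():
        status, *values = line.split()
        if status == "Status":
            continue  # header line naming the input file
        counts[status] = int(values[0])
    return counts


def fractions(counts):
    """Return each non-zero status as a fraction of all read records."""
    total = sum(counts.values())
    return {status: n / total for status, n in counts.items() if n}


if __name__ == "__main__":
    # Non-zero rows copied from the three summaries in this post.
    runs = {
        "1 (miRBase)": "Assigned 37200\nUnassigned_Unmapped 2398401\n"
        "Unassigned_NoFeatures 43583950\nUnassigned_Ambiguity 38946846",
        "2 (GENCODE)": "Assigned 33503995\nUnassigned_Unmapped 2398401\n"
        "Unassigned_NoFeatures 26792865\nUnassigned_Ambiguity 22271136",
        "3 (GENCODE, no miRNA)": "Assigned 31544916\nUnassigned_Unmapped 2398401\n"
        "Unassigned_NoFeatures 47393496\nUnassigned_Ambiguity 3629584",
    }
    for name, text in runs.items():
        frac = fractions(parse_summary(text))
        print(name, {status: f"{f:.2%}" for status, f in frac.items()})
```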
Could you help me understand what is happening? Are there specific featureCounts parameters I should use with small RNA data? I was expecting a much higher number of reads assigned to miRNAs...
Thank you in advance!