I am working on a set of RNA-seq samples and am using featureCounts from the Subread package to count fragments (not reads) falling into genomic features. I am using the command-line option --ignoreDup to exclude duplicate reads, and the results seem somewhat strange.
Here is the summary file provided by featureCounts for two samples:
Status                      sample1.bam   sample2.bam
Assigned                       12019290      48247506
Unassigned_Ambiguity                  0             0
Unassigned_MultiMapping         7794471      15519394
Unassigned_NoFeatures          16908358      67192231
Unassigned_Unmapped                   0             0
Unassigned_MappingQuality             0             0
Unassigned_FragementLength            0             0
Unassigned_Chimera                    0             0
Unassigned_Secondary                  0             0
Unassigned_Nonjunction                0             0
Unassigned_Duplicate           68111548             0
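To put the two samples on a common scale, the counts above can be turned into per-sample fractions. This is a minimal sketch, assuming the summary is in featureCounts' tab-separated layout (a Status column followed by one column per BAM file). The SUMMARY string and the status_fractions helper are hypothetical: they are filled in with the numbers quoted above, and the all-zero rows are left out because they do not change the totals.

```python
import csv
import io

# Hypothetical summary in featureCounts' tab-separated layout, using the
# numbers quoted above; all-zero status rows are omitted for brevity.
SUMMARY = """Status\tsample1.bam\tsample2.bam
Assigned\t12019290\t48247506
Unassigned_MultiMapping\t7794471\t15519394
Unassigned_NoFeatures\t16908358\t67192231
Unassigned_Duplicate\t68111548\t0
"""


def status_fractions(text):
    """Return {sample: {status: count / total count for that sample}}."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    samples = rows[0][1:]  # header: "Status", then one column per BAM
    counts = {s: {} for s in samples}
    for row in rows[1:]:
        if not row:  # tolerate blank lines
            continue
        status, *values = row
        for sample, value in zip(samples, values):
            counts[sample][status] = int(value)
    fractions = {}
    for sample, c in counts.items():
        total = sum(c.values())
        fractions[sample] = {k: v / total for k, v in c.items()}
    return fractions


for sample, fr in status_fractions(SUMMARY).items():
    print(sample, f"Unassigned_Duplicate: {fr['Unassigned_Duplicate']:.1%}")
```

With these numbers, roughly 65% of the total in sample1's summary falls under Unassigned_Duplicate, against none in sample2.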
All samples were processed in the same way, and the same commands were used to run featureCounts. Why does one sample have 0 unassigned duplicates and the other 68111548? The difference is so black and white that I am afraid there is an error somewhere. What exactly does it mean for reads to be counted as Unassigned_Duplicate?
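For reference, the featureCounts help text says that with --ignoreDup, duplicate reads are identified using bit 0x400 in the FLAG field of the SAM/BAM records. featureCounts therefore does not detect duplicates on its own; it only skips reads that an upstream duplicate-marking step (for example Picard MarkDuplicates, if one was run) has already flagged. A quick count of such records is `samtools view -c -f 1024 sample1.bam`. The sketch below shows the same bit test on SAM text lines; the count_duplicates helper and the example records are hypothetical.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit 0x400: PCR or optical duplicate


def is_duplicate(flag: int) -> bool:
    """True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


def count_duplicates(sam_lines):
    """Return (duplicate_records, total_records) for SAM text lines.

    Header lines (starting with '@') and blank lines are skipped;
    FLAG is the second tab-separated column of each alignment line.
    """
    dup = total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        total += 1
        dup += is_duplicate(flag)
    return dup, total


# Hypothetical records: 99/147 is an ordinary proper pair, while
# 1123 (= 99 + 1024) and 1171 (= 147 + 1024) carry the duplicate bit.
example = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*",
    "r1\t147\tchr1\t200\t60\t50M\t=\t100\t-150\t*\t*",
    "r2\t1123\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*",
    "r2\t1171\tchr1\t200\t60\t50M\t=\t100\t-150\t*\t*",
]
print(count_duplicates(example))  # (2, 4)
```

The same function could be fed the output of `samtools view` line by line; for large BAM files, the samtools one-liner above is the faster option.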