I am new to RNA-seq analysis of paired-end read data and I was performing read counting using featureCounts. Using the -O flag increases the number of assigned reads drastically (a difference of more than 20 million assigned reads). I don't quite understand what this option does. The manual says that it will "Assign reads to all their overlapping meta-features (or features if -f is specified)". Could someone please elaborate on this? Thanks in advance.
The featureCounts manual says:
By default, featureCounts does not count multi-overlapping reads
But when the -O flag is specified:
each overlapping meta-feature/feature receives a count of 1 from a read
So first, multi-overlapping reads are now counted; second, the more features a read overlaps, the more times it is counted. I think that is why the number of assigned reads increases so drastically.
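The counting rule described above can be illustrated with a small toy model. This is only a sketch, not featureCounts' actual implementation: the function `count_reads`, the parameter `allow_multi_overlap`, and the exon names are hypothetical, and the model ignores multi-mapping, strandedness, and minimum-overlap settings. It assumes that a multi-overlapping read adds 1 to every feature it overlaps but is tallied only once as "Assigned" in the summary.

```python
from collections import Counter

def count_reads(read_overlaps, allow_multi_overlap=False):
    """Toy model of read assignment.

    read_overlaps: one set of feature names per read (the features it overlaps).
    allow_multi_overlap: mimics the effect of the -O flag.
    Returns (per-feature counts, per-read summary).
    """
    feature_counts = Counter()
    summary = Counter()
    for features in read_overlaps:
        if not features:
            # The read overlaps no annotated feature.
            summary["Unassigned_NoFeatures"] += 1
        elif len(features) == 1 or allow_multi_overlap:
            # The read is tallied once in the summary, but every
            # overlapped feature receives a count of 1 from it.
            summary["Assigned"] += 1
            for name in features:
                feature_counts[name] += 1
        else:
            # Default behaviour: multi-overlapping reads are not counted.
            summary["Unassigned_Ambiguity"] += 1
    return feature_counts, summary

# Four hypothetical reads: one unique, one overlapping two exons,
# one overlapping nothing, one overlapping three exons.
toy_reads = [{"exonA"}, {"exonA", "exonB"}, set(), {"exonB", "exonC", "exonD"}]

default_counts, default_summary = count_reads(toy_reads)
multi_counts, multi_summary = count_reads(toy_reads, allow_multi_overlap=True)

print("default:", dict(default_counts), dict(default_summary))
print("with -O:", dict(multi_counts), dict(multi_summary))
```

In this toy data the two ambiguous reads move to "Assigned" once multi-overlap is allowed, and the per-feature counts sum to more than the number of assigned reads, because each of those reads contributes to several features.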
Below is the results summary:
Without -O flag
Assigned                       4515031
Unassigned_Unmapped            0
Unassigned_Read_Type           0
Unassigned_Singleton           0
Unassigned_MappingQuality      0
Unassigned_Chimera             0
Unassigned_FragmentLength      0
Unassigned_Duplicate           0
Unassigned_MultiMapping        4898882
Unassigned_Secondary           0
Unassigned_NonSplit            0
Unassigned_NoFeatures          1220228
Unassigned_Overlapping_Length  0
Unassigned_Ambiguity           24695226
With -O flag
Assigned                       29210257
Unassigned_Unmapped            0
Unassigned_Read_Type           0
Unassigned_Singleton           0
Unassigned_MappingQuality      0
Unassigned_Chimera             0
Unassigned_FragmentLength      0
Unassigned_Duplicate           0
Unassigned_MultiMapping        4898882
Unassigned_Secondary           0
Unassigned_NonSplit            0
Unassigned_NoFeatures          1220228
Unassigned_Overlapping_Length  0
Unassigned_Ambiguity           0
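As a quick sanity check on the two summaries, the short sketch below compares the non-zero rows. The numbers are copied from the summaries above; the variable names are made up for illustration. It suggests that the extra assigned reads with -O are exactly the reads that were previously reported as Unassigned_Ambiguity, while the overall total is unchanged.

```python
# Non-zero rows copied from the two featureCounts summaries above.
without_O = {
    "Assigned": 4515031,
    "Unassigned_MultiMapping": 4898882,
    "Unassigned_NoFeatures": 1220228,
    "Unassigned_Ambiguity": 24695226,
}
with_O = {
    "Assigned": 29210257,
    "Unassigned_MultiMapping": 4898882,
    "Unassigned_NoFeatures": 1220228,
    "Unassigned_Ambiguity": 0,
}

# Reads gained in the "Assigned" row when -O is switched on.
gained = with_O["Assigned"] - without_O["Assigned"]

# Total entries reported in each summary.
total_without = sum(without_O.values())
total_with = sum(with_O.values())

# Share of all summary entries that were ambiguous without -O.
ambiguous_fraction = without_O["Unassigned_Ambiguity"] / total_without

print("gained with -O:", gained)
print("matches previous ambiguity count:", gained == without_O["Unassigned_Ambiguity"])
print("totals equal:", total_without == total_with)
print("ambiguous fraction without -O:", round(ambiguous_fraction, 3))
```

So the jump in assigned reads comes entirely from the ambiguity category, which accounts for roughly 70% of the entries in the summary without -O.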
The number of ambiguous reads is more than 20 million. The command that I used is:
featureCounts -p -f -O -a /home/erpl/RNA-seq_Alignment_tools/star/indexing/Homo_sapiens.GRCh38.98.gtf -o f_counts_T24_1.txt /home/erpl/RNA_seq_Novogene/RNA_Sequencing_Novogene_Results/output_17.10.19/T24_1Aligned.sortedByCoord.out.bam
Alignment was performed with STAR.
STAR --runThreadN 12 --runMode genomeGenerate --sjdbGTFfile Homo_sapiens.GRCh38.98.gtf --genomeDir /home/erpl/star/indexing --genomeFastaFiles Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa

STAR --runThreadN 20 --genomeDir /home/erpl/RNA-seq_Alignment_tools/star/indexing --sjdbGTFfile /home/erpl/RNA-seq_Alignment_tools/star/indexing/Homo_sapiens.GRCh38.98.gtf --readFilesIn C24_1_1.fq C24_1_2.fq --outSAMtype BAM SortedByCoordinate --outFileNamePrefix /home/erpl/RNA_seq_Novogene/RNA_Sequencing_Novogene_Results/output_17.10.19/C24_1
Is there a problem with the way I am performing this analysis or is there a problem with the library?