My RNAseq alignment QC is showing me that I have a fair number of "non primary hits", which, as I understand it, means the read has been mapped to multiple locations. (I created a very small subsample file of my .bam file in order to try and find where these reads are mapping.):
bam_stat.py -i file.bam Load BAM file ... Done #================================================== #All numbers are READ count #================================================== Total records: 116 QC failed: 0 Optical/PCR duplicate: 0 Non primary hits 54 Unmapped reads: 21 mapq < mapq_cut (non-unique): 22 mapq >= mapq_cut (unique): 19 Read-1: 9 Read-2: 10 Reads map to '+': 9 Reads map to '-': 10
This seems to also be resulting in a high fraction of "Unassigned_MultiMapping" fragments when I run featureCounts on the file.
Is there a way to investigate or identify what regions these "non primary hits" are mapping (or trying to) map to?
I've tried running
samtools view file.bam | grep -v NH:i:1 | perl -pe 's/AS.+(NH:i:\d+)/\$1/' | cut -f1,10,12 | perl -pe 's/NH:i://' | sort -u -k3,3nr > Multi-mapped.sorted.txt just to find out which reads are multi-mapped and how many times, but the file produced only shows one fragment (which is says is mapped once) and so I suspect there isn't an NH tag on the alignment file. Does the hisat2 aligner, which I used, tag multi-mapped genes, or should I try realigning with something like STAR?