I have aligned paired-end RNA CLIP-seq data to the human genome, and the output BAM file contains some reads like this:
AAAAAAAAAC:HWI-D00611:153:C6PBEANXX:5:1309:5483:77560 89 NC_000001.11 51781228 255 38M * 0 0 TTTCATGCGGGAAGGAAAGGATCAGTTGCCAAAAAGCC <<//BBF<BBFFFFBFFFBFFFFBFBBFF<FFFFFF<F NH:i:1 HI:i:1 AS:i:37 nM:i:0
What does the "*" in the seventh column (right after the CIGAR string 38M) mean? Most of the reads have "=" at that position. When there is a "=", there are always two reads with the same head, but when there is a "*", there is only one read with that head.
(By "head" I mean this part: AAAAAAAAAC:HWI-D00611:153:C6PBEANXX:5:1309:5483:77560)
I would also like to know how to filter out the reads with "*" using samtools or other tools. Thanks a lot for your help.
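For reference, here is a minimal Python sketch of the kind of filter I have in mind. It assumes that the "*" sits in the seventh SAM column (RNEXT, the mate's reference name) and that such reads also carry the "mate unmapped" flag bit 0x8, which seems to be the case for the read above (flag 89 = 1 + 8 + 16 + 64). The function names are made up for illustration, and the fields are assumed to be tab-separated as in a regular SAM file.

```python
# SAM flag bit for "mate unmapped", as defined in the SAM specification.
FLAG_MATE_UNMAPPED = 0x8

def parse_sam_line(line):
    """Pull the read name, flag, and RNEXT out of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return {"qname": fields[0], "flag": int(fields[1]), "rnext": fields[6]}

def mate_is_missing(record):
    """True if RNEXT is '*' or the mate-unmapped flag bit is set."""
    return record["rnext"] == "*" or bool(record["flag"] & FLAG_MATE_UNMAPPED)

def keep_paired_records(lines):
    """Yield header lines and alignment lines that still have a mate."""
    for line in lines:
        if line.startswith("@") or not mate_is_missing(parse_sam_line(line)):
            yield line

# The read from the question, rebuilt with tab separators.
example = "\t".join([
    "AAAAAAAAAC:HWI-D00611:153:C6PBEANXX:5:1309:5483:77560", "89",
    "NC_000001.11", "51781228", "255", "38M", "*", "0", "0",
    "TTTCATGCGGGAAGGAAAGGATCAGTTGCCAAAAAGCC",
    "<<//BBF<BBFFFFBFFFBFFFFBFBBFF<FFFFFF<F",
    "NH:i:1", "HI:i:1", "AS:i:37", "nM:i:0",
])

if __name__ == "__main__":
    print(mate_is_missing(parse_sam_line(example)))
```

If the same assumption holds, something like `samtools view -b -F 8 in.bam > out.bam` (file names are placeholders) should presumably do the same thing, since `-F 8` excludes reads whose mate is flagged as unmapped.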