How to deal with an asterisk in a BAM file after alignment with STAR
10 weeks ago
brgs • 0

I have aligned paired-end RNA CLIP-seq data to the human genome, and the output BAM file contains some reads like this:

AAAAAAAAAC:HWI-D00611:153:C6PBEANXX:5:1309:5483:77560 89 NC_000001.11 51781228 255 38M * 0 0 TTTCATGCGGGAAGGAAAGGATCAGTTGCCAAAAAGCC <<//BBF<BBFFFFBFFFBFFFFBFBBFF<FFFFFF<F NH:i:1 HI:i:1 AS:i:37 nM:i:0

I am wondering what the "*" in the seventh column means. Most reads have "=" at that position. When there is a "=", there are always two reads with the same header, but when there is a "*", there is only one read with that header.

(By header I mean the read name: AAAAAAAAAC:HWI-D00611:153:C6PBEANXX:5:1309:5483:77560)

I would also like to know how to filter out reads with "*" using samtools or other tools. Thanks a lot for your help.

alignment samtools bam STAR • 335 views
10 weeks ago

I believe that the * indicates that the RNEXT (Reference sequence name of the primary alignment of the NEXT read in the template) is not available. Basically, it means that the other read in the pair is not mapped.
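To make the field layout concrete, here is a minimal Python sketch that splits the record from the question into the 11 mandatory SAM columns and prints the mate-related ones. It assumes the fields are whitespace-separated as pasted in the question (real SAM files use tabs), and the helper name parse_sam_core is hypothetical.

```python
# The 11 mandatory SAM columns, in order, per the SAM specification.
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_core(line):
    """Return the mandatory fields of one alignment line as a dict.

    Hypothetical helper: it splits on any whitespace because the record
    in the question was pasted with spaces; real SAM is tab-separated.
    """
    fields = line.split()
    return dict(zip(SAM_COLUMNS, fields[:11]))


record = ("AAAAAAAAAC:HWI-D00611:153:C6PBEANXX:5:1309:5483:77560 89 "
          "NC_000001.11 51781228 255 38M * 0 0 "
          "TTTCATGCGGGAAGGAAAGGATCAGTTGCCAAAAAGCC "
          "<<//BBF<BBFFFFBFFFBFFFFBFBBFF<FFFFFF<F "
          "NH:i:1 HI:i:1 AS:i:37 nM:i:0")

core = parse_sam_core(record)
# RNEXT is the 7th column; PNEXT and TLEN follow it.
print(core["RNEXT"], core["PNEXT"], core["TLEN"])  # prints: * 0 0
```

For this read, RNEXT is "*" and both PNEXT and TLEN are 0, which is what one would expect when no mate alignment is recorded.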

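You can filter these alignments on the SAM flag that marks an unmapped mate, for example with the -F 8 (exclude) or -f 8 (keep only) option of samtools view:

0x8     8  MUNMAP         next segment in the template unmapped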
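As a rough illustration of what that flag bit means, the sketch below decodes a FLAG value and tests the 0x8 bit in plain Python. The bit values and short names follow the SAM specification; the function names decode_flag and mate_unmapped are made up for this example, and the flags 99 and 147 are illustrative values for a properly paired read, not taken from the question.

```python
# FLAG bits from the SAM specification (bit value -> short name).
FLAG_BITS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def mate_unmapped(flag):
    """True if the 0x8 bit (next segment unmapped) is set."""
    return bool(flag & 0x8)


# 89 is the FLAG of the read shown in the question.
print(decode_flag(89))  # ['PAIRED', 'MUNMAP', 'REVERSE', 'READ1']

# Keep only flags whose mate is mapped (the same bit test a flag filter uses).
kept = [f for f in (89, 99, 147) if not mate_unmapped(f)]
print(kept)  # [99, 147]
```

The decoded flag agrees with the "*" in the RNEXT column: the read is paired and is the first read of its pair, but its mate is flagged as unmapped.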


Thanks! It works!


Sorry to ask again, but why can this happen? I searched online but have not found a good explanation of why one read of a pair cannot be mapped to the genome.


A simple and biologically relevant explanation would be a contaminant. Take any organism that shares some similarity with your reference: a fragment that starts in a similar region but ends in a dissimilar one will produce a broken pair.

You could also have fusions in the sample. Fused sequences produce fragments that do not exist as such in the reference.

Other, weirder things can also happen, more on the measurement or sequencing-error side, such as one read of the pair being degraded.