I have a sorted, indexed BAM of soft-clipped reads. I'm using samtools mpileup, and the pileup file it generates is malformed. For example, I see a lot of regions like this, where the number of aligned reads drops from one position to the next without a corresponding read-end character ($):
1 3008463 T 7 .,,.... AGB8A8>
1 3008464 G 6 .,,.$..$ 3GB8<8
Oops! I also see plenty of reads that have no read-start token (^). Now, granted, I'm stress testing it by feeding it really nasty, messy soft-clipped data, so I'm not surprised by the garbage-in/garbage-out behavior. But if mpileup does not "behave" with soft-clipped data, I'd like to document this more prominently somewhere, perhaps in the mpileup docs.
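One way to quantify this is to check, position by position, whether each change in depth is explained by the ^ and $ markers. The Python sketch below is a hypothetical helper (parse_bases and check_depths are not part of samtools). It assumes tab-separated, single-sample mpileup output, that consecutive lines on the same chromosome are compared only when their positions are adjacent, and that every read covering a position contributes exactly one entry to the read-bases column. Since mpileup can drop individual bases (for example, bases below its minimum base-quality threshold), a flagged position is something to investigate, not proof that the file is malformed.

```python
import re

# Matches the start of an indel annotation: '+' or '-' followed by its length.
INDEL_RE = re.compile(r"[+-](\d+)")


def parse_bases(col):
    """Count entries, read starts (^) and read ends ($) in an mpileup read-bases column."""
    n_bases = n_starts = n_ends = 0
    i = 0
    while i < len(col):
        c = col[i]
        if c == "^":
            # '^' is followed by one mapping-quality character; skip both.
            n_starts += 1
            i += 2
        elif c == "$":
            # '$' follows the last base of a read; it is not a base itself.
            n_ends += 1
            i += 1
        elif c in "+-":
            m = INDEL_RE.match(col, i)
            if m is None:
                raise ValueError(f"indel marker without a length at offset {i}: {col!r}")
            # Skip the sign, the digits, and the inserted/deleted sequence.
            i = m.end() + int(m.group(1))
        else:
            # Any other character ('.', ',', a base letter, '*', '>' or '<')
            # stands for one read at this position.
            n_bases += 1
            i += 1
    return n_bases, n_starts, n_ends


def check_depths(lines):
    """Yield (pos, expected_depth, observed_depth) where ^/$ do not explain the change.

    Assumption (hypothetical check): depth[i+1] == depth[i] - ends[i] + starts[i+1]
    for adjacent positions, i.e. no per-base filtering. Only the first sample's
    columns (depth in field 4, bases in field 5) are used.
    """
    prev = None  # (chrom, pos, depth, n_ends) for the previous line
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        chrom, pos, depth, bases = fields[0], int(fields[1]), int(fields[3]), fields[4]
        _, n_starts, n_ends = parse_bases(bases)
        if prev is not None and prev[0] == chrom and prev[1] + 1 == pos:
            expected = prev[2] - prev[3] + n_starts
            if expected != depth:
                yield pos, expected, depth
        prev = (chrom, pos, depth, n_ends)
```

Run against the two lines above (with their original tab separators restored), the sketch flags position 3008464: seven reads at 3008463 with no $ should leave seven at 3008464 unless new reads start, but only six are reported. Passing an open file handle on the pileup output to check_depths runs the same check over the whole file.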
Alright, does anybody have any thoughts on this?
-isaac
Nailed it. Thank you, sir.