create a track with mismatch density from a SAM/bam file
        3
    
    
    
        
        
        
        
            
                
                
                    
                        
                    
                
                    
                        I want to identify regions where a larger than average number of apparent varioants suggest an assembly error or a read alignment error.
Any idea how I could parse a SAM/BAM and count differences from the reference for each read which I then save in BED format?
I guess the solution in in the cigar but if someone has solved this before and could share some code, I would be very grateful.
Best
S
relates to Getting number of mismatches from bam record where no solution was posted so far
                    
                
                 
                
                
                    
                    
    
        
        
            sam
        
        
    
        
        
            bam
        
        
    
        
        
            mismatch
        
        
    
    
        • 2.7k views
    
                
                
                
                
             
            
            
         
     
 
     
    
        
            
                
    
    
    
    
        
        
        
        
            
                
                
                    
                        
                    
                
                    
                        Brilliant; I finally learned what NM was and it is exactly what I needed. Thanks Pierre
Hoping that the read is always right to the start position, I made of it the next command. Does this make sense?
samtools view -f 0x2 mappings.bam   | gawk -v rlen=${read_len} 'BEGIN{FS="\t"; OFS="\t"}{($16~/NM/)?split($16,nm,":"):split($17,nm,":"); print $3,$4,$4+rlen,"NM",nm[3]}' > NM_counts.bed
The NM seems to move around in my BAM between 16 and 17th columns, hence some edits
To be then merged / grouped-by and summarised
Best
Stephane
                    
                
                 
                
                
                
                
                
             
            
            
         
     
 
         
        
            
                
    
    
    
    
        
        
        
        
            
                
                
                    
                        
                    
                
                    
                        NM is not always in the $12 , the END position of a read on the REF is not START+read_len (indels, clipping......)
here is a solution using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html
 java -jar dist/bioalcidaejdk.jar -e 'stream().filter(R->!R.getReadUnmappedFlag() && R.hasAttribute("NM")).forEach(R->println(R.getContig()+"\t"+(R.getStart()-1)+"\t"+R.getEnd()+"\t"+R.getIntegerAttribute("NM")));' in.bam
rotavirus   5   75  7
rotavirus   5   75  3
rotavirus   5   75  2
rotavirus   6   74  6
rotavirus   6   76  6
rotavirus   6   71  5
rotavirus   6   76  7
rotavirus   6   76  6
rotavirus   6   76  6
rotavirus   6   76  7
rotavirus   6   76  5
                    
                
                 
                
                
                
                
                
             
            
            
         
     
 
         
        
            
                
    
    
    
    
        
        
        
        
            
                
                
                    
                        
                    
                
                    
                        meanwhile rediscovered bedtools bamtobed which seems to do precisely what I want (-ed)
when I can compile the java above, I will compare.
S
                    
                
                 
                
                
                
                
                
             
            
            
         
     
 
         
        
    
    
        
            
                Login before adding your answer.
         
    
    
         
        
            
        
     
    
    Traffic: 3722 users visited in the last hour
         
    
    
        
    
    
 
what about Zev's solution: the NM tag ?