create a track with mismatch density from a SAM/bam file
3
0
Entering edit mode
3.6 years ago

I want to identify regions where a larger than average number of apparent varioants suggest an assembly error or a read alignment error.

Any idea how I could parse a SAM/BAM and count differences from the reference for each read which I then save in BED format?

I guess the solution in in the cigar but if someone has solved this before and could share some code, I would be very grateful.

Best S

relates to Getting number of mismatches from bam record where no solution was posted so far

sam bam mismatch • 1.0k views
0
Entering edit mode

how I could parse a SAM/BAM and count differences .... relates to "Getting number of mismatches from bam record " where no solution was posted so far

what about Zev's solution: the NM tag ?

0
Entering edit mode
3.6 years ago

Brilliant; I finally learned what NM was and it is exactly what I needed. Thanks Pierre

Hoping that the read is always right to the start position, I made of it the next command. Does this make sense?

samtools view -f 0x2 mappings.bam   | gawk -v rlen=${read_len} 'BEGIN{FS="\t"; OFS="\t"}{($16~/NM/)?split($16,nm,":"):split($17,nm,":"); print $3,$4,$4+rlen,"NM",nm[3]}' > NM_counts.bed  The NM seems to move around in my BAM between 16 and 17th columns, hence some edits To be then merged / grouped-by and summarised Best Stephane ADD COMMENT 0 Entering edit mode 3.6 years ago NM is not always in the$12 , the END position of a read on the REF is not START+read_len (indels, clipping......)

here is a solution using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

 java -jar dist/bioalcidaejdk.jar -e 'stream().filter(R->!R.getReadUnmappedFlag() && R.hasAttribute("NM")).forEach(R->println(R.getContig()+"\t"+(R.getStart()-1)+"\t"+R.getEnd()+"\t"+R.getIntegerAttribute("NM")));' in.bam

rotavirus   5   75  7
rotavirus   5   75  3
rotavirus   5   75  2
rotavirus   6   74  6
rotavirus   6   76  6
rotavirus   6   71  5
rotavirus   6   76  7
rotavirus   6   76  6
rotavirus   6   76  6
rotavirus   6   76  7
rotavirus   6   76  5

0
Entering edit mode

when I can compile this I will try (THANKS)

0
Entering edit mode
3.6 years ago

meanwhile rediscovered bedtools bamtobed which seems to do precisely what I want (-ed) when I can compile the java above, I will compare. S