create a track with mismatch density from a SAM/bam file
3
0
Entering edit mode
3.6 years ago

I want to identify regions where a larger than average number of apparent varioants suggest an assembly error or a read alignment error.

Any idea how I could parse a SAM/BAM and count differences from the reference for each read which I then save in BED format?

I guess the solution in in the cigar but if someone has solved this before and could share some code, I would be very grateful.

Best S

relates to Getting number of mismatches from bam record where no solution was posted so far

sam bam mismatch • 1.0k views
ADD COMMENT
0
Entering edit mode

how I could parse a SAM/BAM and count differences .... relates to "Getting number of mismatches from bam record " where no solution was posted so far

what about Zev's solution: the NM tag ?

ADD REPLY
0
Entering edit mode
3.6 years ago

Brilliant; I finally learned what NM was and it is exactly what I needed. Thanks Pierre

Hoping that the read is always right to the start position, I made of it the next command. Does this make sense?

samtools view -f 0x2 mappings.bam   | gawk -v rlen=${read_len} 'BEGIN{FS="\t"; OFS="\t"}{($16~/NM/)?split($16,nm,":"):split($17,nm,":"); print $3,$4,$4+rlen,"NM",nm[3]}' > NM_counts.bed

The NM seems to move around in my BAM between 16 and 17th columns, hence some edits To be then merged / grouped-by and summarised

Best Stephane

ADD COMMENT
0
Entering edit mode
3.6 years ago

NM is not always in the $12 , the END position of a read on the REF is not START+read_len (indels, clipping......)

here is a solution using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

 java -jar dist/bioalcidaejdk.jar -e 'stream().filter(R->!R.getReadUnmappedFlag() && R.hasAttribute("NM")).forEach(R->println(R.getContig()+"\t"+(R.getStart()-1)+"\t"+R.getEnd()+"\t"+R.getIntegerAttribute("NM")));' in.bam

rotavirus   5   75  7
rotavirus   5   75  3
rotavirus   5   75  2
rotavirus   6   74  6
rotavirus   6   76  6
rotavirus   6   71  5
rotavirus   6   76  7
rotavirus   6   76  6
rotavirus   6   76  6
rotavirus   6   76  7
rotavirus   6   76  5
ADD COMMENT
0
Entering edit mode

when I can compile this I will try (THANKS)

ADD REPLY
0
Entering edit mode
3.6 years ago

meanwhile rediscovered bedtools bamtobed which seems to do precisely what I want (-ed) when I can compile the java above, I will compare. S

ADD COMMENT

Login before adding your answer.

Traffic: 1628 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6