Subset (reduce) BAM to contain only variant-containing reads
4.4 years ago
SemiQuant ▴ 80

I have BAM files of targeted sequencing alignments. I want to "pull out" only the read pairs in which a read has a variant, i.e. does not perfectly match the reference. It doesn't matter if these are sequencing errors etc.

Any suggestions? A solution using only samtools would be great, but I couldn't find any BAM flags for this. A pysam solution would also be great.

sequencing bam alignment samtools • 854 views
4.3 years ago
SemiQuant ▴ 80

I don't know why Pierre Lindenbaum's answer doesn't show, but he proposed this solution:

samtools view -h in.bam | grep -E '^@|NM:i:[1-9]' > out.sam

NM:i:count = Number of differences (mismatches plus inserted and deleted bases) between the sequence and reference...

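If the NM tag is missing, you can add it with samtools calmd, which takes the BAM and the reference FASTA the reads were aligned to (ref.fasta here is a placeholder):

samtools calmd -b in.bam ref.fasta > in.calmd.bam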

An alternative is to use bamtools:

bamtools filter -tag "NM:>0" -in in.bam -out out.bam

You can use >, >=, <, <=, and ! in the expression.
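Note that both commands filter reads one at a time, so a mate with NM:i:0 is dropped even when its partner carries a variant. Since the question asks for read pairs, here is a rough sketch of a pair-aware filter in plain Python working on SAM text. It is only an illustration under some assumptions: the NM tag is present, the whole file fits in memory, and the names `nm_value`, `filter_pairs` and `filter_pairs.py` are made up for this example.

```python
import sys


def nm_value(fields):
    """Return the integer NM tag from the split SAM fields, or None if absent."""
    for tag in fields[11:]:  # optional tags start at the 12th column
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def filter_pairs(sam_lines):
    """Keep header lines plus every record whose read name has at least one
    alignment with NM > 0, so both mates of a pair are kept together."""
    lines = list(sam_lines)  # assumption: input is small enough to buffer
    variant_names = set()
    for line in lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        nm = nm_value(fields)
        if nm is not None and nm > 0:
            variant_names.add(fields[0])
    return [
        line for line in lines
        if line.startswith("@") or line.split("\t", 1)[0] in variant_names
    ]


# Hypothetical command-line use: python filter_pairs.py in.sam out.sam
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(filter_pairs(src))
```

To try it, write the alignments out with `samtools view -h in.bam > in.sam`, run the script, and convert back with `samtools view -b -o out.bam out.sam`. Reads without an NM tag are treated as having no variant, so run calmd first if the tag is missing.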
