Question: Subset (reduce) BAM to contain only variant-containing reads
SemiQuant (South Africa) wrote, 10 months ago:

I have BAM files of targeted-sequencing alignments. I want to "pull out" only the read pairs in which a read has a variant, i.e. does not perfectly match the reference. It doesn't matter if these are sequencing errors etc.

Any suggestions? A solution using only samtools would be great, but I couldn't find any BAM flags for this. A pysam solution would also be great.

SemiQuant (South Africa) wrote, 10 months ago:

I don't know why Pierre Lindenbaum's answer isn't showing, but he proposed this solution:

samtools view -h in.bam | grep -E '^@|NM:i:[1-9]' > out.sam

NM:i:count = Number of differences (mismatches plus inserted and deleted bases) between the sequence and reference...

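If the NM tag is missing, you can add it with samtools calmd, which takes the BAM and the reference FASTA and writes to stdout (-b gives BAM output):

samtools calmd -b in.bam ref.fasta > in.calmd.bam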
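
To find out whether NM is actually missing, you can count the mapped records that lack it. This is only a rough sketch: `count_missing_nm` is a name made up here, it reads plain SAM text on stdin, and it assumes a POSIX awk is available.

```shell
# count_missing_nm (hypothetical helper): reads SAM text on stdin and prints
# how many mapped records carry no NM tag in their optional fields.
count_missing_nm() {
    awk -F '\t' '
        /^@/ { next }                        # skip header lines
        int($2 / 4) % 2 == 1 { next }        # FLAG bit 0x4 set: read is unmapped
        {
            found = 0
            for (i = 12; i <= NF; i++)       # optional tags start at column 12
                if ($i ~ /^NM:i:/) { found = 1; break }
            if (!found) missing++
        }
        END { print missing + 0 }
    '
}

# Possible usage on the first 10000 records (needs samtools on the PATH):
#   samtools view in.bam | head -n 10000 | count_missing_nm
```

A result of 0 suggests the aligner already wrote NM and there is no need to recompute it.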

An alternative is to use bamtools:

bamtools filter -tag "NM:>0" -in in.bam -out out.bam

You can use >, >=, <, <=, and ! in the tag expression.
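
One caveat: both the grep and the bamtools filters work on individual reads, so a mate that matches the reference perfectly is dropped even when its partner has a mismatch. Since the question asked for read pairs, here is a minimal sketch that keeps every record whose read name has at least one alignment with NM > 0. `keep_variant_pairs` is a made-up name, it works on an uncompressed SAM file rather than a BAM, and it assumes the NM tag is present and a POSIX awk is available.

```shell
# keep_variant_pairs SAM_FILE (hypothetical helper)
# Prints the header plus every record whose read name (QNAME) has at least
# one alignment with NM > 0, so both mates of a pair stay together.
keep_variant_pairs() {
    # The same file is passed to awk twice:
    # pass 1 collects the read names to keep, pass 2 prints them.
    awk -F '\t' '
        /^@/ { if (NR != FNR) print; next }   # header: print during pass 2 only
        NR == FNR {                           # pass 1: look for NM > 0
            for (i = 12; i <= NF; i++)        # optional tags start at column 12
                if ($i ~ /^NM:i:/ && substr($i, 6) + 0 > 0) { keep[$1] = 1; break }
            next
        }
        $1 in keep                            # pass 2: print records of kept names
    ' "$1" "$1"
}

# Possible usage (needs samtools on the PATH):
#   samtools view -h in.bam > in.sam
#   keep_variant_pairs in.sam | samtools view -b -o out.bam -
```

Reading the file twice keeps the original record order and only holds the kept read names in memory. Secondary and supplementary alignments of a kept name are printed as well, which may or may not be what you want.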
