Question: Filtering sam/bam by using CIGAR deletion sites
0
gravatar for ag1194
4 weeks ago by
ag11940
ag11940 wrote:

Hi,

I have a CLIP-Seq data and after mapping, I want to filter my sam files for reads containing deletion by using CIGAR & MAPQ >10. Is there a way in samtools to subset my sam file containing reads MAPQ>10 & with deletion?

Thank you very much!

next-gen alignment • 209 views
ADD COMMENTlink modified 4 weeks ago by Geparada1.4k • written 4 weeks ago by ag11940

Thank you very much!!! I'll try and see samjdk first and also pysam to see the results.

ADD REPLYlink written 4 weeks ago by ag11940
1
gravatar for Geparada
4 weeks ago by
Geparada1.4k
Cambridge
Geparada1.4k wrote:

I would do something like:

samtools view file.bam | awk '($6 ~ /D/) && ($5>10)'

($6 ~ /D/) means "deletion in CIGAR" and $5>10 means MAPQ>10

I really like awk oneliners for doing quick filters like this.

ADD COMMENTlink written 4 weeks ago by Geparada1.4k

With samtools view & awk tools, I cannot keep the header in my output bam. Is there a way to have filtered bam with header with awk tool? Thanks!!!

ADD REPLYlink written 4 weeks ago by ag11940
1
samtools view -h file.bam | awk '($0 ~ /^@/) || (($6 ~ /D/) && ($5>10))'
ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by i.sudbery3.8k
0
gravatar for Pierre Lindenbaum
4 weeks ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum117k wrote:

using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html

java -jar dist/samjdk.jar -e 'return !record.getReadUnmappedFlag() && record.getMappingQuality()>10 && record.getCigar().getCigarElements().stream().map(C->C.getOperator()).anyMatch(O->O.equals(CigarOperator.D) || O.equals(CigarOperator.N)); ' input.bam
ADD COMMENTlink written 4 weeks ago by Pierre Lindenbaum117k
0
gravatar for Vitis
4 weeks ago by
Vitis1.9k
New York
Vitis1.9k wrote:

Pysam is a good tool to manipulate SAM/BAM with various attributes like CIGAR and mapping quality.

https://pysam.readthedocs.io/en/latest/

ADD COMMENTlink written 4 weeks ago by Vitis1.9k
import pysam
inbam = pysam.AlignmentFile("my_bamfile.bam")
outbam = pysam.AlignmentFile("filtered_bamfile.bam", template=inbam)
for read in inbam.fetch(until_eof=True):
    if read.mapping_quality > 30 and 'D' in read.cigarstring:
        outbam.write(read)
ADD REPLYlink written 4 weeks ago by i.sudbery3.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 724 users visited in the last hour