Question: Filtering sam/bam by using CIGAR deletion sites
0
gravatar for ag1194
10 months ago by
ag11940
ag11940 wrote:

Hi,

I have a CLIP-Seq data and after mapping, I want to filter my sam files for reads containing deletion by using CIGAR & MAPQ >10. Is there a way in samtools to subset my sam file containing reads MAPQ>10 & with deletion?

Thank you very much!

next-gen alignment • 661 views
ADD COMMENTlink modified 10 months ago by Geparada1.4k • written 10 months ago by ag11940

Thank you very much!!! I'll try and see samjdk first and also pysam to see the results.

ADD REPLYlink written 10 months ago by ag11940
1
gravatar for Geparada
10 months ago by
Geparada1.4k
Cambridge
Geparada1.4k wrote:

I would do something like:

samtools view file.bam | awk '($6 ~ /D/) && ($5>10)'

($6 ~ /D/) means "deletion in CIGAR" and $5>10 means MAPQ>10

I really like awk oneliners for doing quick filters like this.

ADD COMMENTlink written 10 months ago by Geparada1.4k

With samtools view & awk tools, I cannot keep the header in my output bam. Is there a way to have filtered bam with header with awk tool? Thanks!!!

ADD REPLYlink written 10 months ago by ag11940
1
samtools view -h file.bam | awk '($0 ~ /^@/) || (($6 ~ /D/) && ($5>10))'
ADD REPLYlink modified 10 months ago • written 10 months ago by i.sudbery6.3k
0
gravatar for Pierre Lindenbaum
10 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum124k wrote:

using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html

java -jar dist/samjdk.jar -e 'return !record.getReadUnmappedFlag() && record.getMappingQuality()>10 && record.getCigar().getCigarElements().stream().map(C->C.getOperator()).anyMatch(O->O.equals(CigarOperator.D) || O.equals(CigarOperator.N)); ' input.bam
ADD COMMENTlink written 10 months ago by Pierre Lindenbaum124k
0
gravatar for Vitis
10 months ago by
Vitis2.3k
New York
Vitis2.3k wrote:

Pysam is a good tool to manipulate SAM/BAM with various attributes like CIGAR and mapping quality.

https://pysam.readthedocs.io/en/latest/

ADD COMMENTlink written 10 months ago by Vitis2.3k
import pysam
inbam = pysam.AlignmentFile("my_bamfile.bam")
outbam = pysam.AlignmentFile("filtered_bamfile.bam", template=inbam)
for read in inbam.fetch(until_eof=True):
    if read.mapping_quality > 30 and 'D' in read.cigarstring:
        outbam.write(read)
ADD REPLYlink written 10 months ago by i.sudbery6.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2555 users visited in the last hour