Question: Filtering sam/bam by using CIGAR deletion sites
0
gravatar for ag1194
5 months ago by
ag11940
ag11940 wrote:

Hi,

I have a CLIP-Seq data and after mapping, I want to filter my sam files for reads containing deletion by using CIGAR & MAPQ >10. Is there a way in samtools to subset my sam file containing reads MAPQ>10 & with deletion?

Thank you very much!

next-gen alignment • 442 views
ADD COMMENTlink modified 5 months ago by Geparada1.4k • written 5 months ago by ag11940

Thank you very much!!! I'll try and see samjdk first and also pysam to see the results.

ADD REPLYlink written 5 months ago by ag11940
1
gravatar for Geparada
5 months ago by
Geparada1.4k
Cambridge
Geparada1.4k wrote:

I would do something like:

samtools view file.bam | awk '($6 ~ /D/) && ($5>10)'

($6 ~ /D/) means "deletion in CIGAR" and $5>10 means MAPQ>10

I really like awk oneliners for doing quick filters like this.

ADD COMMENTlink written 5 months ago by Geparada1.4k

With samtools view & awk tools, I cannot keep the header in my output bam. Is there a way to have filtered bam with header with awk tool? Thanks!!!

ADD REPLYlink written 5 months ago by ag11940
1
samtools view -h file.bam | awk '($0 ~ /^@/) || (($6 ~ /D/) && ($5>10))'
ADD REPLYlink modified 5 months ago • written 5 months ago by i.sudbery5.0k
0
gravatar for Pierre Lindenbaum
5 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum121k wrote:

using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html

java -jar dist/samjdk.jar -e 'return !record.getReadUnmappedFlag() && record.getMappingQuality()>10 && record.getCigar().getCigarElements().stream().map(C->C.getOperator()).anyMatch(O->O.equals(CigarOperator.D) || O.equals(CigarOperator.N)); ' input.bam
ADD COMMENTlink written 5 months ago by Pierre Lindenbaum121k
0
gravatar for Vitis
5 months ago by
Vitis2.2k
New York
Vitis2.2k wrote:

Pysam is a good tool to manipulate SAM/BAM with various attributes like CIGAR and mapping quality.

https://pysam.readthedocs.io/en/latest/

ADD COMMENTlink written 5 months ago by Vitis2.2k
import pysam
inbam = pysam.AlignmentFile("my_bamfile.bam")
outbam = pysam.AlignmentFile("filtered_bamfile.bam", template=inbam)
for read in inbam.fetch(until_eof=True):
    if read.mapping_quality > 30 and 'D' in read.cigarstring:
        outbam.write(read)
ADD REPLYlink written 5 months ago by i.sudbery5.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 956 users visited in the last hour