I have a CLIP-Seq data and after mapping, I want to filter my sam files for reads containing deletion by using CIGAR & MAPQ >10.
Is there a way in samtools to subset my sam file containing reads MAPQ>10 & with deletion?
Thank you very much!
Thank you very much!!!
I'll try and see samjdk first and also pysam to see the results.
I would do something like:
samtools view file.bam | awk '($6 ~ /D/) && ($5>10)'
($6 ~ /D/) means "deletion in CIGAR" and $5>10 means MAPQ>10
($6 ~ /D/)
I really like awk oneliners for doing quick filters like this.
With samtools view & awk tools, I cannot keep the header in my output bam. Is there a way to have filtered bam with header with awk tool? Thanks!!!
samtools view -h file.bam | awk '($0 ~ /^@/) || (($6 ~ /D/) && ($5>10))'
Hey when I try to pipe this to a new bam like so
samtools view -h file.bam | awk '($0 ~ /^@/) || (($6 ~ /D/) && ($5>10))' > new.bam
I can't view the new.bam. Any ideas?
The result will be sam, not bam. To save it as sam, you need to pipe it to samtools view -b
samtools view -b
using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html
java -jar dist/samjdk.jar -e 'return !record.getReadUnmappedFlag() && record.getMappingQuality()>10 && record.getCigar().getCigarElements().stream().map(C->C.getOperator()).anyMatch(O->O.equals(CigarOperator.D) || O.equals(CigarOperator.N)); ' input.bam
Pysam is a good tool to manipulate SAM/BAM with various attributes like CIGAR and mapping quality.
inbam = pysam.AlignmentFile("my_bamfile.bam")
outbam = pysam.AlignmentFile("filtered_bamfile.bam", template=inbam)
for read in inbam.fetch(until_eof=True):
if read.mapping_quality > 30 and 'D' in read.cigarstring:
Login before adding your answer.
Use of this site constitutes acceptance of our User Agreement and Privacy