Hi, all. I am using samtools view .bam [region]
to extract all reads overlapping the region. I find some reads with INDEL and I want to filter them, only getting the reads with no INDEL. Which samtools options I should use?
Question: How to extract reads with no INDEL?
1
Jeason Rad • 30 wrote:
ADD COMMENT
• link
•
modified 3.7 years ago
by
dariober ♦ 11k
•
written
3.7 years ago by
Jeason Rad • 30
6
dariober ♦ 11k wrote:
Isn't it enough to keep reads without I or D operator in the cigar? For this you can use awk like:
samtools view -h in.bam chr1:12340-200000 \
| awk '$1 ~ "^@" || $6 !~ "I|D"' \
| samtools view -b - > out.bam
Hi all. Any way to modify this so it finds read with deletions > 20bp?
samtools view -h in.bam chr1:12340-200000 \
| awk '$1 ~ "^@" || $6 ! "20D"' \
| samtools view -b - > out.bam
Would give reads with exactly 20 bp, right? Is it possible to do > 20bp?
@shimbalama - If you really wanted to you could write an awk script that parses the cigar string and checks for a D operator > 20. But I think it would be easier to write a python script using the pysam package to parse the cigar string or see if Pierre's tools can do it (probably they can). If you get stuck, maybe open a new question for it.
3
Pierre Lindenbaum ♦ 133k wrote:
using samjs http://lindenb.github.io/jvarkit/SamJavascript.html
samtools -bu view your.bam chr1:12340-200000 |\
java -jar dist/samjs.jar -e 'function accept(r) { if(r.getReadUnmappedFlag()) return false; var cigar=r.getCigar();if(cigar==null) return false; for(var i=0;i< cigar.numCigarElements();++i) {if(cigar.getCigarElement(i).getOperator().isIndelOrSkippedRegion()) return false; } return true;} accept(record);'
Please log in to add an answer.
Use of this site constitutes acceptance of our User
Agreement
and Privacy
Policy.
Powered by Biostar
version 2.3.0
Traffic: 2237 users visited in the last hour