filter BAM file for reads that map 100% identity
2
1
Entering edit mode
4.7 years ago

Hello, I have a BAM file and I want to create another BAM file by filtering only reads that are 100% identical mapped. For example if my read length is 100 , I want to select CIGAR 100M and make a bam file. Could you please suggest how this can be done. Thanks Adrian

samtools bam CIGAR • 3.1k views
ADD COMMENT
2
Entering edit mode
4.7 years ago
h.mon 34k

To check for a given length of total match - e.g. 100 - you can use perl and anchor the regular expression: /^100M$/. This will exclude any other total match length (e.g. 95M or 125M) and any soft clipped read. You will have to re-header the file after filtering.

samtools view potexvirus2.bam \
    | perl -lane 'print if $F[5] =~ /^100M$/;'
ADD COMMENT
1
Entering edit mode
4.7 years ago

using samjdk http://lindenb.github.io/jvarkit/SamJdk.html . Check the edit-distance (NM) exists and is equals to zero.

java -jar  dist/samjdk.jar -e 'return !record.getReadUnmappedFlag() && record.getCigarString().equals("100M") &&  (record.getIntegerAttribute("NM")==null || record.getIntegerAttribute("NM").intValue()==0);' in.bam
ADD COMMENT
0
Entering edit mode

Depending on your aligner, NM may or may not include clipped bases so you'll probably want to check the CIGAR as well.

ADD REPLY

Login before adding your answer.

Traffic: 1105 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6