filter reads with >90% match
2
0
Entering edit mode
3 months ago

Hello,

I am mapping Oxford nanopore reads to very complex region that has repeats and genome rearrangements. I am using minimap2 for the mapping. I would like to set up a parameter where reads where atleast 90% of a read should be mapped to the reference and if not, the reads should be filtered. I have tried several parameters in minimap2

minimap2 -ax map-ont -N5 --no-long-join --secondary=no -t 20 --MD --sam-hit-only ./input.fasta ../clean_input.fastq.gz | samtools view -bS -@ 20 - | samtools sort -@ 20 - -o output.bam`

I also tried the -r parameter in minimap2 but it didnt seem to work. I saw in a couple of posts that you can do samtools filtering with "NM" however it only takes the number of mismatches and not a percentage.

Can someone help me to filter reads of alignment rate as 90%?

Thanks!

bam long-reads alignment • 567 views
ADD COMMENT
0
Entering edit mode
3 months ago

using samjdk ( https://jvarkit.readthedocs.io/en/latest/SamJdk/ ) , the following script shows, the reads having (NM/read-length) > 0.04. inverse the logic if you want to filter out those reads.

java -jar  dist/jvarkit.jar samjdk -e 'if(record.getReadUnmappedFlag()) return false; if(record.getIntegerAttribute("NM")==null) return false; return (record.getIntegerAttribute("NM").intValue()/(double)record.getReadLength()) > 0.04;'  in.bam
ADD COMMENT
0
Entering edit mode
3 months ago

Thanks for your suggestion. I realised, I need to filter the reads according to the MD tag since there are many parts of the reads that are not matching the reference. Rather the edit distancd (NM) tag. So i opted to use this https://pypi.org/project/filtersam/ tool. It worked well for my needs

ADD COMMENT

Login before adding your answer.

Traffic: 1399 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6