Split large deletion D in CIGAR string and filtering reads
1
0
Entering edit mode
3 months ago
marco.barr ▴ 90

I am starting to work on long-read mtDNA sequencing and I am facing an issue. Some of the reads I aligned with minimap2 show a large deletion of about 1000 bp on IGV. My goal is to filter out reads with D > 1000 on the CIGAR string. However, when using samtools view commands, I cannot see any CIGAR string with D1000 or similar, which is also confirmed by sequences where there are no deletions. How is this possible? Perhaps minimap2 splits large deletions into smaller ones for computational reasons? Or are there alignment issues? Below are the commands. Thank you for the help and explanations you will provide.

samtools view mt8892L_merged_mapped_chrM_sorted.bam | cut -f 6 | grep -P '[D>1000]' | head
CIGAR reads • 664 views
ADD COMMENT
2
Entering edit mode
3 months ago

using samjdk https://jvarkit.readthedocs.io/en/latest/SamJdk/

large deletion could be 'N' too.

java -jar jvarkit.jar samjdk -e 'return record.getReadUnmappedFlag() || record.getCigar().getCigarElements().stream().noneMatch(CE->(CE.getOperator()==CigarOperator.D || CE.getOperator()==CigarOperator.N)  && CE.getLength()>1000);' in.bam

grep -P '[D>1000]'

I don't know those perl expressions. How does it work ?

$ echo "D2" | grep -P '[D> 10]'
D2
ADD COMMENT
0
Entering edit mode

Thank you very much for your advice. So, according to the samjkd syntax, am I extracting reads that have a total length of D greater than 1000? With that Perl command, I was trying to adapt my request from this: samtools view -F 0x4 input.sort.bam | cut -f 6 | grep -P '[ID]' | head. Perhaps I am mistaken

ADD REPLY
0
Entering edit mode

So, according to the samjkd syntax, am I extracting reads that have a total length of D greater than 1000?

no, you're excluding. See noneMatch

ADD REPLY
0
Entering edit mode

ok thanks, I just wanted to extract instead, is this command ok?

java -jar jvarkit.jar samjdk -e 'return record.getReadUnmappedFlag() || record.getCigar().getCigarElements().stream().anyMatch(CE->(CE.getOperator()==CigarOperator.D) && CE.getLength()>1000);' in.bam
ADD REPLY
1
Entering edit mode

no, it would be:

java -jar jvarkit.jar samjdk -e 'return record.getReadUnmappedFlag() || record.getCigar().getCigarElements().stream().anyMatch(CE->(CE.getOperator()==CigarOperator.D || CE.getOperator()==CigarOperator.N)  && CE.getLength()>1000);' in.bam
ADD REPLY
0
Entering edit mode

thank you so much!

ADD REPLY
0
Entering edit mode

Please accept the answer so the question is marked solved on the website. To do that, click on the green check mark on the left side of the answer.

ADD REPLY

Login before adding your answer.

Traffic: 2281 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6