Extract deletion positions from a SAM file
1
0
Entering edit mode
4.1 years ago

Hello everyone,

I am interested in detecting deletion events in sequencing data, more specifically PacBio data. I searched for a simple solution to detect and extract position of deletion in a SAM file but I could not find anything despite the apparent simplicity of the problem.

In input I have a SAM file that contains in particular : 1) the position of the first base of each read 2) the CIGAR where M stands for a Match, I stands for an insertion and D stands for a deletion

What I want to do is, in each read of my SAM file, getting the start and end position of the deletion.

Input :

pos    CIGAR
1000   200M200D300M

Output :

deletion_start      deletion_end   
1201                1400

I feel that it can be done with a few command lines in python but I am just learning this language. Once I have my table with the deletion I will be independent since I know R much better but for this step I would need your knowledge guys. If there is a tool that does exactly this and that I missed it is even better!

Thank you very much.

sequencing genome • 1.0k views
ADD COMMENT
0
Entering edit mode
4.1 years ago

using bioalcidaejdk. http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

output is a bed file. (not tested). Basically, the script only loop over the cigar string and print a bed record for each Del

 java -jar dist/bioalcidaejdk.jar -e 'stream().forEach(record->{ if(record.getReadUnmappedFlag()) return; Cigar cigar = record.getCigar(); int ref= record.getStart(); for(final CigarElement ce: cigar) { final CigarOperator op = ce.getOperator(); if(op.equals(CigarOperator.D) || op.equals(CigarOperator.N)) { println(record.getContig()+"\t"+(ref-1)+"\t"+(ref+ce.getLength())+"\t"+record.getReadName()); } if(op.consumesReferenceBases()) ref+= ce.getLength(); } });'  input.bam
ADD COMMENT

Login before adding your answer.

Traffic: 2727 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6