Samtools Indels -- Filtering Only Hits With Insertion In The Reference In The Middle Of The Sequence Hit
1
0
Entering edit mode
12.8 years ago

Can anyone suggest how to use samtools to filter only hits where there is a insertion in the reference that splits the sequence hit roughly by the middle?

My sequences are in the range of 100-1000bp and were aligned using "bwa bwasw -z 100".

I don't expect perfect hits, so mismatches and small indels can occur at both ends of the hit, but I am looking for insertions in the reference in the middle that are 10x+ bigger than any small indels at both ends.

I don't have paired ends.

samtools indel • 2.9k views
ADD COMMENT
1
Entering edit mode

So you are assuming that only one read aligns around a given indel? If so, I think you might need to write some code. If, on the other hand, your depth is more than that, it might be useful to call the indels using something like Dindel rather than relying on ad hoc processing.

ADD REPLY
0
Entering edit mode

What kind of coverage are you talking about here? What data processing do you use to produce the alignments?

ADD REPLY
0
Entering edit mode

@Sean Davis: added comment - My sequences are in the range of 100-1000bp and were aligned using "bwa bwasw -z 100".

ADD REPLY
1
Entering edit mode
12.8 years ago

I think your best bet is to parse out the CIGAR string and make a decision on that. The definition of 'roughly by the middle' is fuzzy enough to make it unlikely that such functionality would be implemented by default, a possible python code:

import re

patt  = re.compile( '\d+M|\d+D|\d+I' )
cigar = "1I12M1D12M"
vals  =  patt.findall( cigar )

print vals

# decide here how roughly the middle is defined

prints:

['1I', '12M', '1D', '12M']
ADD COMMENT
0
Entering edit mode

I changed the description a bit, I want to allow d+Ds at both ends, but the one in the middle had got to be 10x+ bigger than the ones at both ends.

ADD REPLY

Login before adding your answer.

Traffic: 1942 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6